From c982128c9a6c549d9f8ca6ed99e9d32061383523 Mon Sep 17 00:00:00 2001
From: Brett <33626252+brettory@users.noreply.github.com>
Date: Wed, 21 Feb 2018 11:03:53 +0100
Subject: [PATCH] Upload data and map files

---
 TM_WORLD_BORDERS_SIMPL-0.3/Readme.txt |    70 +
 .../TM_WORLD_BORDERS_SIMPL-0.3.dbf    | Bin 0 -> 24740 bytes
 .../TM_WORLD_BORDERS_SIMPL-0.3.prj    |     1 +
 .../TM_WORLD_BORDERS_SIMPL-0.3.shp    | Bin 0 -> 448188 bytes
 .../TM_WORLD_BORDERS_SIMPL-0.3.shx    | Bin 0 -> 2068 bytes
 data/conversionRates.csv              |    87 +
 data/multipleChoiceResponses.csv      | 16717 ++++++++++++++++
 7 files changed, 16875 insertions(+)
 create mode 100644 TM_WORLD_BORDERS_SIMPL-0.3/Readme.txt
 create mode 100644 TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.dbf
 create mode 100644 TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.prj
 create mode 100644 TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shp
 create mode 100644 TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shx
 create mode 100644 data/conversionRates.csv
 create mode 100644 data/multipleChoiceResponses.csv

diff --git a/TM_WORLD_BORDERS_SIMPL-0.3/Readme.txt b/TM_WORLD_BORDERS_SIMPL-0.3/Readme.txt
new file mode 100644
index 0000000..a87018f
--- /dev/null
+++ b/TM_WORLD_BORDERS_SIMPL-0.3/Readme.txt
@@ -0,0 +1,70 @@
+TM_WORLD_BORDERS-0.1.ZIP
+
+Provided by Bjorn Sandvik, thematicmapping.org
+
+Use this dataset with care, as several of the borders are disputed.
+
+The original shapefile (world_borders.zip, 3.2 MB) was downloaded from the Mapping Hacks website:
+http://www.mappinghacks.com/data/
+
+The dataset was derived by Schuyler Erle from public domain sources.
+Sean Gilles did some clean up and made some enhancements.
+
+
+COLUMN      TYPE              DESCRIPTION
+
+Shape       Polygon           Country/area border as polygon(s)
+FIPS        String(2)         FIPS 10-4 Country Code
+ISO2        String(2)         ISO 3166-1 Alpha-2 Country Code
+ISO3        String(3)         ISO 3166-1 Alpha-3 Country Code
+UN          Short Integer(3)  ISO 3166-1 Numeric-3 Country Code
+NAME        String(50)        Name of country/area
+AREA        Long Integer(7)   Land area, FAO Statistics (2002)
+POP2005     Double(10,0)      Population, World Population Prospects (2005)
+REGION      Short Integer(3)  Macro geographical (continental region), UN Statistics
+SUBREGION   Short Integer(3)  Geographical sub-region, UN Statistics
+LON         FLOAT (7,3)       Longitude
+LAT         FLOAT (6,3)       Latitude
+
+
+CHANGELOG VERSION 0.3 - 30 July 2008
+
+- Corrected spelling mistake (United Arab Emirates)
+- Corrected population number for Japan
+- Adjusted long/lat values for India, Italy and United Kingdom
+
+
+CHANGELOG VERSION 0.2 - 1 April 2008
+
+- Made new ZIP archives. No change in dataset.
+
+
+CHANGELOG VERSION 0.1 - 13 March 2008
+
+- Polygons representing each country were merged into one feature
+- Åland Islands was extracted from Finland
+- Hong Kong was extracted from China
+- Holy See (Vatican City) was added
+- Gaza Strip and West Bank were merged into "Occupied Palestinian Territory"
+- Saint-Barthelemy was extracted from Netherlands Antilles
+- Saint-Martin (French part) was extracted from Guadeloupe
+- Svalbard and Jan Mayen were merged into "Svalbard and Jan Mayen Islands"
+- Timor-Leste was extracted from Indonesia
+- Juan De Nova Island was merged with "French Southern & Antarctic Land"
+- Baker Island, Howland Island, Jarvis Island, Johnston Atoll, Midway Islands
+  and Wake Island were merged into "United States Minor Outlying Islands"
+- Glorioso Islands, Paracel Islands and Spratly Islands were removed
+  (almost uninhabited and missing ISO-3166-1 code)
+
+- Added ISO-3166-1 codes (alpha-2, alpha-3, numeric-3).
Source: + https://www.cia.gov/library/publications/the-world-factbook/appendix/appendix-d.html + http://unstats.un.org/unsd/methods/m49/m49alpha.htm + http://www.fysh.org/~katie/development/geography.txt +- AREA column has been replaced with data from UNdata: + Land area, 1000 hectares, 2002, FAO Statistics +- POPULATION column (POP2005) has been replaced with data from UNdata: + Population, 2005, Medium variant, World Population Prospects: The 2006 Revision +- Added region and sub-region codes from UN Statistics Division. Source: + http://unstats.un.org/unsd/methods/m49/m49regin.htm +- Added LAT, LONG values for each country + diff --git a/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.dbf b/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.dbf new file mode 100644 index 0000000000000000000000000000000000000000..64ea895768fbc285b261771d82705d3af6aec7e7 GIT binary patch literal 24740 zcmbW9+jbh;lBQR8UF;q`M!i5^^q6j&xgz8AV81_CP7-#U0PzvY-8|7!Y=!+ zAFf`huKW9E0TYegvz-a}(W3sVYn@?ry*VU%k zD0tjBP#3-%`myi2p<|dX<2fxSN=)Fk+<;N!v{`OKH`~2iHt!y{3 zF^bR+Og5qQC&ml1&Et06Xen60Ad2b2jhr-Oj_g{*ypCI85L~!k%S$7Zy~@h0%=uLI zj@7SYF?CJSWj8 zLgc}ZT5ghFctIR56}O?uH^DkH}!e`TDrru6(WeP-(oYHcDJ3k^M#3=_Hy^`Or`PF z14x>++>j$J%D|S-Wx8?hhX_MkLDz(iH=s)mDb9mh_T@A)3`d>BQXFhc=? 
z(2BgowEK*s-!-Art~QVB>S6ix=BtPN=?m9K=8%IBE#$f>> zK3MKPFCVubek~-STPu*%4W)JuI*NP8NyOwM9&|BoPui2QiIVnqz538ZDwrTA34El% zq%OJ&HO@0z(PI%}q&#ZH{IZ?vnz>HU-tXQukZe0?QOE>Id8bkAp$D}VyiO4atp%1z zYf-eX+P%V~3-oML!E;>)5o3LW)M4eV8r0f&t$-PFkwAKCPs?_Ry-1yMyWYOrZp`p$ z&3c@x_Dp+$Wb|1Jr@dc4Rt=4}CQ1q?au3db#=DqC=*SB}q&Nl+(!y>{qAS zROjKEQ{SnR`g&^L^I|7-Vyi&dtWF9p@D19~K{7k-(O@*lT{r4f?_O^o8VW`nh3qI7 zyb;pHaTJbhH_OgMp{hE)@n}5C+#o1-D>JS(ZyU)*6lg8n6d{iz|D>Z>X9rysv!guI zx$N{ilfD}#oh|s_GzaJX-~-Ss0s$W+cA-&BQj4t=1#$t_K{Dx}+nIFT#OW^IZSQy0 z^XlHrmhbnk*Zi>k!#uxPnjiJoZkImC85VZhC$`}pd&BIM)3Yb7!E;0FI&JGVLLs#AS>#~d|Mv^>8T+3^IR8rMoU z8a4rvr^1E+PDC$Pn;|+#_j0#In>V-u_NGP=$BZJ-s?>4;US<&txD!3n>CU_JP8UGh zc~d)P@F2p>f@#sTonn-CCOJ5(gF)Ig27{g_Wp3g$PFahC zTm&qFA78i_)(}lQQ5%ETJFvCp0h4P-1wk5w@yR5`m?Xs7_^p%;*C{)*&TQx= z=qN|>*?*i~|vOj81=A@{8 z(p%(t>>Rp4*GdwOItsbKDK-eXx>%^S-PG&sUpIMv+cCx`k-gBmrzk{5A-_8oRI?xl z&0;#&wCT0$d!6e2(p)ch4~^t6aO`rUK@zNPU6DwCHz$TdRWvr@h*UB!n7S^yK_5r)2Q?5J;12_`fC^FOh@Rx4JL%@#Jo8+yyM4FXG?NP5JU%#LOnwb= zY9>eKj^z$`yE~(kNsfba>|*m`x^;`6Uu=I1{z1tRu=%)YK_F~iG_5>>UPo<#!_>Rq zv(_3*rzd;Bq!U}`;9`4{+l0RnO1VlU{TuXdZz#p9><&)?kQS1;2v6gd=o zp#PXIv=+P;JOD4_ccZ&ZZ#e7?XJbG1hP$Ou|F7=)_4`qRS(5@mOW{EZHZ!UVxGqp4 z03AzWdZYfUe(?%K)cfbYdfo!f*QUQ;ZI)+g6E6{tqK_m%m)KDt7wZM zdga{n{od2_wlN2i>BgOOSgKxGUYFx*&+~d~n5gxKX1fZs%#BSH;U`HF4)!8xB@wTDXsO8b z=4u0^wDN#A(n}FsdtTipm?aRqS)|n^(3T)IoTg>6{EFaaV2OdB1v}ySU zlLM&Lj^Z_567GPAP6lUscO6;}(yH@$dH<$~WNS5W7wrzh>cvsL+<<3652QXor=uk= z`~A!Q;?fI)%iVHAA8c4WGX42E7c`gMv{+5VTWcZdMj-H!Ej&QiUk=p@sLIRLrm^Vx z>=GDc#=mp;8>L1M%xkd^1!9-~>Q0KxN*0slWr!gdI z+)5bN<%l*OLBm{D>$i3oM^yXwoIQ-s@$~Py$@eeB&2}NZgx)?*6k90`E<&UmPOaio@?nWjwtO(0wxRdpAYlmT2@ni#_8WYve`guQZ?VCy-I@RhZHv$wN&>%QUa)DEf z0tp~d;aCX6wBf7z)$3-?g`J-`(sJWZI?=%-Nfd$PvU>1Q_%;2$%-+O{lKyhLdu%!k zHjXt(LF{8Nz_oRV#0P;O-5w?>GW~&C<|RoVfAwss=E-$IugJ>rJ@i=?eh}c4(MAQM zURr<#)4%F>`?Id^Bz+V}Q!0nMij#j1;Bq5tl4|LO1CB+2$?rxkW@;VZaWqj)MbGt;R30m? 
z48ah1-$O)2`1@*}n=MvfQ_{#Vx^s{{B;A!NBujvbkbE zg$@2fnPV>ia0t0)2H9Xfm=As58&s=?&IPwd^rUH1;&&hfyJ_W4ilDF@h`61>WKax> zF5*AfZXWi#W?>i?(UJloZ~#qcyRd#Yh5}j=AR?$A2db8m)0SSDjSh$?{B#Ut% z?(mXAOcj2&EY@(QR`^jc#ITyb3x2%&nyURV03jwHiI2i+;IF>Wk0J3}R6B7!8CIrg zJS=B^91Wj#)v{Uqf=%n%04Tv+AgdToYm$7n5^B(A6u^VwTrKkxXZT#L8!k5BNn4eT zm54l#PCE*knHf_2LKAfj$2q+A#eqC8&BMP9KWtaKbAbqsJ@)_;h6!})i`5H3Ac!O& z2jnxn8@?LOUineNTKrR6yvY?18o_`nq7*{29489mP7xlc5QiZ6RdzMKnihWQT~+TJ z@XYmtuuIV2DB6UmTL1~keR+)*S9$|sZ^Y4D<+LpKljMpZZexA|Z7x0v00TC+;Qv5* zVI7!1d}F$psLcY$x!Ud?8V@7k)^tIDHE?X`u-;J-957^&A|<{V_0)>M2}aA!r-q^j z9Rzj)g`lc1;x8U6s9zvjNg9qB_0{eOh>t$)9zT6QbCBsE;kbg^dPpR zR1@fEGb%>Y>1-4@$!Lq${U7FwV9a#6eZOA*+tZ5!_^l5B4=#rb^g#dw&3;&JkslI0 zd_By@Xf_&-hBHjcXti6tMxEDt#Q5R%SL}_9bh~2bGDWXF*t$e-FVstaQnIko${YOmecW?CLUlK{PCQ0H) z8p07alUW2_hBFHIHoBu_cVLoH^|^ZcrqObH5%5V0goGTCUdQE*=mK>TC_l-~l_zAyx@bp*YZZFrJU+3slT__55bv zG?fgeS6%o4E^k~T5EKfue1fkRifxI^csQPo+cVPE#;ey&GwyhN>uu8o8i<6hqay|+ z7dr)XcQqnFjnac93?}+rj?s9E*ZzLN5;r_1t0ai z+&n!mnc~; z2WgZmF`!X49J&dbJx;(_3B!fS;WLfUa?f!8Kw*Byxxy!5j<-@m>H?*a6 z6n52*a4}vyR_uoiD~zGZ$7;hMilFJ5T0GYyD@tAxX(V!tPojtB0GU8}!Hk%EV)I0? zPB}r``}<~T8UaX_iGtpM(LvY_@rw)GBlI80oQ-AjX}-u8B~qGKdx-f}V@0TmBHzY! 
zWnz&FHgQOrXyRXx^9rG7@|oHk>sxoxNDpEyLh=2WP*Pc>*6E`5yK#Y)R;!HOd`=tF z1*2FKDQXBa%r00a*#LE-vPAUJtY!B{2~yvc9X${=L9*VQMu<|czs_O})puaVe#oiI}) z5LZ+f36@};{Ejxc0~qGp&thsDslix*FuHI^;*1mutv9U?@8~faR-Fx(5A&P+W*CHM z+I97@seGs4ZN!31G9k577uaqmxeyhM0x3u=Dmr73Lb3d4I@S8mdgx~}hb59>W!eFI zjG_)MkoFS+U3c9s&!9Epoq8Do~ zlmNmjMb=3Izs`oVD00Y_->=q<*hf1q+Xc51(|GIxhnr~wC5VD<_teH1Edb}^YJXnN zaWVDO7B5OAAM3uq%m0_rE|py~obtwWU5FfS6{XX=-V z-{lQ0(_s|EyHEbb1?&M)OZYpIo6LBj_B(cPy#LQ7>8|@-!!2dIAXP0QZ%_;;nhM9l z!Piry7_X9aJ77)+v;q{X_P^LY-+TM8+?0*TDXurI%pE2PURzMDP*&e0NId{&)zIpF zwHCft?abS<(F-jiPW_tBD4kIm!1pRKI`1&53A)c>el zNPG;dVyx7N-BMpQrLeE&`tAWh`r0tF%*by?0Yv-u@3=#Ustk`ialhH|A&BQxV47EFAn3lIqI#svz* zCp~q5KRL*|z8?&h)$ZXq2;D!w?;`N)Cz3sw%rL4lPrLJpV~xk!-G*lMkMtWW3?{KUa{0>m7@C5-qGWJbCCumWG>7ce-X7; z9tZd5D%a18-=SZdBiAGdTn=+broAE!N~)OKy;G4VTzgBf&fGTd(d#P?ekc_8yzPpJD|ZR zq6^1nv!FnVvYeHR8RIJ9CpIMIHyHw6-Cf+1p?{0}t*&tOEH`UgN;kU*R z)<=`1sJ$Ov063MXJ*>1+;OoZ^V>9~%m|O`1UN&~b1S zbW{5m5Zbt9l#qj^&2&2LPm4aB`m}nFOla$#pE`rYiN*$$MeJJXR-63@URL5Io{aL$ zri=bfzi3CtoScjHih{ti6$zEHQXrfbKxzQ1VwJfloCqRHIEuA+1&jhnAtJA;=1!Q> zmaGB+G{emwk-2iO;B^Qok#}WmYIARaGM+8>n^n_@7`LgrfbUXHDx*Nt65%0eA|?k? 
z!OSLV3xWf)E&O9ctEV~@B?f>lsF|&OJvOb%(;@T#=!xPNsw(JaLCPpziuCgB`hl>I z-TOcN0&^#;N3sn0TrU-%q?(|+6onx*XEvPiv7%r1K3cgHW<7^{jMo zC~Slq6f+@pqI9^#%wA1r)7cQ}d%9olp0}zRYT_oT^p8x)a2SpZzY004n&t|lGetxp0>c#TI>WqcQ=IDa&g6=r31%9MhXkasjwd|Fh z^0pIEe7*d1|7N+y`L=&^?ib}{bHO@`;XH{RAvSV3Vbx+&UUK3{b{V_WcjLPpD#Xsb zztDlISE*9R1gS2UE|3!}DsNqoYJw3l#EHvsIl(NBmrvX0H%&Gy55kEo=vO8PyGOb8 zu?u!JOr%3u<13oYaK^J!EXV;L5L)+3>`K$A;7#+(eSyhfTsh3-$P=+?YM;j8ZVw(D8 ze@d2kiNjUBCo$)|n;$=ArCWuKU|K{^=N`ZeE4aYmVE$cGPSkQjd?ogxSrk_Z6fm1~ z2FM}M!``apf^fae9bOW3{-&JvkiIF$7L@OspJ@V(rD}8v>q&0?g5jj<5L^HPAxqqp z^D#xm(DLXgDUk90{%jRx=ZacbT?APbRXr$kaFi5?5-CuFi?UmG3xdWaN18MY?7{0> z|AKryl5-AK>P#16Et*+fpjFBn=IN#vhhF)SqKk%okn=>*Fq2Nr?2-9r;*``(=tP7z zVp86!&3Kyfli2t=z2RiX?Iis83sMcuh;S#!Hs_iglsVN^l!V&aNr~ParSJnr5tScT z&m6wg3_BC?637AiljiXfzkpR^(;|9cD??p0*CbAyIj-L7f$C`C^ZE95b=DCDA&<#0 zz|E==J1#e61t=DAAUJ^JPSY|{Yv*sORkI-=)ubk3^#B(XcpTE*L;XI`-`3jez$o%-q0<2Ab(uG5MWgv>OjhT7`jvjOvI z57cr23{fWP;p8Qk#3)Xhfm{Qs&5QYDJ}KfjniEE9WGw1VY_>f2hCX=d0xCGcUZ~Q; zGju@G|Ab1;`Sz^!Jw3hsfgBAX5F$quF4VSLqavFHdm$I0x9;SNe~al~*bkJ8OrfN*W*E~B`n zEw1B~QGBS@%{P>h=xHG2lGQs3h(>S$DWsg0kcOH6PL{?fCu2^d(6MJt+GF39vm3XlLjO>Hm^3Hf9g3wrOEx`eq^=2WV0yG9571+ zQ3{t@8(IdJv3P8b%&Ey#@)Ch)5-PP#Ys0GJujDKv7vx_qMpTKlXO#UMMlk{2Lh^6QEpwK%olhUAF4RH)LQH*9EXSxLy$tE1DJ5vC3>!d`U7*U(2 zS6?p+%1+je=k4 zyAoxivjYMl3gdI&7P$w&tQ&CiO_{Sbnw=_n++3cmB9Y4qV|405K{p~|I!4MfH+SV- zd2>fDcllB0s%a1nE;kk*%TiDrc%UDdDAXVaMqE%Mq4OEZh3nn+7)5-$#O5`)yFN-P z1}RHy9o02b&VQs?aKy-@0C4KS7U&7`;o8`gyT4GjOtrqGo9)C&uHx5flf>gwZezJr zmg;VJcXNje<3@L@cdx709~&7AY@47BYpWeaXNNBANg6hh3^h3Hs!@jK5L>N_)X$$R zY^h7iy3*|c=a{5}Q>?9|-~uimp~F6Wc7HrYPf}$1uv~BV=Rm-hr_iy8TsSat?T-qC zleYGNy>@2kqDI0~d2 zpirm7MF|82Hk>ZtUdr|~FQ+-AdTqO>Vx?*~CRgI?`j_E`!$c7QsjVJ3iXs?{m1uc( zfhh-S1&5c_T@03MO74%dd(QV$7m8CJa1`FKivl6ggI7tyKW#?MU%jpf< zt&`AhhP^?r98j%$BZ*+omYh@DBz@3!K}8UyO?ILHIoKj);JX$_wd+ccglmgG|k= z;8>IF5>CC|wCpLe?wa)SMHk2A1~#kUQH4D~+j*Npw{Z?qsJq~3Wzh0^`R>!X-TbHv ztcOa)IK5%SaUC2{<-2{tRhy@#&izqG>48oY#p#$f6BW*goqW|(qYLk9x%&$p{HP0M z1>@_}!vL{*__6=z{{tN#_P785 
literal 0 HcmV?d00001 diff --git a/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.prj b/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.prj new file mode 100644 index 0000000..f45cbad --- /dev/null +++ b/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.prj @@ -0,0 +1 @@ +GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]] \ No newline at end of file diff --git a/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shp b/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shp new file mode 100644 index 0000000000000000000000000000000000000000..63e155c4d8149962951b8035de3778286ca8b237 GIT binary patch literal 448188 zcmb@v2Urxz7KYv6sE9dZMnRHDmLx&}0}P4;17Jo)%!;l7mC)d*m@%vo6~ij#>>8oK zQ88nVs~DFxVvgW{>YTy7x4nOS{&}9Wv#;M*r>d)}tHaDRl4NNt{jXPr;eCrMNK(ae zO^)OEzjCB_S4nalgLh5J<8I>p|L$b~hMpo%HDJc4P@#y;65Q3FW$RyUH$F|C_JeuUT9t zt@~f0d`|tMBj;B-vh1_g297U7`H1sRD*6BJ$ad{2O6ITstzYSfCP0%6WYKohpaQnK zdrRKC!}65(Uz`}<5X$O)v1B#_Y9FsL+mDHjYzu*}GeuUD_E%;zP8Ry5O6@})z z&FiwXR8uy6tRG`m{Wrc0tIr}ut_VMtwY9~{$4eXi%q96p{yEvZC2Pf782<%&_?^!=x_0jI*KH5uckeJB!is@;8h*h=1Q>%KFA9 z_^Q>wV(}*D4-8SBQsOhto!OV5aiQ6SJR~F$Z|aeC zz8QP}N$1)9whPN@`S|D$pN7nK_*H+GvCfR$y!P1oTP&)ulgr(T7+FiG=Za=jgIeXgD_V~36bfA228KJeD3 z>FAn%tZ_Rg7fiGs?@^a;GdNnYj^x3zjJ5>s+}N1M<~^ubyQL?KS$-g85RAR;)&A3Z zb!^X=m{-92hQCbOvZXyMpBvkTY%5*0--{J>ubzK5xheM&{k6UWlS3RU`&u^O-A;7- zTIKH!tZ4R-_8WK9<%#Q_=hsa0X42hv--lMH&J(3Zo)bQMGkLwop0~4W@Nq)l;{yXW zJMZ|_O5CpAKgpq08Ixx_?wa@LZEly~RkB45#-zuq{2SSnki;>7EfPnkvMsp~vo zzAERhB*AK6Fhsu|6z(1v$Y7oqX1V+0`w#nIe82ow&4kAqLcA(NRE4MsQ5OPU;ag^5 z+3>K3ST_7aF>#-hl&i2o5CdP~mf@r?kWsG{vY+(YLy$9a3W>|xGVC__75R5<%|n0u zq0VZN*P%l`u|s#{!XBrIKZGMI{j~mG8a^*`azO`D-mlE;*2btn|m9orZeh zR4wZ9L8-{K+Uwbzwl6+rHt7m~EwfS)tB2*BepB!>oek|$BD0X zBfoxZ!ZTVThbQEqUZ}21ymt(-ek{Yc7$VntHw^nTqq-gGp}VPnE0ogs)Vzyat4#oM zc-?s75aJA<0^~yfbGW@34yDK$J`$C`*Oc13XA$+!aN3^2*J;ERH=>@=a4Yqf^AT$A 
z7C$UsD{eY+VPQ1Y|8W!Ldt^fT=^MmXJg7b~H=&=D{izatR}ubG&@DaaVX!mnn{une zq?`UM;f=L)@|z1w{=?S4X;=WR?_wT{S9T5g;MUR3Y+lvMetXi?tk!SuKObA}%r2)~ z|Ni-jniUR9?O@T@nYG$_{>RoAYF4!Wqj&c|oPXxZvTJloQ^$q%{VnLn*`NSs(!=T0 z+a)e6?&}Nrd5M}ui~Ty;g;3IZpkvZDraQN9T@IvzVN$ zX{v25%+t(Hnp+LpQ(>>)0z=5J-59QO(M?wM?wX?I52xE=_i3 zU#~sNxn`?o%I$+q7G3mF{0CkcUk?W`crV{aYw5?lP>h-A&?XS#RleUhB&?+vYvo)} z&Xw}K<#S0n*9w(ug<>sHt~Ec`E&W=NF~*kWT8#=cm#nVhc4a)Rj-26BgdBdbFR^~R zGse#&zhbT`x9^vIU9oS+EBAfH+OSKpwm2x(9_3o2^efjk<=UrQ>y-XVu;wq;k2yCT zXvtt0nUl(i!@H?}MR&n+*Lk%$92_Kj2=Nl*Z;AgWv5_*Ck$UDiaQLU6IU3I-J~sz- z+3*6g($5Z-wP8?B{u!~ZUwSw?KjAB%-NoRPQvosHOo=>vmR-Pmv1KC0_RHb??1<6+%qTSf-5ANklx_SeRd4WuN>oqi9pz%xIjm*BPtTA8CE!-ZS z|2vIe_unni-(U{1w74(!2dnZK{nB|`>`xY3fbEg|PU86S_#`WlPaYC$BFeKRy-=4Q zjKJ|%&JUi}mh`xdbcZs8VKgMMq++AIiY<0s}uLCN&fru(2sLm z_|Mo;exG#AudJ`K@Kl8V?EBVT9^G>gH*u+THg2OWFNE_>54%C!!Q|Pl{O-2ssRuss z68no0L^X(75Y?k9v)&Pdc=QXq@Gi@2dA-c9>@9G~rLxM=i)?wFvG;xi0h>5p?AUXG zEm!)Td)!Tl?KX(VKWm|OT5QXeZK=@YM(%ScFSVL=x=Fe%SNhA_I^^Lzw=dIdxzFt% z(r|}CTn^vx&10@Dzpy!0vT_{6SxrlaRTKW*wsL>PmOZ%2We`6ot^xYn@=z-^UkjY@ zYN30Zp0?cJYtoOW&d~mmkHWe|LSHESlf2scl(ZYf)4M*I5;y?r+h`^?as9XN4!o%K z!vXYybju^ld)RWNpSn}EPK-n0^{^j326f@rD!pQ>rg!49IFJ9kBM-kZ#G~+3I}Qt+ zlmQ`L<>OEr{=Ys%BZ$x6b5h<}L~vQ0Uytp?7d~sBYh@k5&%cT?$Qjd#->w$#d%!A! zPkK1^sNvvFT`|!LT;r++zCH3Z~;a@$9QFQX?J0xzohB zv-JRW z#1r@ZFu1M>=T*e@MAuH72e0ti7#Yqd98NWSJ)sjn>TlvRH#7Wae_VauAaysi|BA^t zzv9?Vyzx5Yl*G5Zukq(vF~_QfalC$+?ayy! 
z*&i!~>wAl?=X?tn04w~e;!mY3e}r-e^$fls-JkEvZ7uz%6~-f*UidyN&7bmo#X4`oY3spl#de+!;U&-9le6b|@MZ3A zS#0OdJbGB2qb(!abJIC(RUY>`@$Un7PE3h!&oghdH5}NZ6XlEd5@RJiJBC zq=_m1-1KAL#7hogyp`zJIDh{A@qwiIhGBfqWbPHa3;b+IheWe5Zt=Y+@!1xC90;hJKfIO%CEqzSKy5I3}Eb z-+kkV&$1w1Ci-DwI8PAc`5=fVi|w8dy2Tyd>{Sqtztys=|JZO|VDr`A`CAYlWnLw} zKGbL9J~g+WR>N)E>wHFhf%eZ1GjJ%=(6}*J@3I<9ona8n4~N{z-P16Fr#gTBcA`!& zmq)MgZtoSrKMwg(VVh$x4-xaWX9T|~`g3M5-y!brb&B8!8}^bTOU|&f3z(9crlpYePfXu&@_UZHMIHB)1f2h(f9e| zln9zPKZ3b~lR<9a*9aaa<|puunH6)^HR{4sBWii&U7~>t(*GVbPeX= z!EJKZcI?7armXfE857LQk55&^Jkn{X9p+v54>7)R8a_hw7nHXXXucVB;Z|b5PYvL5=BFQH1Sb`zeKil@HsUx3ew+{00(gQro%8^7c*dWniTl5|B6y+D$NTf6VNF@?z6gFn%x8^1e+!4{J7=Li?_pjJ z_v5$EbSkr48NoYFYM(!}wjcMqnk04E5Wy`vcl7;pjW0KZW4<&lf}gQGbljr3FK1%^ z923E_#k#+X@nq5e(_wqX`m}`cN|rFM*Fpb^42y2M85RMS-AUq&C zK=?ugKmOeGrXbjODq7{S%ge`;vgbRc_gck&ZP(x@SLLkB+xLfQcwi)Sg=@ z$nxM-$U3WG#QjDfZ|L5Kxc)`bJ$#WF&qY>e1d;x5GwDz5iI+_#eSrlse`HF$*^%_> zw&cHFA6Z(GiS5&lX+XMde`3Y<>ozo_e3N^VzK~eX3q{sVH6^`r4`QW%$aKnIJp^@i z)2YOfYV!Y0`Ys=atX256ZB5AEkIHL5=-Zn@e)T)OKE#E}EB!5+V|nSAC)K~UG3Arp zkX5my%NdSTUg6jFpz)Pr;d)kr{lQ~CmS>mTDc@NdPkyx>l~?+=(|myOAl*F?%k%Ui zf00l1CLLMM`i87i_;r7j^=IId>(#*WDzhKR(!kT`S07YidA9Q-l~?$+;k3W;m${^0 z*YCe^Imp_IdcDUXVufG3oc!?l8I?EHbM{eU6PmwL$3LmO!ms|?joLST0P5;d5yO}G4ZP!#5oJ7yuzmm*$JkN%Er;8S@~Wb? z$ZYyBVlPL%POS86gUBz-KIm7cQ$99Vg{)gix^%xcveI8LpfheCuWs%s>ep20fXv39 z_7GftZYkGKLRR{<%}lAi#!;xN`ZP!8`2oo4{1(*yGoi>zztI6T)qlP{>Q-<4ss64( z$U{;Ck>y7w$V$J_8x87`=Sz3dUZdVY$oy3q@rYn#d44{z!mmzfMeQAP4)d$qQv0RX z^N1^0Q~ST~!SYJK&hy0p2KDmN2W}kRb&H-2VDQexeJOZXO(13koOFd>y{32o1D|~P z6zS`Wh$F`#t5cpLvq3Y66@JyI`>6ApPN-|c?)?|5L-qPB3)Gc<^~^shf9DadT!Qsu z5;3=nB91;t`7HY&|KdM#5q0TePtq6Aq_gZqWTjv2l12XH2-J0pRwK*HI#NEjHOSHqOUkG4tKP06zsi>U?wRE8 z*^>O#s5~E24O!_|*VmtK6;&lYb{^$R|Ln{q7>`k;%NYigPvO_oedT)v2`^s6%_k{+BwdXnBBM4!#7 z3WuW3;wGT3^y_-mCwMSt|LHiKPpj$p5qv zm46dLtnh0sTA?m&&(`~!A@i$LpQ@259gvPe^*7yXIhu? 
zjcrf5!msvjLjK)q)Kza9BeU_1kk!qaQu#88Sm8H1ZP|}OyLgQ&cA|ah@hy?%Z;y~Q zgPS2sMMsdc_qIZ2U#1}|%NGPUp?pDW$v?Y3mCsyC{DXAq&lF^(zu41-{IBMtZe{FD z`HoIT*2K9IFG@yM`qk!@=zM5F4C>mg73h3uo-j5G3UBiUNx#J)j#by`qhK={QEItgVu>aPYNs2hz4K$cZr$i>bbk=e*1ThX6Nf5C-~)W6fuv=Msx@hr-> z?h_NOzW zh!y#C9#v@nnmCd2#Z(}^xsK}VqrV=C?1ub{|B8z0%l&L4woh$ak@!_j@>^3r`O!ht zwMzfHIW)daG*(#t(JUH|lD5dYd!)0k=ADSJ_Pvw(qlfT0~WO-BzWR(q-=P^~Oyuz=o zNc}In)FFRuegDm+?bV$pU2=MFDf&n0SMPa5^^LxVx@y`3WZvR2vbx?wWa-2eVufF8 zN!QCP>9Rh*p8M=T)?Fu-Y*v!4^xviRUHW4_`KQwQFF#s_tS+VXUamNW$}9ZZArj6P z*)<+@)xk1qZ>PRg{*VFb&m*Y3!moX~0R8OQSk!f6GN}BvVB$rKi7#}b@=CvE8|rd? zck&xB?^jY3_SPy2BZ>96OZu8yPe;oiZNZ~8*i*|e%u zUg6hS>hJe_C?0yLkh`Xbfh(xVE;MtlIGJgPiw)t%3p}H zuHgEpj;u9+!M}5R_|6QxtD4vhVDJw66R!8_XST>vH#&dQ9`z=6e1hk5IukEsR;L!7 ze-8C0-C!rOx-%nwlD1+(hbNXebq+dhQzF!1zn$gy-fb9ONe8yBC{&d#QY92 z4@#x$Pp>z`R~oo;3D*CIABZhW>H4*_6j{o3C;f8kfed`S#yh+o*Vb=|ES0ov&*9(I zDi>sV%N8%ey6vR%4@P*uMg6W8>6@ML{*5Z>)c^)Q`F1wl4|;N&c-btxKc#X$fh@&- zM%G#9lD;bbGDFbMZZ1UDhPOqQ?gbEE zb)58um>#QbbV%wAJ_2EU>CS#K2Tx-sXF*-pBj zs@3cveQ-M}{~?3)DiZl?PA0w2M$D)4C5HAQZ(=Y9H==%{e1>-9|6?MuI*ypf(EU#B5O37^%2PBya@~n5`V$uuGvgHEgBFy} z{~GbFQe2PZzI6XpovuF~MWj-D2GH@Ap9>*&FCl-^Flx{4>&Vh;_}m)CQN#dM9tv{f>~HAAqb*$U$b);ipefUiHsj;w9C{-(x$n9NL8Z6P8i=q3`H^N&7%q^0U3o;+}Wza76 z_8oDAgv_6OL{2|Hr4Q;&GO$04HZ?<*o5;uoQ+aQpOWRr^Yl3o-<%juHU%(C0jV>V< zr@cgG^Vbn~d5z4|{-X9j%P0L*syBzU+1n0M`9$Xqf{prw5_@*=6kPCE1hTwag{-MI zl(^|8(oal4=Fj1K0HS>D9;9FX6W1$E)xOAV+$`dZk>oFRqNrFPPo=tiu zZ#;fwA3coB`wzq8o#xVG;t$JEPq%-HEXS9kUU1?oaik0C1=e?vr4965XHDOiRNi6~ z<(okHWR)$l(au+>vw3EuSNlT#vKZn+rO5JKZ}JD!#QDma3?V(kgXVV!YH#|wM#Pcb zP%p@|Bfr5Kyq}=iMLILgK`!3Z6m?$t3ts;k8QLIAt3MFGu_Zky6}eztC(5_f7|R!2 zqw9b9bQtRCM|zR|IS@H}8(m+rrBm?yLbGcS#^ct3yHfxF!I zFy_;amS{bJUm}S4scWU8_2kD0+#YRE6=L}!vf7pIU&s}j5!W+Dz5IGy=b^uU@RaUf zXeaCMH}Hm(&%GM?AFROoRiA4i%e%G6>UcNOzs8b(h5mj(w*G#CyEo~LXHxlFZK(W~ zEx7%vUS7oC9?|vd1PwBuwUMq@Q`Mx~-oo=mtxqJfv|})`I-Aa)r2%K?`fX=#@?Q=@ zU3V{r{L?$q^-^j;yZs(`B$Sbzk2ZB z$a3BiYHtlXUb3CjNgwrr^xd(j>yFa(9S`@Q?FpdkIhOo}%9qmly{x}})eUcf$77av 
zfc%&B*H5qKVSlQ!>GNAS|D^G8t4jU{jcNQc^zEsWfvno7zrKu{j`LZU`HHsp)&rW) zsdPOd*S$#dd4T?W#J~sF2W^8u%+Dw9Lsrj^L6+_QCXO6}EbWNG^-XtuKKZL}Kvr+w zj?C2c(68D_=WEique3e$4pF|={fL|7P`;5ISzYfS<#T&Y{dJehvnnR2YhQ3Ge~0cj z=w|7VS>Zpexdih=dHs`GK)NCQHbnQ=31n%&EcB~(9Y>a3x*%(n{wGvko)JuX=_&N{ zbGF3Q&k#3kgRE2dbpyyR2bht+#Yyx_`&$w-(%GA8R9@j%2U33c{E+k>r>T6;+Q{m* zdcA37VufF2Mt(kXhPmh;ork`@iyesv&=;seJ^GIo6qIWjH6FBdN8Hmc2F1pCPRYR zWAR(i1$#%tnacWO{kB5yfkY1_TGQZQ{Ge7d=>c|0asxm44Fq z&wu!1+tf|o+%=e~#c%O_JlVDbSAF|-hW^}i=F=RtC%e06e~u>n-|1xQns0 zt+=7I=H<@Z3ci;Au#pW*5a0Vf(3yW;*mT#{%{Jnzhd74tCgFFfb^m(H%a(h~%NFJw z%AMTJhAI7!LQI21QLl;k9=cmNuhFw+KZ{n3sm~rxv9%88#%JyyTkpVF1-%*Ge$Js0 z;`g?$;XF=kzZqlYIl81A%TH^~*cS2K>UQCLVW@Ais}*BPe|a5;MSrzrEJu8|7Jh4U zTg)TixF&(VCD4W~Ml1Ret5BsIbAutXUVT!7`)4dPF)0>HhSjCn!^7iLWtM@+K0scB|N4O z<`7m8wh#^wE)ebzUJwjI4WWSuh3Eni3DFZG24X10NQf~I;~~UrfFHIaYmZqw2FM=| zCoXSihA(mR?x_Fbw`-4j(fKJAMQ~7ys%$sGBxgL3+o5$XSOo zk@=l|#G_Xu!|%3`%l*9VGSc7AL|qCdHl4Q!nLl4Z`tr@hO25Yh(!CF%&ZbQyZkvEC z8!kkS>1>NEDg6husGAm2e%Wvda#sE|^uy z1X-Rxj5x0W=?cFb@Cfy+(W6l3%O4QW98Bdm{X^wLWx8?yAY73pCesl383*Byx?MtQl`qFq6T@9f2IAx+gtNtp|BlYD!EJKzQ`B*C&AJZ>`Fu&Zt0k!8r6!Pbl zm5DQ2QvD~5sXgm=VLtwyIIFt3Ue}jTjzDIGZP9O9dkpC{?UA!u&nI0`KVLD9^6U1J z|GIv>KNOJv;sh$6a1vSR=hbPx6kUUJBhlX*h-0co5)T+a`4@#E|Ki^_nev~>A%D;9 z$T5ABkbfS>OLvp5@H1omd=|%bQU5+a$`_bM=B!GS@%*4*;Ah%~9(l;= zcgv7X|6D_?@EZ+UgL=%N^Qdc{ucGobFB8wy>nVlEN`Jxjl~n%uCDaRst)TjD>dSx4 zM9$iJi*$wGsELNkUpYYez5Jv2XaYQ`ki)B`N+qVf5T4lTfHEE0qI3X z#l#A~QOE}>|L7p;UZ1G^;+M$jhp9hJ`3=$)eoed0=+FB2H~G`EiQQizYnE+6HeK_c zSm7_o*@k+|5Yp4#wvylZ9r_C#NzbbMg;?p2UyJ^thIddm@+XdwUlVWogYtQOLrz!t z3p&zx75P3TeI|{k=|y8#2J_3vn8x4K_!H_%ztP*Jm_KH|A^Hpc`km?O2bH3_zQgX z^V5s!(|o1*Tl6YapHJWadw!#Q3V(W7ek9hveK6^kdB~=1s6Ndy(z9BOLtW`lpLCx5 zGlo+6`)84}CdVRc&YVGxX*L1*7yo41{;YE&NO#k3|Lu5Wqa~!ryq|RCZ+$sc)@{6#B?h{+9fu`th9c z59v$v>&29Jq^EC)6#bMHX6z>3f5|tW+Lu*ofV$BpI$p-Ku8FMKP5H9!5F07{1y9#f z{sz^^KbY9mum*Wn^1mnP6Yj9RMH*Uw6_Xb6FQo7BP-+rlL^Q%4}Fb-lQ?Hi+=zAM%R;?fWJu}{>D}0DLD5XIcw|@ 
zYHzDLn6GHgKIDSmEReIt>W_c=b2`(?A$Yve?5;(AlVNoHJ5C()cXec=1NBJPpA#0G zdx7e2Odx^5BbMJB_reMZju_4ScEazSPN z{$t#b`1efmuc!Gi#Cj#NX&c(VG&A+(m-WQ@3LMFw60Kifc|`J zaVoLZT{`}5(;{bWwMM^Y^Au#$Q9r0ZB$@ce6x0hG1`%iNrT#lI7&)s(SJaK(h7nKR zfV##=ja)Rp8S0v*KFFrs+u`=6H!vst(?;5UBM0P|h3+^X%FoebTJJ>G+-^$c>p0;2 zN?*Z<3-$A@Ze_~%^cu}S#r#con~!={4@XxqpVD7WCI4eOA4$JD7uhs~+N<;zT&MNX zbmZ`Z_bKS_UxH>w$Z!p zyj#cCUTdc~;pYU+z$ae+>-wYIZ=Mk{Beatv+p)Lhk8|(4^Q+>SsvR9!w0M>Z@P>Z> zBqqc;{>)$Q_Y%*70KHq67=w_Jj%?doRetEV?)-&M-icX(jx13;TLN@j@yy#fj!fxS zi)X1AM)Jkt8MkvC*>&-(A{B7yjYC7%JF-!C2PBWG5XpOsXD4PlGNpgi{q<#&jUxF7 zm+NI0wm33;_JwgI_nO@3#MrHlOnMH_1cH3J4}YqPjzB(z|5;q1PvbA$`Qmyzf*u$< zF?@EA6v?}Z&qba(vfED<%ga7?=SM4hS88M9#FTyq@oW#1NdDf>FmNgK_krMQhHdIX zd42x}<{n7?3iVf;*I|mC8+>U;Jd2_ZwD(Z>mmBLG*-i0miRzJDn>jT9<0l7ZBc5gA z8OcrZ;q#Nz_DngxL&URMf+OKsF7S-lOO8xgKMb?@2ne60SyBjtun=;<<7G!^0+{sv z{?r?*otdk6j`5N}{M-R1627vRU;f%_qa%yFTGjjfm;hGvaC1=O#V{V?86OD&EPcj0 zw#M9nT^66~Gzw(pxyy4|CG`wQVNUG&ZEVd=h zne}AwAha6+te~>WbCIVr+XK%b+m#o9$PrGhq`?CUAhG_3Nr=Q!d+&w4} z#j}(?hx2UTev`jlaiwPjRfA`b`~ed*q#euy*k2qXIF9QO7j7u(Z3@p;5zj&y<;+)D zjPPmX8Nts9uHnoVsz3TJQAco{c-Br8Cwlf-UwBrE*oJkEeBqCboYz)ZO!AwGqd2?D%poRAB=Lf;Pu3_owDKCMDS#-UWH~nP4q`}_|LKE zDV~MZ1D=ia(JUY~-GY1l#Zo8wg!3YUn$=8IR@_!R`vsoeWFnru^TLW}s%9TwJ~NE7 zt;IFZhgkCj@eHMBq1-F=t+K z8a}Ig5yE4p_3~c$)tR1ER}Y^3_2lB872CLQZ9<>in7W<$L7Y#{Ty+K7Kc_R7u8;X1 z-ou&S6wipe56@nI9&xbwAI`r#TTS`wHRZF}l+SMa<=JkEXTQO0tplfbJH&p~v--CV z;yfv#R)+oBg=Q?xXH;%ta1W*y&q1H##l&Z6Mq=lJ*Z(>;nZUpQzy18}Kktq8gz|kM z20+9?41*X2F&1J1#AJwMh%|`l5VIiWK`ew=3b7Jm4a9ngO%Pilc0!~%56kJ87X5R( z@Okg4(ae2p`azHFX6!(Nw(QQ-Xm(S4HV$8$RQhk@x*W~o#q;kgnltZ}^Q2KTqM2N@ zDsl2YGv+6Fb~Gyt{rz>LD&|b-pD?AC;aj0+!SlrHnX@z1hvn{{13Y{BoS^>Z%;Mer zoZ$J)YTE*SWnbL3g#ER9M9L-F3jq~<8Yt9;sZy{fu6wQhb4RsFvX3iE~d-|h$ zVl;Cowwt;4p*d6f8!t&k~yHTO_&-%l~XXiCrGo^oQ z-t3&Twb5+InGMrjGg`B!HVtyE*F>`t^^IvQbgkLy`CnwVKAL5TwP9~-ru1(S-ywkZ zXSg5kIPuTcEKff4Dx-^@Alu$Onm{{=f`rizy&)O z&i|_v>RWxz=WAv((+u=?`&Pk%DgAhkur8WOHt=@|p2GH9Uh{6YI+{g4A6C_uR61z-$bzNCoJvVu7|l$69}on;%V8%w7#Vzl^1FPK&SrkJ 
zfajk6x8KhFL+WD~uY@;ms@!U6$qe@dC+n8N_K3fCun@R-C^v=iyD9!I!6Hkh^hdUt z#_amRd=dQz&mMl&va7eMHzF?@Z+0Khi-V3^11tRpYHjt&pQdlWza<-0 z=}l@g*dHA(z;_J#0Oxe{o*={ig!99aDgD>Q{x=Nfm#BZ6C0k;)Ds{mS=)W0Zo(s4o zdm7_vFdg(=PVOE1p0{L5zu$pDya)78#?%ifHTFV%XF`3lVLVL5IZ&Y`v-sY{XEgL@ zycmy@mQ3kS2HOzuXH~g-c;yXC);K1*ED^@f=ufM?zE3Tg_a;-HCZNkr1{{6*!ICNc z;=4am7R;}n@_}nQOXfOgR8DP}ZzgftKP!7!u{hm1@A+GzSQ^FN}A(XrGBSQ~E8$`ZWQ@GZ((&Q_q^&;q&IB zS+Te#vjMh%zb`T#+9TTKWX+WR3rp{n^m1ej>rsa)*$Y7+3!$a#_Egm zD+8x-FEZM&oE9Zqg7FI1m{hVo)P`Lb^@;K8eX?`)eQn?x@xT4g#PI}llPDk8(-+%N z{es)YWV}D!h8fqX=W`PBhcBJyaj~ckQ~H;D^x-|Dqv-FunA);4cE?i=1Itfd8g1%q z%hHCcC*jz_~<=t*_MT#2AxMlF^%|+ zUO`)?^luc$M_9ko#oy;Kwr4hCJp#u4wakvCiTOShjt`<8R`yKk-+}8|6f+Xz_q#pw zJF&%g*$${LdCsBXYwg)dgSecj`#|TWR{Gz}!-VRJz{#nm`(#Y6oR?khBaK60*tFfki zURg{u(*=(A%t(d&qJIWQGxJlmq^1)cn9@I1%#UGFtlQ8NZuRatu?C_%^j~J#CH`IojAy;f@N{{;GaYY}fZJZ2dSp2K zJqv~3_&VIU6tzrImSCwy0& znfIZMEY35E#a?JQ#lC?Hdn%3x9irG(@tww6F6T z&OMMD&F+f!o^oc9;$8(DZ#T>}_dR&Qnc0Z*%Q5=xU!cE-6bZWc4)1&?w&e7~GBfDk zZArhSe9$zrx6^0l&S;h`uFv+Evz{=#-{E+hf#-i_ zY?jy_I9`~nZe3uM(E8_ihKu=g9eCruvL#TyXz8rEle$~d{&^72hlk!9Gb;|xhs1ie z5st_B_p&V64sqPt0rhRJHDCA6lEuOBwuSj2J1(yKa;z1*ytH9V3#dQ-!SCi5;cpL| zdD~-kWuXTQNNWt|2edxG@u5}e$MMlN%%)*x-u>B7pE&L|w`Cta{z-b24#&rrOT!}T zwPEwt!ObX`znZo0r)3;(OZ`6@#%JFcUC#^$me#iTm}7J_d%we|?9XHeHcPO3G>aAE z^qV6)BhIHCqgkf-_k9j*quBpVqM2&Z+{(RVxM%lcQSOwQz&l^0<(+q6Tf{YSZCH=Q zdGSC8x}LiO>*uVD!>h*Ivn40jCO5nh#TJ{l?oh3@16?1Tk7Di)wNwQe_N=kkp9)}q z7U$Vg0h%)S!1}!{T~$J>!ciEqF zqo6%vzRrjGv^5TEpWCx!vA=Bszqp4w+mShm?M>0Qe>}`*F`wd~e0&9OpMx-;#rX>y zzk1)bNl$&{%$~J{1F@) zxB-iMeJxzrMzL$TB@Y(aDnHjK8`wEQdlkN~)kW-oXQS8)aZj@*tZxq^eXE_0V&}y1Ak&kD zdY;J}Q4Q8ZaU2Y2Psck=G~4&hqvDtP?U|i8{=xiuf!8!jm0Py=q-kE&d?P8z!vo{8PzHh?h4dNoa1}lnvl-J;_72e~8 z8=d%Ky!Y$|MHN`qw=KQ1dpBb8$0-9=*?j%CY((m!kue{A9ID*^*5qz0Mt`sEgOAaR zVQ&|gnlVXVSGw4bxjB>eHdtdA^U|jvJ;~6jtTmIxJ++S?ed0wwKC@twZ|nk#<`P?G z|LJtZP2dK@I={24z~rRp(O1t{GWHb~`brg8hSf61ic_qZWLsl!^8QLc+b-E}T>k1> zHJ07IS?oXImDn!vSw#cjs&L@Q ztH@#+MZO7~(1ArJ{R|Y&q9dT 
z|JQp}-cZ&LA`qe@gz~%M|4;8#iSKgFhL{hr2;%?hUe#XE|AaUKkqdDg;uHkDv~G8B z`=sl!xV`E=H;K1f5nuib8UEfPveK_wVn}|29~K<)sUFlqF7H2GPT4>P|IQm$#qvtO zI`kXrlHW!P4nFnm&%`GdA?sd!MP^G^lCJRUOfOM=VTZ}TsF2!wah9Gh6My-GSm9SW zhml_OKx-~R{ey{l{3>LvDxC7y+kvd~>n7_+-!+Z&2M3WQ>r7b- zXPf4li~6;xA;fieQoh>ORQ~;D)Rlg9Rd>?!mY}XOc0-m*MpC}Lr1QWukXCs-wtSg}a(iML7!N*jdyP>W! zen5VGeX3Z}xn*ff93Q2BX$5NEqB7J~O=!J>XN6FG(ZtOE3F!*IF3o_pcga0~7=?cG2vyb{qUtV{8FKvIN736o> zfh^Y>g{<_e7n~%0%NEqNXLQJ{P6qj}k4q(tRAZzF$|1mb=hiTAgp@(RE1uMbqdiV5k?Z-`I7HWlNoa(Rm^m*ycW z{kj8D)W4&0QP=j1MwT z_*KR$NnhfHx~leaDzE>$eCle%a_6R`EBtC#nt#0VB&zS*AuJEilEePc-lg{OF^Qx< z--)_>_-+f_9~I?w){Ch8$HC~=R$M^sscVm{>p?6HI)&wxeq9;$AA1u*{!go^Ja0&R zPv4(aCs28XU)A|CwJ)WxxoDrR@+D;H+i_&=ay>W6MppW@-wVlqZx!ll`7H6P`Q$e| zLwvFej?wnYDdAMUh{i)|QjDzhtAl90mfvlm?d|^^^YK#Jf7Ht_ zQvQiZy zi;&d{zxLc%)TLMRQU7_qJ##R!Zogjtrz`0Szi!GDD!((XDbBA^(VufFQ zh}J`?eF$ye`&E>0WD&MsXEcwv=Ay>9eG0$s!bj3;*KUCNZDLt|T_3mC`V-Zk|AAQP ze^x^J-7?fw4_^~o+#`Pk=~AnQ$ZCaO`{D)p8=fWo@pIy#$EmzEu~fJjS?O2RDMr5( z%&5E}vHZ3Z`9IS6fGn=JBv{`S{`&CMrgarT5U=fW7Y=Xi!aauX#y2{McX&=jJtnTR z3=QEw@KQl2e;aQC|MtJWSKti)?gr5w!Uw`1A_$@rL>NR@h$x6&5d9zqLWtk?{-1rP zU+=BQE;sL}Hh<_knLcD?a0PzK) z6vDs&{va?!6^QB(wIS+5z>6mYYPfh0yxd;A%PxM%tkWoDX-+%hc>|G^erZ5Q)Ma(7 zCzfB<6ItpQhAdrIQ~8TF#0tM;ZBG8Ni5{XnUu#4DoT13_ZBt}9FA`blXT3^-L^&C{mjW6b>>OWnPVaCsC>XXH&LFAuqLkeidf<2Z8HLe z5BiVv@e7by`*X-VelF>wFAyvIvO1IWne?1@`Rz(%dG~wt%P9v*Kl_qc;fLpel0VxP zb;;KrS!zhk{F{;PQitkS_*wb>P&dVeOYn-gh5HA0m62JS+sN{vBw~eMUfRi9l#5AD zCH+(cGH*T&xoCEGWa+ml$V$Jwdz`;0&vJUS!S~O}$kHG7$kKJv;cw0&EB(@|iR70| zNjKD&*K^U>DX7ase%N4s|7pb8FNoi0k@@Dc#ONig)S#>B{1rb<*objgXaosb;4@;ghBf z$MXDnF!6w4l{ow}E9pR7Z%0MJzxY!t1>nCwX;=y0GaC|58;*Yd$cT7JBC*2nkb9H# z-&Yxm@@&;T;=bpR<-7~T%Nrpp{nFx5!C3Ffg=M)=9{x@&@zp0kkiRb^F7quFtn^DJ zQ*k|%)N0ZjE+$@f7#W`DM!aq2H!QF4^NLeQKYZjn>VHoozN|q%oA(=Wh}RcvkHXL9 z>BnpAH`0%H3*@4m-Ijhu|DJ)wJ$E6)ae=sLG5Y0zUc}*FKMTJyAG4{@k;A`Bmo?~@ z_BSHViN<`=3BBI9$|tO^+6U?{Q-8{*EI+3mmERDLI=6Ep?lu9-v)}6zr>B1u`QUi> zUw+m&g8E}b9{Oe59>iztktOSH#KS6Jd8MC+`jZ|PNA0`Oi8!(@wf_nof2AdNKH&B! 
z{Bm4!AQ$cGJBsq(Pb5BL{~r6VAf7mG2C~vG+g>95!Ys_c%jYKX=M#D^B>q_Wt;nbJ zvw84$EaBg!S%cr9zO+5@(G}?D$6bh>oZn#kgRF^d+9NB=^E*CxJe6v6Ex~+hHSvxh zWXU0rIKJs?p|f$G#PaADf~79*#5Rr5|4Y6;`u%C#q37toH=gE)?kcjpDwQ~6#7oSd zltz5?2-YWkN+-Us0`s%pONkRE;`Ydj{H)3@(%%lF{_)yNY||cDn!SVAr4IGioi)UM zgQ)$b%ZW=~J;U~FTut1!Gy0YJ`C0w;cp2mN@MaUp->v)KSiWdD@%vz8rJwEiMEy4; zfc%ADh)qTy%j)mM8Tn5zUvIsCalglcWpn-c*0!0XEAmNGuTlBL-A~bPdy81QkIYqf ziQ`i+pJb)?t9m^|f5s)!@BZ~jureQiltJ64+xrUjANuin?2h@_$RVU}2*Z58_-9Wi z{c%HT|B^Yxg&NvEuc^fKN*-W-g`dZ~BYjjBtp^Xk5bt`7`zH^2LwxwIe*d^i+^sv! zpQ;y#Rf#k{kM#4WR#&V~&iFYWibVaq{CLoN6t=hI#5yWp$&>b%d&h|FZsL6VrF~iY z`6!1{{_K6^kNh;pb=P zlD;j7`g5$7cv-W{SfA}AV$+S+1^?nV9Y?w*`I^v~vwnOs?p#F98B<;!^GOOn54}U{ zQK98|tdH9hA50~Ga5(YD%CugE{7#(se~@(+&`~7a+b?|Zf#4e4Jp>3=3kjJ3fer+h z1PxA*;LI!MJ&sa)OXeV-HK66^ zs=u+nrL!3FpXm>_))yaRjlaI*tf@1_&-x63Uj%J1b=|9v#U-}amn{6-f%RdFP5#aA zcO2s%Uj3MvFA61sL&u{(-U94t-)ZLSs=v7p>a&)MD8FqBaMI(Grk_u)4YvOiYqC)M zyx&FCcNaQG^)t%%T1y}E9J|x3r`N5D{OHt&sGjOCP#Ai6XUb>ua)V1By<+;=?hN3k zqlk6IFRwqL8{gun{SCh0%5Tn^EN)c-_q+dq+E@KO3qj8_^eW{KCH5_4VBa3E$shC?>m%j8B$HKt(n)LmRLu8-{A)P!t5suM`uTzi z;0dpZfAe3h1HH<7(>N;9DzFQ2w)O;K?(Hnd0XQGc^;k9r}F9{F{&+{32U+Gha-& z8Ys+vuQ@B5$=u~4X4$G{C0_RfJb6YMljY|M;8i=n((*J8184a3mX_bx*i=X#JF@hr z$)au@@TLObgYTM<{vm+IL%y#RT%=)YZu%X8;3}(9n#=|d#rbcxIkbHG>}#~V`q!d% z(u4M6`L~~<{<3o?pt}rO9?@nx^t??-=PBBNI|O08e%1s>d4D(i%QJk$`m7p^-xaCk@nBx@(1YC z!sz^$t+8QwTkQJ)e?f4ADWvOlI)M3lIv#j#&j%LS>3T`Lj7Gi2<8c1xF&-S9O#6Gi zymh`gKZMR#cnV98Uq-t9jVjV9oIdHvJWHajDZ-?uv8`k>h zP#M>w9oIt_HOQ|A_5-iVmRgwOlQb6GV=6P5^=%K{bv%X1{8Vf3{Es-^w|sYNoUJ*L(esnn0M#D>Lb5ZU2yCb zT;B}E=Yj0iD`F8=0(y8|8c$uM0~bGv^S`lQaQ>L%0?h~Z>K?dqFrD9tOn1QF{vch1 zp9EXyx2*3@up=d%k0;gG3%xJK_BA__s#p zdSt(@$1){?@3+JFmAngfBv3xjcLTiPCFXne{owMSG5%!_f)^ja^{aa$xKAoHx|MD8cll5HJdlN*OlvX^Uf96= zt8duiVg422f8+DI_**%MC-ek2m{N#%idbCG5v%c z`9Sczqm@kNLzluITcV1|m5&<4x(D8i!@6Dqm#aD5_<=csS$!asdu^Dl3bi<^I0){Nlmm!Ma`@sl4N z0bc3|FnuCZ1buHH-w#X7U6t{@!exBkAwv6tGp`|CpPiDJ?Z)-7{5&7|v%Zqf9=E3Y 
z`nVu!U(7MkpJ&_AAM5);{dp|0{(LU!{MmH)!#WX*rAg4mN2>GQFz+#!f4O;&xyib`2i^R8M&5%i|FYfkc?j|kH}6rGe|7U7 zcJr@!54-u-yvN=AYu@87{|fUSc$3Y0;N@R7N!}wb{|NIQdHGj2@1Zx@yoa7x-eWKS zGV>mL`A0YJ!Iyuzc@Mt%*Stqx{$-=&J^JQf^B#WluXzu@{3FbJ{N-O}-s3O-aPt`e z@~<$T0U-bC<}(81UuHfdK>pFqX9&o@@;zB4&A)80e8zzM!_DVC$iKpT27$@uGYHJT z<}(W9UtvC@K>p?CGYsTk-F$|D{408G0Pjd$(qx_#P0WhjqWwE7a+z3^{6Y2f>gnnG zYt}a8W<0e@_p3!eQJT(vEDOD~*1B=iJ#oK2r!@x2qlNu^I5 zmHFcGpWp&d@7I^CfAt67UQO4d!aoR{_OB#UXHO!)mrA`Mf5~B%-kGi!#F_Nqstdt$ zeR%;tLqJ1i}Jv|{M{AF$v^V*rf{f}ibb@o_3 z(_H@1i+j@blwK?a^kccQa{2H4*=E%15JSvZIC#it%GZa75bOR$?51DOF^=ZHo+)Q` zVz0*FZl^s=7GpYr_g>6lvU>fROa9ySH1dB=^B(a|-uzGVIruAl`Qn9hMm1j`??I@% zj2$n4XT;=@KJ&k}#}d+HOQ_AGT95Sz9IGZ$%T5<=A%-b9g{AF^dG&jFTA4x6Cf<(mAp|9eNR|Mhj>lQkb_Hmu3h-hXm5 z-DzL$Ht$P(?ac>gpD5xs`*PjraqL8@8hlmk^#(?QFaNcybbnSEKdzha4jAOk>6tFQ z{kU!Z(b4|$`D|Cs`=S^7@zUmVayonSs@EIUx$(@8FK$<9%HVC@{~H@-_CI3{e)mIYL84es?BcE9l9(Q|8u>rMsuEaMDeB6Hg zQ_Ncbmg#Hr!QKT^+|J(Ue`Eh8s{_r~;zxA)5jrUxX zQ2)-cj@;d=O_4|b9=zznwBv$L|G#=_{vMB)N6#PW$SdEJ_w3E_;M+1ROHongv+b81 zO%{0YF9~v;$ahcu>ff*ahMM<$cg)V!eAz`g%?6(^`zrs_^F((M@~;+o&6mr+d|fj- zUb@(+V16qFWA`%4f0h5GmSK~TNrqiU4jJZrEg++`jA}A!$fzyD-0&O9@R!jPy|=EztmW4VlI8Ea&0l(ALDP8s`T z9FieqoRHy?aaM*~#uXXYWxSE`UdCq`-(~#&zv*cvm$QY8)-u}5=q#hFjGi+3$QU4F zu#90cM#&f_<4+k=WK5GWOU7Ioi)AdA5iMhlj14lj$cT}#TgE;ahhzvDCuF!}oRx81 z#zPrTWhBaYBjde{&oX|HnUl{{s43RNh#%LMiWlWSYRmOA~vt`Vau}H=;87pM0ma$&O9vKH@ z9FY+#;~yDuGA_utEaRGtTQcs+cp&46j2ALq%Se{-S;lu6DN@PpMn-xWnPp^?kyA!q z83kn&kx@cM85tF1RFP3#MlBh2Wi*uGFQbW!<}zB!Xd|P8j1U>!WQ57+D`Sw12pJ<~ zjFs`Hj43jv$(SW$u8f5;mdc2du}a1|8JlHnm$6I6UKs~v9F=ig#wi(RWL%JOS;jRP zw`AOt@j%8C882kKmhn!;CmG*lm>X1T88#W2WY}fokda450U4e$ip!`iqrQxWGW=yU zk@c=W0Q<+ zGIq+?BjccqqcV=mI3?qZj0-X@%eW@vmW+Ec9>{niQDWJJgqDPydR2{IKmyA_1 z*2&l;W1EbfGWN(gAR}XXxh`aQ$jB`tzl_2%ipeM?qnwONGOEd_A)~g8`Z5~I@R!j< zMspc0WwepeK}Lv-ZZcNOSTAFX^nyqED=#&;RTvi8T(`$k|AWAka1eZzcOyfNRp8(bn7dZSo`kS){ zv9atUF`HDL*e*(;{-PJOKJEEGK`*kIba&U&$k+CRJw5)@AGha3Vzs|w8m_Ul5+U%X zxJS#c)yMMk>z9a)r4HopTn7EP7yOgIqQ7Z1)A?pY4gkc=otJ 
zx;$q?f1WHS*5f{4c?$1_uK0OG5o+H^-eLJG5^Fo)=i}|rul!5;Z~hnQNw;@CjQmWQ z;jbXzFJ|fD8g#`k2ID(QqtH3h+4L;*nUQ_UCHNyhqJIAC#HwFBDunSK`5yUI%M;tX z#}o5?@;zYY_f;gk+1keH(Wt}%>nQ) zz7PNV5yayCOXz>jBv$>b_RtWsJ$rrw_4kejKU)TWi6O+srlrKHUks9O36S|l+nvZS zK8aY%A5E+`n+;ulm&m10^$W*!>YsLFBlJf%!8c=2@9SkOPock|EB^3C?J+<6c9E`^ z??UXh?SsE$Yv>t{60>wIh>g&L#J}Ya&q)2X4?{ldo(Ib_^%&}J%!=iyAi#>BH~p9T z`zv1$yMz9pLwo$~E$G){DL-`IRp_}+5Ucqj-3|2j0ooH2Z)16SoPvMRWv~bMH-F*s z=f+i_AFTOb9Vsgnr`)X z>|kR1^EmSBD}2GJ{v}rZVh)3za0&GrltjH8=r1pq9Xg9AUG;kg+OWLME~CBUnXtTF zZlazo1$4g$#H!zOW&zUO3vMI-RyyQY#C#HI%OXGJW8^D-HYOvE9}zo{zvBw_*JI}r zv&qwOJgNK_v3OwZf2O^HuH=$5*B4v;O{|3cFAlUn zH!HFD9)SGig^1afK83&kB?;&cLtTtV|qT59?E&pAxHnHgp!+ z^YAAo@*F`?~{3*N3kLmy+*gmGhYuSww7HZU$ZPbKfn2 zLfZb<-m>k)_F3{hukzpZf$P9y8xyO3{o!7;_avBf&nkZtYfkwdSeefoo+h>z3nEti zdi9O)&uL3Ki^uk>6_f9=HS>d4A^!+i^?Ob^54|7ib5}=;F<(5VV163CaXiwGTjvLT zus@0(_<@c`+DfpPc8S>i#EZ5^`I}Fw=bqmQ+vm<|#9CYw_+4k{wIYbGKOTwpPwyob zyU>3laXWa)WaJ;e0uEn`{KGehbsfuZACLVR8?y`kkYv(B#VPo6CnA6F1+eE=VqP>3 z{??f=-mN|p+mo(nMae(7!G@4@!TgKt8w)&%3fHbtnA>p`nfi?~j!%*5^? 
zU5VB47Y-aBHKQHr;==_xUKz)`gE#Ldc30{NR{Zhjx1m4hQJ;U_Pi$zt$ zEU_pwlXUyIAm|WoVaqJwf%nxL%mGa_UmX1f0r+q|51L#zwHlh z_iQTUV!3xGz+bpKv2oxbG55oK)9T)buKH{IM1NN;vBn#p_t;Zmf5ruLZT3>qh3Y?F zF3@b>KFo>wyUG$9$&titK?Ugjp) zqrT(#XP|v?Cl%$}%Vj4&>;4IP#$4#1;^(fkgn7(VuBqkpCF#!}yU0`HEj( z0>3?9Hu$q=Li-yE!0(riSW6B5&Hp#fSBxdd=N;_u$L2(SOX&75$Y+XQuV|ewHNod068>;Ut4}R=BGPPS<+QM?~xPZowq3JKg*5r z3#mgac6(qvvid+*{48HqgT&*EE9QEvrnt=7}q#NJBtVlNaueTswttV~| zfImoszGoP*)~G$|RUb&~ZrqvpH~&ejf2}|z(z!3@mzJX^>Yc;(YTs86toZech0%Um zEI+$f8tt{~4gZXS#P;=p#HwGcM17-IQ|LW2kY8)l4IE?5x5KEX`o)N$X#YEO9x)vG zdKl%4rK6yq=nGc;<*+@tvtoRCRj^$QBtP$;h4SqlsK*t*9$5|T)g6WUm%WJHef}h7 z*Q!9jsS~SyJr#6!y9w}*M!tRhWMZDOD)Qr|AYb)w!EZ{96BD6Vu)aT-Wbuf{w13vV zPDg#!?~newTTDfJ(KV<&!(o8mpg+bBu&(&owF>Zu&V#>U8T7Z;7_>LhTE291z=~f~ z`hxxY*lEa*_zVtSNG!U%#r{6iOt9kD;~LTaLo=2j-&LO2owpsao_P}XFTJhtQ2pIU zP=D>^kQ9ipj4tf&ShrScKaV7%w!^e|uEDcBdpRD7_AJ*|8{aoa`vHW80 z9Ad2v#!Ig=6Zt{bc>DedeJ6C0JQ6%C8uiuw+H*nAv;g`-YyPenivGRB{=_}68L{|) z{f#jQ+Z)@R1NqHH!yi}x{0`fLcwCIw7%>3(Kdk+iH9tk+lBBzLWB!U>MZuM^zF1MP zF<~9$Gqt~qaQ<%OT#NQIV!j$zz-;tGoPXR#K3DwgadB$TzHkQmx4JC)yTqDr^RYj& z%NG}Nc^u+BN}zv_t@=tnZ~YDPuiPs5Bhp}g-Pi~o23;$#9jy5E;pI^Om9@NU%7Yi| zCO`Wd+lRXg`tzIr9G2HTWGCvk!u6E<>Mk?c;WU)%<>mG#>7B$DmJqLu_O|K`ag=L4S;TT=DZ~b*Vr0&xet}m7{-K z1hHPc9l~M2DGVqmJ(92uvH^3S{uQ<}bo%;jz zo1P<%--Y#M+_lD|Yyk9XFGyG0)0=&y^{=H)Mtk3_{r49zYy1`SW6wwAtNu7&s_$O@ zgLGE45&G+up$ToTb)joL(-I5C&zoR;jHQfpcA+)u)qI6|r32Cbr}&uY3_OO zZ(Tt;k4OKsn`n=9v;57me_~@6!Cz)G>ZP*!KN>6wtw4JvPLa-e=HdNTbBjx z6>0~rhW7MSV0Z5PsHgfvn~>lA!xQ?uX2hCa3f$S6Z>P$jzUrTX<#o3!0)KVX(+VJ8 z9DuHQm4&YQotWSD^%YQmPFrf<@UH^y0$t1Og?z=YTlbJ_x9XEFKYyhCmG-VN{Kule zjT(RzKO5Q^^(SADA6KR^G{T^YculV^wYrakHfc(PP-x@#KTJ2$e+W(jTUqA2v z3FC8MDEtRs5W9b(J^eU7-?KmOgZ34_p4o-vyFQq7eLs$;_C?)@*#`kV{|M-+e_(6c zAK6nyz~8z-#EX zNMd{OPsH4JD6#SFt;LG}Nq_jazal++-vIcNKNIV>+CWeFftV?N7Jmx;E%bSsqsP<`ztBV9l97kFbvV$pj#I5sEpRsW?6=wG#J$UnXt>@EZU zlSjnvyI#;$f9&7r?`k{gV%H|HqbBkjZw6P)jeNz=wmpHaRY885bhN$=Eidx_xlg)% zRztAj=co6gzfJv+pJg6+7~0ppE}{Q_)FWN>f9iS+_1Afmu2(!rti@CZhhY5OqlzPc 
z{8{+RVLpgyM~K~r(SKI-l2s4$L99AWY+Nab{CzmSxaUFFGva#GefJ^e$JKcF{j-r@ z;`A$C`41pOua+AWMP4@Ud$9}d#hjO|Z5i}BS* zWByjo+?jOM&qXOL|K>>2S^Em;fBwa2FQF`!@5K_Z;umwV{cG_mQ@ci^9Uk^c_MZ;wAt z{F}dj68Sa1gYYj-*<8rg=q_~<9QY9Wh<}K=;^!IOVZ5ULCSAP5`g7ksLCoe`=M(G# z@)bWXe%)&S7V>w!CALqw4^DH7*nRRj@)f`C|Cw~-_(Rg!-OSW~&3Hzvmrp@#oOp+N zmp`C>U@~+iUzB`IzdtoXiU)Ez4tlc}_&w^plEiX<3tm%+SQJViwm&ZmR`WCcMdM=> z@*|yRu=eLAo1xyGEf~MzZHaZoF9tOYGS`RUa6&)UnpiVN5{ugw-=70k{M@T39S__A z+ev3-i{X5r`hL_KSP1(3!^EmzkIe>u#S73=<{~z@#ZA+Lo8Lq|#m|o5^BSXmJoHN$ z$#3sqaeeGhw6H|dRX=Zt-*3A|-y&VUhXmuD@-8?r6?ok%V%5(=@{_I|s?^L}pKMP7 zV!3{a`K%hm?n6G%Rex?f#(#Q2_@mQd{O<=4>jmDS{%k8>@r%;wF-&9rE|A!F&qoPON8~h4~ZF8@l2b)5f8{1BRpik_lLz`Z_V| zJ_dT~NzfHP53Y#*Jv~9Xp1LNn;lC02WkbM~&J(MCKE5CH-VacJ%NSyJsYl>5gNU`b zSIAfV`br<<=ekWgYoS^7-huzAhk9M#5UYMYrfwS{SG(p+fceqvT){r*Vkvpk?He$hM&)p!3{fpm5)H`*K+CiaM^0a zs$XAhoxiQ{f_}jI{`E}`Vm%|S-?X}ysQ)|)+pEE!5Q{=7XnS)%M*F|jYn~7N>x2I2 z14|NXQ>&o8??s4>8a1r`{nyWLl)>`*_>s=X=K)9j0e06V)}m{J6~F!*>(9QZA@pnk z@TY7LOPCEWR2oul>zg__r4YS33x9 z*a_p&_7d@L{%1H}(N<#piA}h_Nt=2R?Pbb|{*6UGSNuF>Ssbq`xY6E1FKoY)E~CAc z<-o81g|7ISy(Eo~JYK+Gv?@50SM&88xL?q4Ko?7}KHOz)!GAF|jfXp30$BAQDvSQwZ;;OH6^ZSq;OF^E zgZJMdT_}F;m7VGvyKW-CyLJ5P_yqnd*`RN_PyCzz9`0YWzkL9GkCpF6f5ao4zsTzm z)K~oCQ3kBDKTs)FFK^u(&4wQNG`*Ksiu+RNJlTpoV*cRTQ6)KmSu z9DWbw%!Pb?VAo(FyX{QQjP^6PCAJmKLah3E(<0PAR~}1`!1<7k*~!lqp#QdNS&^^! zS@t@VZ}YJ9YW}EqI4Ak_6*a+qvZJ2jXN@1w@;L{4z+XNQ%X=UBe9=AV^>ZL!@rzs| zNOuKT`E|!3{}Pz_;C#Y4H#g<0e);=U)c4PZ`t2i#Z9Z1}DF;JO&P)1l{;u9wzO7b! 
z?J=KhpW)ZTs$hJAbD_TC*Td$Z{)l|&U*YA%4moeQT#q96bo3{G0bg!O$;t6sLUEuSeo~%vKEb^$NKDa!m&dEfC}RrZnZNe&(~4J}+@?tq;A} zN@B-5A7XLlZ}5nEXkYP*inzbs6{Nxca6S2L9xaJ^(*?xN<}IKre);(^`sW3mZ570h zR;|!plLOGNwb@t=at{fBLPAJTas&R3lu`Xc|hedlEZ?Y{abWcw&S9q~n#Isur=W2aXr| zw+zsC;dm$>_!8Ss?I%C06-BIhTmu(bMC`uz4#&UXf50cdp#Fvv#J0B?HM2h=QxNsn zo?$lf|Ly_qv=IIm{fV`WI3Ho@&l0;QogrNmo>P_p{Wg7}T0_+SLK`ai0lI}b+lK40O zxNMkDW3Bq1OJKfb8-x5+(4BlV>MMTzrVH)2fUX{tm_{j5Pn8Xw0y z%qQ;A1iW!P<+B+jFrI_2_tRwYU8-v9kv9`7Dg5I8$ z4=4xTIvw>Dzy8IJ`k~Nuzud%*u$ibgG8^(2%pm^FADNrxx1-{0)Xz|Z*tuyI{NsvX zdq|oCR{VOPwg2c~kk0a44mRhXGuM3hWA_ugx?1^)Uw^oswg=m1OK)`;+-WK5om+$L z=T|+&FCHF+f7S}}i)DCbreptK;I^@(yDD3(__@c|_GW(_1NV@hC%gq;I*R-Q$;8gh z`=Kj-F?TA)r|DtRSwkGpT|olObkbeSs;BtHOV+_`-xh)T`lYnsO~)udH1jvKzw{8X z>Sr?|sJ>$b+7p}SfGeCp`vF$_gDh73Y-kJgZ|iaRiwq%l9<(@pW$4S}peugQyf>+T zF6|`g{8R$?2ioKLpAtJRo+VxN>m{zy=c%?0&_(D2VrP~MzBklsulFMuZUH@UeTZ0bKFdVziA+`t=wDqCp19+PJ&f`&F2|Vf8-nFze!E(hYcIpd~@hi zzEC~Y&u7(!e-m^O9suqF=B}E=jy7M>Kh+;r2II5lBkAIHQDSFr_&rl#zB`woKSK4V zorV65|Azhr&Ih}|Twr`0#&^_L{Oqa??O#Rv@_sU6n+xN`#$bNg)+1l_>p2Ecd#=Vm ztmPXI{u6%QZxHwp^xypBo1*;=KPjKxs{^*`QH>rtLKkx2I^<6dLXY;ZCY_q}9 zs|%oiQE4b&_49ssM)@y4Z`cLL-yo~K18s4##4fR#OXnq&RtA>_;_yc0c%3uCFINoK;WR1^%{qa8v zknWs@{)xfGz-^Jwi{~M>xsd;x|G{ZzT)Q#^H6=qP560KLGj{R199m}Er=P~~J zrDD(vWJUXmpJ&JODQyvGU);v`m#)WF`(CrK{SL756+bUG810?K{L?QFBmPy7cj%7x zFIoADpD)Dsb@KCT>YwO>-+S2dTjS}0@7rAe+O6^Xub)RGqJBY)4?FS(?N^0A{=+?R z3J>Hfe%|yEjjywYrB@Xg?_t*Vy5}shBLwGLVg!B<=g5KcPjIt@!!@T`zzLdEwQT)&R6uvqr?thYkyX4GyIFtUo~H7_&uuQ zGS)w+5q~=rc|eJ4+WJj`lwZeHd8vi!S(l z*(NaFtn?DB&++h!oYfboe% z|2^N-1b=}){$y2RTgpOIPxbR^d$GK2q3d=30Y3tJ&fG%m@+?gGsz20N3w?SK^!IoS zvF&X!V!ixI)SFP0_&5KfqO`wtwL|~8k01CW{GQv&5jzhQM}5T~`qV+XEf{`rq9L)v z*AxC-Euc3nf%a8@^TBkyvNeS+u8bvi6exxKN{YPx0&3alYa5gf7Tfx3yQDZ&&lU$v2)h`B(q3<`GU9f(Bv7@dx{OJbc`;Tns|1bTQ zpAB6|<6}ErgZ$BzqA-4iY7y%@aDC#MSkr3%zkb%>1CEzd(4H9io!B+0Hu*)aPsEPn zb&#+4d7X?{zS|fd7GDZ{yfXQD&yvKp^c<}E@3v}X?tdNc>Y@J6j>OIo_}R8L#IC{? 
zD}G*d6vpq65Ar8X!gyV`>b-{U*kHA<`q%nXecK!8;!$I?*SG=I6NwFpoo&HP@#|kS z_*eUqU!U3z{HGr=8`GTFb_1;X#fP8RKX=3U@|kvw*LJk$*&`Krb|b3yoB!)nT3%O~ zhNSB;bBG<89mKp}BzPtAx#AZS`cQpWJM>Tf{*%~QFOc#*w}lWp7B@!wil3c}ruxqJ zP0`-^9pFN(;P+n(PSb{1^@}b`Y5!ykY>WJHGr`we5bMQozl~!c)-R8`f&JS#%y%_k z)O$wvUpddUM05?x^_z9je$#f8|EX&kVuxWZ&!D!%uB*+_KgBO{ zF2M5UhAv)Wd|Uxoe*NYQ=z}^^zUr6HqeB00cZS}@>i+`xS>7+`f9X)PulPNCG{pLJ zK=(Y{jM%ma?Q>UK=zY3UzUtQpmZkaV7}g!_mrP6SOzuR?K4k%0>r?g1`xUUiDA5i5 z`}!LDlXhK@pW+^|?Nkr2;%ANT;q%lgXkSm2KaUQj)*Iua__^rNh0c%HV||I$eTZE?R)21EBX%ya@>Tz%FPINu zSpOn1D;-arm8|7|V;x^w^{4iQ>Tm6${yE-OBwfFC0bJZVzVyFI>{^Tcz3z7he5Wq_ z$LYn^{t)>wI6ktWkHI@}ykpZb9*$-fAALo-tu^+~;@dZHF!o1!v!BGS<=9_~ zIT&xp)IZSvyC2XGV}HsPye4*Xf70u;`fBB4f5#&p5j)0fq_Zm6pE)kKCl;5{f7=vm z|B&`P`EB{E?Q<{2$5nFx`FZql=wmUT`O|&G&Ii`|oAodBV(70>#+x^d#Qx$>tY0yB zHufhGSYOdAoW#y9R(pz{S-*F6bw$43dLQ<$R{!~|>Da#(56AMV{_)Xt{pBoV%@4;u z@RdPWzKFkYeRdT2y5i>^w@|ep9o%REsv7V#w;VDi3 zNo(c5c?x}>#fo3}Cy?$sZ>=wn*T|n>jZdYU(0>lc_^AGfcG&;j!+42$=deHR;|+gu z5z?Kb3SfE4JqBym_E3Ha^7r8Q%%fKmJIY)8m-;&RVFxT<@K0$5{TY@|^|P+tnBR3qqy2Ya*SvAWo@=XNzULlKtor$lldWif z85IfL=PzQ{D4kgMzE12cIfYpDd#-Ir&u4SaoeKX$JU`gE*-0!4)rCG`8nNnUW6D8q z3tb$?^SYgm8K~FVlXS-pu_dbplQIhXv*76{%478C2>`l79+tNx@iq`T71L4Ob6 z^IS)Pd8o(ABY&C2ieLXy8RIYS(BSgAz;g=5-+5p@^5b#6>~errzkEJeCv*IrOQXrZ zIHWkSEjWr;q^(Ho$^%yYtZ#PY_g)QuX&d~FR}n`aER1>)D~VM(4;;tVVfkL;d2-HOR)5Cfd5O;N8%fs{|F6&6Y)_Y4`DL-aoUq#8Uj}@8 z6X~j7gq|VYb#*=IJmn?uzZ-}}_&H)nv&~?|&xT$l-S!nakGezbyuA(nEB~Ti9gkd=7lC zH1p+mHR#sf%cDjzGO!S5UwFVequ8&=o)1i+9#Kcin{^)E513m_W>0co92C z-3BXueMe=|9U1SF&Sy7(ztAIM`8=_@@q82cw__X8NI>LuaU3#_1J%Ld^-P%bY920zSv{=Q(c5U{XOy(|8M7; z32``o8eES2tl=qQS3i7Sq@F*C?-y}?)zpvtA~Xp8lh{A=r{#$qk=Wn!y?7psBeQkB zzOxJRUA@r%XL!DmEoE2K+r1sSysyOEU+S4=qyCy|;7iL;KeaD0yEhbmug>7^ec&Gz z3=Zx=-0L~6f1?ZcA}+lI{m~=45pT$9oj&y6zUBPFp>#;^Xi5;;$(4P&^ z9RqPZ7Z32fKSzvp{yGNrZ41#KJ{Hf*ahwFfD-;UfEFV7xrpjYTe{`jSM--a_5=eK%H5cC5>NEeUr{sHG4oIgab zZc4f<0L-7|Cw6Ye`G#kD9N%3-2B1HGrG)+f?RhRNfciStmwdhky|2JI2cL(0>Ux;i 
z)h+_3G($_pNj-_tA(5%_-KJ$~<^cb*I$#HgO`?F-(WsG0Mho%15Lb+UdBm4x4W zIJ~Xj3w^VGKlDo{TMd&Wnn}Yn%4QSs{Jsauz z;>T9~48**@HNM%P^ZYq5{=dGj*ffj$4(~eTXJtcBe|tsL3uz0zL}BpCuBhJ!?Z>An zMeMZZi+EHP`du69F;C=YOo{w&H*tL5g#LMEH*kEnzK`VF7t-;;@ecEg_gIhPg)24k zGfp9PrNi%`^cB|cv)Vq!`i%C6f1j1Vw=evaQy~Av1Y)NH>yOzc6FZ9G_f%rZTCjoj z$L`HX{zvpz-;eMA9hY)r`R?XLy+jYxZ&iWVHWTxm9sWVbD@ULm{v-JQ-m&;D=2tCz zUgtV|2i!6p^4Fgu*1aE6zAgWCEMKelIR5pqem{Bd3ivR7->Jub1}`{7elZg5*}h_X z;DJw3ug+%FSNy!_dFY=Hlg@_55j(qJ`})*15%uTc`(xJNF8E$5EMKi{;7MT7yBo1% zY6;T$_CC;STKgZif!KMVGRAMiBw|~B94|!gdGJrd{#(>&1N}7ihkV?3xxh~_p00Uu#9}_U zbbB|rz9-s0jPpO%;6D1l9Osv;!#DJQ0M4)U%j?h|{*m(a7q~umj{1i7X7q=CAc>d< z*F(PhU-)DHAa>2$1N~k+J|EaM4SM-(q&qs#1YazG{F#&BFaC}Ewx$z^<^7bTJ8$=a zf9E{X9T#!Dl|PCU9wew)ksJ@1Bi6lQ(iwVK5V5G&ir94r^G_`84n1xd z^uoTxPJifn<~+o<1WPZ3&!1fBtn-PRMzE00xc==$`TFAX#Eucx`ILVv=v$jW--zEs zI1|xdQ78!d#L}dT2OE%o3Hxt8t2wwQ_CLI8DEKRWPtK0q0N1sSkF$Re+ZwLbxa=2S zhtJbo-S!ddUu?8I&X-$=MPhz%2#%lPU~Xbtxg+q;_91pWwDw1DN`n_eXMZ&Vd!xRX zR?_n0_$*FTB6f!FM|&Zj(7#y6>#F6!v45kz`vr+zeRjehRDjqvZY#K6dSd4?>_ zmM$iwCDy82_jCUP-97p^?#H^BMEP1X>wc^;+fl#LHr$U@dINabRP^WdujiHE`4nu; zC*<#F3V(P7v}Zk^WM2-nr=7s_CiF}Gq}vn9lAph+jQ-{-03Hwq4*G=Wt<-2v>>h1B z?C5ZLvk;p&N z1^y!e&L)t8K>fv9i}hz0yPR{HqGPX*}#lqrv<8)B4wLx|^8WlQ`#3Y_GGK zm=);&J#HM>lSBW!mRR38FW8))MwWTtZ*#$2tmp4n>qcxJR#J-H)X#jIn7gd!KWRZ&zmqJ^G8*%tO<(fc ztL=i`U=;WnSf4rwd>zm061jqj-9aw+J9Z$}a=s@Pb1H*Fa?pHbPw>1@&D)0gd&_$M zKwU5BL))YNdOu>lGoGhozYzrg(NNNjEPcVPXTa~%gIILwjP_ccBxaMx6U*m+!T&iD zdUP>tzfP;(Lw{_K3)j%+KkmwziS-pR;6C}_AB_8{weY&&HH>s^;8nEW+k@Cx>O%YH z9@6J^?#JgW{zRkDM zlvwj?2>;hG=&Lc_y7LgRJ-fy7a}Oam3uEPTtS@Ic{698;(=-Ra{tKLJwclb9G0)Hq z^#+82qlXh~aZ`wOZ}i6=(Hr?!uzc=G-N3!9{8;jho|P@0XsK zHOBMV?4L_O&zwy8?grJ7AN&e@$On21?w2-7en5ZjErx!#AM}T#iM5P)exEoo5xUjB zkoVov`fzCEXX~bb(_p;#l)=RI-~njgZ#=QKaR~e^f}uB^Oe~V|`*-`IdEncvNq3K4 z4NmL^u6YFf3eVHhD#Q@W=LL{%oWS$**eB@5ss!lQn<2mJ8}N2KkHKBDBIe_LJikJ| zuLtw>y%+p{{ILAi^Gm9wf2j+smaVhZERPYyUO>EEb z5&j_D&u@SB74vItI{2Tzg}xd0TWjC&yi1lIes{nV()H?Zn$i5K?}q=3AM%R})VsTt 
zbbGbw@TX3O|4a{Xj;qA(4tV}6j|iZAV;sB91mU2dvpD0UYo#ez7iV+XA$ei_7J zAIoL6UkF@&2m0sdf%Z>D61&?yqxN~ve~66}XNXym&CuKL1COo^y$Igtp?kC;)(T!l z{(vV<%<;D`vfhu-Ij2<*?;Bzt;-Sm;cGB`LsZY8+Gv2==(p)CpaNzt@ubvnALDu=I z`v&r-WI(<#68W9>1eo<$ca4@;^IHvG!YG{1UkD8zQ57IrB8p@ zkd9wk-LBvgmvKBx5rX$)PRtA*7eqQ=G>`I)#6`sHe0^ef6})dzRLTPV+hFKLiV@rU zhZBnvc;1vY0`J=tb6Y~+!{EP*=UWvTXVx!jPfWY5UK?f5lo~&6mPoMkl{J_d)2P%aETi4SG#HufyGc z5wUze8~lfF5XABCq+HU zU1Mi$Vt(D5*gbj(coshYGd`5U_Hem1K5q#74t>#HaL+>6zW%}QqwEdSfPdD(_5A7P zmX6O~wF4b3{RH&Sct0ziEq^9o&?Zqw<$CrdIpZACSYhRJy z7Vm3~pOZ}N9&NqfwlQ?~?q}o|Bl4l%5xoDE=c`UE-&aceQ$1A$=mUO|-*aq9V)wj4 zq>EVS+6%mooi)e>J&0TR`2N-Khc4#g{p#*J5u|&r#rLoF9yN%0j&5lGYdvDty(74H z2h=OypIAHQ4SfOL_pRN;`{#|&F!CGM}OUq@%~ZW`w-=@XmAF-JG1pdw? zi1nU~zOUCdmLv8IwB9G{)foPznMk)cg`efa@0aYeY9rq}HS#Njg9~OMmhX!P`(=av zz#wL0@`D4{5Qp}`?+vsbd!QF73_T9-W78u)Qh)8+qfjq3-j61q#}3Y!75>@JiA4<$ zVk2|vCba(yFGXz6?}mP$H2lsOVzwObAJeA3A?D+(_mM?g=W7`%A-@yO=Xj6i;O!Nu zew|i$-;#TaMm%OmJ?O(q6SL3O`%*UWBKbvb^v5`N5L~eq>4wvV_R3iAWAj}DeN{u~kybq~^v@`4>4klv54uf$eKaSw zhvND}4;cl$S5sWS#BL>apTy?}qSR~f+Q)SL$D3B_BILX^3J=Bg;nS+%X;HX7%-9ng znvt%b^$K0VIV*r~tsr){C}{C{Mf z2UOI`+r}eUqM~B2=n58W*cBU$h!`uNh>C6P*b#dV>c!ZxcU`ez@4byZ>P4~Ft82$z zK(7V$o4ilF-}y(*nX~?Up7))}WHOoE%_fZ=kgn7){n_urKic$LueWc{=zOSeZX053 zkD4_AoFjsm^$a4GzqSKg{NnR8od2R<1sn0wXd7|J_+;?&LBw(iH^f`~++!!LUy}Az zqVt97Iuz@hW_iIyPGWuY(--F-tG~!u#NY3S^}+oEi2pDI=lgTJ5T6=DtXllMH$Jb8 zD^(QZV=L}w?w-3e#>XA={?6md5L^8$=XL6zq%~)0d&T|TiQU&H67x6sekW-y?yqe1 zOZ%#}=AzrQ$M}9*I)n9$@}w%pL*crlE9a{aTmAeo?mwbk{RzMBhU;bOy0&B@FDs1u z1Ih2x5nKIItzpQ2H5dHJBZ$TQSHW2(gQr6O!=Ggvtrz7U`N^+d#`A`>sJz5d;wjQ~ z-$G`-|Mg3goTO{P`4E5T74pq23|4Wyzt$GRa~>`9nOw z7C+Cp0{zv$1nRpQ|3A!J))8wNOObB%J5MiwKNh;WVI|m68u{`-*UNyderbz&|C709 zd_!E1Bp>red(YwiOp03>;y?Tw=2Cr1XZWS2xPOcG+Km4?8|^8Cc-7)(S@8QS-BuOb zxA8L_uga<#sIT@1V!g62+GFvnf#&b4%K1Rw%W%EK`pV$-xZXu6UyIo4m!jWOe(`%A z(s^54f1&@)z@J`Y{@`_p|L`BGit8yFo4VW>*JF%q2!A53-_`#PL_Vwk%Uf)(zXJb0 zvp>o-AXdxcdKuX>2)f13zShF_dbWUnig|s}$6)y1*MmMTl-TN*ZkqS|DP#8ku64M+ 
zF2)=WdHN8G-;Yx~zfd3g67-Ln=OXs+*9y>6zY>f4ClK4lzQp+0o)h|G^Y{DLG2U$j z&HHDJHjlSebKp-YOMcdD5c2t&`?tUOd;W5M(5v+!U9O7$SDQz{UmN2^n%4sQu}b83 zo(_N>iu(B}bG%peB3+%5gLLi38=B8lw@-9^n7j?w7fWT5a6MVbSJK6L82T|8?b((K z@z)v?^Ml)QecADr#L|N>#Gk4N{ct|;$tRSLjVlR%;3;Ch^f9hK8@!I#@!cQSr)8gp z`nT3J^=NSHeQ-BipUK7@hkvPgeeI~fp|@;Aekrs0`}QZ$MgPIiaQy~raDZ5@YMu`o zG^Y0H)6DbztlHp-fs~K`SWGO>L_D9mj#ztx{I{N6CswNCdKJDa2>ms(0>wN0LUFyO zn+Gu~7l84%1=oj2t@;wnFVG$-q6>I{+1~ZV>3UKnPXyv$v?rEljwiP7KM()e+0aXr zApZUP71am#OHtzH!=G)8>EA-k6T9JhRbT86c54ZE^fA(HTThZ-8~Kn}^_h(NU)(g~ z=Yo^|fj+w*v9jtVv2%$I{)X$Rd|$>AYc+FVe07WmFD*u_t_mlX*VTspaW&Pi%Wa9J z>9~J{b^+t@kK;WA_v_Qwc13*a3^+b~dlEYuJ;na559Ss>zlYz;%KM>9({cTs;?s-b zdFl(yZ^KQt_-$Kr)B721X*cq#DR@7wghvufO)`-#7w%4M^^5DVNmqvVHRFAV_3QnJ zd9f0R{|@~R{{h_JL);G;@h^*F{8mGM@c2-SR}Y+z)SwgSkJFvtx5TqTnd$tm&qw`I zS$viV++1Je$9y2=@}+p?TNv8^v#MoA-cP#G z4BXCq|8uDdvG$`s;+?IC<#9t1f7}m#c?>bn7YW`n5Nxi$mGsjQzh^P>EuI0MWUe1` ztRQAC`214Kj_Wn~5>9?a-9);&9nZ^E;ueF`4k90$i}-CjQGce9(Cks|hXyhwD1?+ZX7EbHoO}Ji_ zSK#nFFkkYbLrGWMu2Z~PWTMGGkWcT9_z_uJGSM#aJOW~QK_2k(7R2KI6vVt@A!2<( z7wFM}RG;2*Ffm(Eomd%;2gDCk2Jj=v7Ym9(s{4O(3_dp>lUv{tfUiI|pey`2ih?uu16RWQBmGLF@u7QTeZ&$vkgn`( zg!+~wV>~^?dP~aH4DoBR-sI7^o=bn$ANgeHhpWJU)_gx%(GT4JODOG+H3f+E{P3$KixF$Z+>mdFJNysL z_mgGK_j`rS_1n5Eq$?{izQp$-I9~Iiy^@6MyR-{er@__tPl9=!Qe3Vbl`IYKZvtA@!JBj{gLo=Yg%4^c4rY8`e4fU&6 zCWBM(exHB3MJz9Th4v3hfZh}1Sv_!$SQEc%H1>~rax-+-+u-V_iS^MJP@fae<4_KK zK)&9y;J;%YPl>(2t)C)39KSzQ5^jOVjYPgJCy}pg1Nh71c;()v|M*e(2YSNa#BA>d ze4ZmGCy~ytl|lTs-C!{faj_lBN=_7E180c4y5i8jeh^6Nv!IArskFSJ(8_rMsaR-xklFswxA=c$J z;2+Du?}mblmLXQ+rXXLB@rdux3;I2L|EJB4BUUTme*bdp0QmVY`hHM}8w7veyvSD} z68cv>Z&C}!_2IrRcamSb)*AYZ?ZkRzg;-qQiTd}46RYCB6-Ix{%|eO!r}W7GsyY1c z>k(_E%=JSV8?nBp8R=|&eds^xA-;E3=v^BVtL1UMm*$4`2wUg{eK-7Srw`OV*`*5N zN46tf@u`XYRq;GnO(_7~Zwvh9{rS`y=KV;%xx@ePBJ_v(z{l`>6TNmGaMNe#k3LzD z?;v#fVJ>3TbqC@b<{*|9;Q4j(0}1{?n-O0+Gcj8UmaF~>VWJ<|ws_K&ai)H58n}8o z#NXQrR&hUJ)%<>6o1KdIj-yDI%b|U22EJd`-#GLB+R--fh7{C40@sIW*B+w&Dn+21 zzbD{_Qi6@~q|L$o%UKp&f1q5&?=M*0@6e+zqkg44>d(xGs 
z{*}01Pn*&g`lkfs``QEgi>qMIPQ+@;MsVF$V3#vs-51=s0Qz@fQ?L)NN6{y>CuTEo zy^8*Gg(zNoYSy>7G4h>5JddhCtRLx(_)V?B*SjPC!z#qeq5l2Q0`a-hNLM_d^VD>h&mN42 ze?3^KF^O2+fcqyYJEuV(TZ8YD`S z8C=2M@UsD6c>vnO-{5|UTKrhFe@t%Ff59A&5jdadYeu7eGqAmS=|SM%+jAcKpQm1? z?bnK8f2*DTL4Q5yOnxq&^I#kgN(+n+b=4i{yD^@4^dZFWY5{+#GsN=32E^(J%#Yd> zj90!H^Q(5U66sQIurfLTJbwoAKPU%&-yiwTRww2a2Y?;Lh}nWbVx>kAuxB4)&7K?a zWg9?WnI60c^Pv`>j+h6Rh5s6uRW+{%Ylr6#@+ds7PFydC?}vVXlRmEmN4KJQalI)q zE8$HXcW5xN+8obc4)GWVK3ErehE~Lq)C_DsFPE(!N*oe6iF7__8F-*ee&EE^n z91lM$2LC7x9G#U|+|LU2r3`h%4_NZXV2&wS*IzYShy zwl^d*tw)mz;`vy7*G2S4FZ27r*Z4hy`;hafzw%S`&tK;AHXnUOzAf1Q{M2`_?JD`% zu{_AX4C6;?l7l$p)mPHh?Zu&Ae}eW+ElaGI&OqZ+&4cH2#%1^py=`si7qehIx5xTi zuapzx+1`qDxm;Odw!S0$-ZuE3gSDEa!DB{~u1zfkb~iuYDhB`W`-h9)Z<5ZVCz4-T z0%yndZTj0P@aI{9_H6M2KiNbq^ZdkOen!5MXdjDx3s#}~ zzQpHa^0};tKQxWrf9b1Hzqnr--j6k^2fYSaS53e&}2JVdC6SB;9_&b{O@3rN` z$^mozpt%$44=y3U+dDn%1K{L|jZTS9+7sdU@^dtCu zNGgyU`Ih1HA)Z(h{u$=y1ut-~^W)>sA7V{13_m;RnfKu`~*59c! zkZ=AAtltaGMt%R_{W@Q|l33aPhIF;(1n9dTB7WCaVl4#kcb!YT@qNz&^Yez`+wlEP zTYNslblh)GiNp7mEK4isQRe4iT?!$7bVX`Ee_aH6NPTRNiwpD{`2LL-ElDirZc6%> zt{=?yG)MfYUWm7MM*Pge#7fo(V%ye6(5DU{{$qU1#QjP1+88g~?uF0yw_$wnr?@^) zs|aQmztm)WxN-j1*m%mX)|*4DKZ^k$7zyq(f%p&q`Gd4Q@c8QQ??V>FlFkR={;l$S#4}I){#Bld{+6oDBfoy#{CsYGNANscPatK+{Rs4B z_&kQK$M5s?HNP-EN^B>;oRk^uNx}URn98HHl6#3L5uZXpL)!;t_R_+86ONqGO zgEFfL+JDN_FL!|dGQKa@qPr2ZYR@U3QV8Gwu-x$LO|XC2%=OT>_eT6T+%Mwy{^R$| z`!y6me@T}9liW9=zst-;e4Bm5iV9|X9oRpe=aFvp+ZSi1>rJ(&*@#bjM&GCDZ)d{4 z=^egbi^uqoCZ2;nZwBe&`$S^J2jj_^!Ti24){HN64c`auGUum6e7~oS@}_w9;wr^! 
zQ_b(obGcJ{|C)R>7 zKKQ2k(5GYnN<)egD{aicmm=re$H?{ zkR6wxuLwnZ>fJ#61zfM8RJaDV`0e|@(R`#&Itl$=4f_61n|hvD-H;XE2R^tZkuE(q{cXT%dEC!L2~R~ntG{XrzQ5l6 z33`_A;JKfXPkxH)4MM+wEq;5_e2n*+|B}upZYEYdz7ea77GS)`g8%TRHiUn~TjU?o z68S2;M|=fe#J~8C{1(4@Sx5g)!S=H8x6uDT5zpU_gRZ3^pT*A)AX;M$6b^Q_uA z0sWEYLd=UE!1h0oh^>C^KM(WkYK$+{;|};V#*ezaBRC`j@>%@+(<1a&JLpn0u5Z!j zr6a$i;!pHnHuT>g{w9ZLy{%-!_)yLJ2k5oIe9kVc$5YMxO)kLy4f!NXJpZw$jj`TP zu4FQ|mlMlb&_Apne!roQhHmw%-@?L-c)4F@isy?vg3DzAKW;{>*|QQ`{p{;1^hcBI zq>Jlsh_$xn{&;F$?@$Q&t$y+RfpDXK{Xt=hmoi5Zi~I4Sevhu;n4+lP;%6&RKaxgkE+7}B+9^Y}Z~mRQ;OipD$pit~pio@>Iyb}+|4(&ZT# zuhP7!$lvuk>AX=7@ZUPI+B!3}UpD7smd=Y+wIEG2jcu7u5YD$ zTGSiTowN5NpW`|5Z)i%aKl?x|xsL*8!gy6LoA>K4=5F@aRMIt;gIL`$0&MPI_0AX6 zchDUF#jxI0JaGK8v)CSOl#Kq2y+eN46XTOT^h5oj*xx+y74m7`@E1A_f4_pL-?D$C zi|7wsstEtgyX03+mM50#$dq5M<^#6)oe|hxc>ud-vjbW?HF2#NzufoX=8o5=#*UiIo)d_#TDl(aHmU(Rg6Z(vdEI#rXKr^#G2q z0_OQ@`g~$V#r(wcwjzA$Bj8hg>3kqp!TQJX{TS-2 z@tSm7t`O3ddN_YLLw7^(Y|ig#xPP;p1?M+41@~*#b7DRazmI}H$Q;jAzY}Y1b5VcW z<{tp(El#Yuc@XQ3F&|6w-eP~GVZGoOn4NU_MGD&g`X2gY8rCc7l}q5#4@tMj)gqS1 z-Z%Z`{SrT#^U=(S@c+d2*m8vtD`^=q9=sP5Yk6^eS;rH98i3<5dsVdmj`{h=!y4c# z{y5&UVfNIRCN0C&b#DV&rH0TS8C4cu^ZZN4##%zg5il|1Zqr zEmS96H`lMyXBF`Y66NFBn-XiOIDXY{QSe_k&p$Wn!mr+K$wWJ)CKaKN#(3fP;&J{> z!}-Kf^$VS!lY5)%>5g5oo=C*_Wcm|Y-{^neqkL?J$wzU%lByITzcTOy>Wjermo&cv z@Gr;%J^f)~el{&Nkn`eZQb^IH!6pcDFTGrr?}V!06BAF4sQsQq#s2l=Io#lpGR z%-`=1d3G;iy}&`z)i#}p<(Kiq(){DZ%D2tLtlth|ef56utVhs0Z6M~oi_!kqHzg8F zEhSEJL7m;Z?bzhj>ifL@>%@+8lLZ~Z44rvFCC5e z=J4AJ;`)C%5N!2JW!!PTNC-rJM`2<)4Eb!iBxzlziUox_1j0}qw|9n(-r=2w`lvdrLBm0 zYEGPAcvE8M#kc4my)JZne$w^En4i9M{R%zMoFAeaAiswv@_PhfP zbY65U*lQ^K-MeEx?=lP=`@eq2YLn!bahx5^q0}_Tk5l|J&EnzFqw48-F#j|fvG0%BwcPei`eRC!}0l? 
zR?+nL-`vL7zw(SY^8fz)L@7TTZ1M9hxoG@pbC6%ERS3Lm4*8`sF2wT3rNn>u*UrWF zZ!I<<|IjVq^IPE`f#<_&*R~*^#qS*1oBCUeUkiOp2(eOrIq{dSIN!+4=0i_R!2Wr$ zjrfmv@5&gj=g=Ot!u~c!e<*H?5kDQ{UpYUMn4g5M&y6Kks|8WK)_XAY_(53D)ti9$ zefWM&(==ihhWGc%&XwSPxSzHh77sqR5b=4;{_vXu4&04=miCIzkFegDu^)Q6m0+&} z#C%s1%s;Jzp*;i9KI!>GVzFLA{!1T; zwG4BSPcuIcm@^Ta^&9kS=J64P&ujJR(@AFw&F6 z&#%?jpg*OF56LfcFyG~Z`7Jbo;<^7d(sc>rK|Sb3y4-9B>R*tG`o^LEBsm-E8-5J& z%%51#zYG3_c;2p@)9l|mk;oTz9=fG}#q(Co{>c=^#5j{`Zy;8#ek8xt0iSOx^}eA# zi=U^SgWo4B;+vc$md9oxR*&QJ4Lt|g>Ste%x8b5%?P*!~J&qD93Es#zJu})j6}r_g z&GR9@JhK(~xtqB@ty}`}TZ3`@7b*_kQ<+$al;ID_4BaIo*j^fZyCC?~D;lqQ{X)cQ z`X?B_!_DojlAiq9A@m<>c7$~8+E?_~;(rkD`-u2U*ZI&xUJ$D(6Tp#A;MXP*%jfPA zOKbZ=XV1a+YeDzA4FB7sV8=;t@8{rV8^ARrZ0`n)58mZ1>B=1(KWfh`*xnu(Pt5Zh z^dhS%UL9T)+m~TIILH(E1}*`IR|7wq0zN+OfBgJY2k1R9zw<+`ZH)ab?q5!RalI9> zb^-IHnve{C3g#avB9wH!E9O_0YY5_9aePR91OA8A5p&?Tna79!LdvIKxYUY?dd2lO z&`T#0vti|lm0?NHx1L4&58fi?7aEeT-Mde$E;&sshoS$~M<1XwbAJ^y>mT)={L=Ez zr0ealzquXnPvoQwG@jUuMacK?C;9oCZNyqp=;C=B(CcEnsg81VJm}BzB0ek{{&^Km+VHmcpfo0Pfudj;3@j^XLqpG-+Ljg zm*g4y5kGYav8Eg*mc~qm-W<#=eyQwax_(NR&ca`IZU`5zpf>g@ZwiwxZ|g)Xb!dlt@<{6eEzP$PV?^ZOib}s_{ZAFpq7WTh%^dIzJ({RL(`~>#u2!Ho^ z#F{pVSRMBjdXLWF6YVIUJ__|qiFv^7Mj~G{&j0%8(ePI=zYk0?&!<1?p?#lzA0H*q zzH$r5FO8`Qb}R%x@PjQRcFIe1n#+aX2;Xq>L>Ot^|`P3gu;2Px5x{g@?xf$#+6a9A&>wC3% z5;$rP>3nJ@a0bjzlCvW6XFY-VlpD|)bpG1x-<*FVK0`6I&*cs{67AD-CKF4=&G#=4 z%=3B65c2D9zrp`>1>$E&G+(n14%A;VH?f-aBkDht1M_)p{Qg*<_7wS_f5q|M3+n@M ze^2<`-48_1b zYN9{S!H(Y|>-Wl=jm%%^04Y7K4FR|Va{llYrK+hHl ze{HOnv{HS*H!#1*`(ugO{!+;QX$16K`-$Zb)1jwp0KTn#26RSZuUi80rBfbKDZ!0g{1fEhD{rxxAZ_=5MRKM2f z2k#!FuAk2uM4WUZg;-kdN9-P$N^JEPl%#a_3?hN}lS?dW$HXfe zbgP8%>Ye3r1aVTI3e50$i(>p4|IRFmO_UO%h(CS#2gJ^LM6r7EJMfDXVx{wY@Yf6_ znec126tE-_TjS-z*PvI6COtXeGBJM|PONnP8+yZu#8$u3;sM&e{b_llzW=txZI8FX z%F`vJ^D!Sw8ywkcIsC7~h?9p81HTF-mR)CnW2d6Me+7a4UZ6c{5wO{J^4%QZS)Qb8 zEx)7vzCQ3jd5-oU_(A<6Pk9VIBBYej9=W*=E_Mq3RR`gp*%0k9_Z7cZvK;wmUVwf% 
zfwos0)g1L_*hMTmCK7Y+d*t_Su7agw@Gp&re|nV)Mm}ZLUg%|(LjV2o34;{Ci^WB^>(SZq$COU%7A`?JqTs{K;F*dk)-wO{mp=zI_O~iXA_g|TP!c&W~qd!ad4HtUc zn>~(ydlqF@f6#$-tfx4)+Hy3E@yqPVs-(5_A1L0dB=_h$cDCfn7JT;S5wBaZFR4vW zwXf*Oto~dJMzGs&TCx3a76yl>da`Gwulsq3{z+KwHgnD|PxiaNLQ1%~E-2!~tp1A& z@Bi!|-iyb1FRGlOju*SKvvf@2zpYq-2zBrEKrc3`-vxD-XwUw6jkEPvy_nVSlOZ3U zm-GMof9s%u<4#r!W0g*M#(Knhvgp-=$4wRIrtJqpzTSLPoQ3Be>vz6I7}KiwHtE#7 zIJQp_{U!E|Bo${a$-TKvoO}4U#8X4Q+Ss#(Z^q?qC;G3>_wy5*2{!k)^j7p|DGxR^ z?Buv$aqdaVpRM4m69rjWqdg_V{u>_y?p{J!Z}q1QftykC(p zW}nh2Xh#VzX5D_V5rxHArTH{!?WcX|l~RnCitpS$M;`9z+}{7k$|_9#DaOX`Y%HBH zmozrp_$#o$|G~^MY&-WzXf-n@m#*H+(j}Ysf3d%F*=Zc*;O>vt4;uJW94r48^{JKJ zjjJAm^o(~(Gxg$$6~hW-DPr+z6}nz6ob#%|o`!rvSFCDofbSohhbR=Dsa<9KOS zhDjY}o%rPa;}{z-Irg+vmf7l!P5<+~3wQpNe!9=UWf}h^{>HoVq`ls|>c@(?<-*E= zS@%j@-F5NHQC~54oGH-u><3pKchvD$`g`RW=SPA1(o zq)vCD+U^wb`I4^W?RV~2)a@6UhJF?CvASpTx^Mj5OBK$s*Iu5nI@^mBZ}8ba>eM_X zzH?c|LMQwTn3dx1?EGTPhy`Vt)3#H|9h{EEO^co0sCWft&$N3;hW6>0t>{#jvAHTS zY3zhk!-jwJcXzF?x>c{lScb{@15c%6(wKcCA172|+~7brrcHg!H;t_P-@f>7-xTh8 z_}TTW%vSuugem!}FecTUmj6OJWq-xg6_}Rxt$3A6%uyw6ec{M- zygblCPF!j$;ej zY@ICp{nMtL_bJQNk}lW%wOss<@gqq^B*|r}F$ao2e!D-N7rB;PmJ4=HXztI%t2!+< zFLM$9Xd#O6*90&2z$dks_@`R*bROf?He*N~V%;75p+jBb&)zwWSNo1N^@wxK&1Jkg z?%wq?UZs&Avl_3qO9#O#=ervGyMAfu7m4`UCyJL+`@HlQy8ZGZu)X}}V}fl>?^Au! 
zmC~+Uu(N(f=#@WZFxYXYHu!70?4(cZ2flP(97`hp!E|D^qfbHNsMFLQHlVA`V8_*G zl%KnOB6c?JjQXbk($CkrW`&7~^#&nZu8 zkG4?Ss7>S+vI?(>*nnLVwYYk58T8fJ}=jmu)CSKJ}Z%T3TY8#PO zlZjVp%Msds=TBC{U~cbQii=n0n_yyf(ZLeLUCvi$;{T zbEWv3{xlyKFD$M&xLlz;2J;COkgw-DaV!cpj!p5Z(y=N0FT}AcSTc@X@xqN`TfC~q zu`OO%ZE@_2SE-CR_QeY~<_7U9O%ij1c;Uv}AzoFQJA~huTLeqS+#+6V#@r)bC1dUp zFE&pxH;Grtn483lEsL1D#H(b?UE;-I%xwl26my$+u^ZPNEe+s!y(6tR&{4gdy91UboRFIRGs|MAmv{Fv%>efYS_ zu6#tEt&JY`@nd|+tiBDNX8f;iTe*85XJ5<2HDj%M-H$mw8|U~eyg=ERuG=#Du_UoX zdXZbSwL!Im#yt1C?bh~*CKnZ}-{0o|{d4b@O#Gj%ZaLbw>1W)s8ywgBWE18o{;|&3 zSn(%|;wFl_D0xIFAc{DdNH$SQh~h0uc~L5fQcVVX&5@o0;BSjf2N{lFfi4u=?M0#v(&jdPZxYJnEejF!u-dJ7L_b}{?Yt4`J=A{Ls{*mzV=r;!>+-~o(EZRYe%*H+)o1g1NP2Rv)u=!9 z1+iAq)LVZdw)(ZiM&x&t%1(2yk`M@9^@wv7WCoHWAu5t?b zEPhREi})ZHnhW(Ut%&Vz8NijH+qdT>w)!K}T9V%x;ZC~j6^eX$J>bvY271fV#8!V~ zo}SSET}Qf7q%*kNeq!(QU6F6%VZ>YfT6_=E?Ik$rvTnBb1m<@A-%iN4>>25Q_-8i4 z_^#NK=6-otT^tV)QN&765P05L=oY{JtR3QeohDr$Whb_$9w7Gq*a~{&Jm?m`R!k$^ z=GvdmEy>>|BHnE%apbkX5Pxtwv6gQ-^nv5RmUyMbRru|F>D;88xJ+!*RdA>4(0|MZ zTl{izD(TKj^GT1K{Te*4Km0*&k#FW|Vyj<1i~5|IGSgZvGBy$ZKDmg!J?{`Z62Bwf z;@96c=|sm%dbhg9tL$h*Y!C7z)<*|}+f)Tx{K>V)knWt2tscd{7>;}uOA#yn(Czig z5?lRxnaPOXbPoPiQ;BU^)+68780anc5nKIo*b>xt?=9k87ZcmVuOYtsJjAIcT9YE|o6l`BS61v47nP(XM zM;iOn{@WH!Y}?<7Sn(f0Y;Vz?*y`76c7~ttBVC(={&UtEi+pa)p}+qdy2Y==eeGz} z>s}}`GhQRFy&+Dz=t-;{#PJaq47U2^7wsv(yXVdZM!eQJgg7K}H?dy29rV^mh^>Bk z*e%j&y<^nx5!>0w6LPyt6XVr;!c*|hLXC+ZJp{WxRSZr)uC?8G4cR|{IB-V@aomOG z;P}qO?j>$DGJK~t$HD*d4)LVx@nF|%@bA6}uI!2Y=6Wn-7W}7foF+Y~>jsJsym%P? zEH>iwFIE$~4~q*j>YKD_H2k&kY`egoo#5Zlm-JJgdxNX(Y-sq?f9wYT{2|1r{%uC= z{-QnfHm!)`epM%)l%*d0M^)&04fyZ)5(hdwpxa!*kJ3T!FCjiHJ+XLpYoJlTGVmbv zZ_=};#9H1MV)v?piRBmhp?}W`y;o!Gzlj~Ne{)4)|GRF%{$JYy`#<>n^F#J*$u&w=|eon^TTBB-9=LZ5fE&7Z-$HGY9fr{8pby;{4F? 
z0@~MT5OIRPJM=B3iM_{F2WM=_48ImKlGsu6m_NnGmLd-Mo|Sa(^BJMXe}HbD594Zz zk10jG+&mfm)3rD_P$s|q%5(B72Pz;w@jG~>IbSXQ>Sx63?J|=teiKWaT(u?meRJYS zdtYM5)X(*(zQP*xnU9H;9UY*jO(0I#H4c1jKKy(%_(x@M#|XqHl_A!$_ae40K0xQs zj#gQ#I|kW#9Fa5=o=%zHSdCJ{-X7S z{_Zq*R0+iAxj}5N@g90<%;&a_KZ%p?--Ex@R_I4w5IdvW6Kkh_ucwxRr=B6*HY}1@ zS%mzyTnu{RKGZkRk66!i9R9#mtk({%A$GKVM6BFh4}aZh;5M_MhfDznO$964iS^Pk z#Lm3~iM8#~#I}ON;D0ii*dA9O+^IO?58<;7rDqkyKd6KC=;@+}FYzAhXaAWUjQ)4N zpH5uBa{@Sg0h2rHxL%}XZ!L!Wu_JnN2k!j)3Js%;yMt9`9kNo=RZp4nE zHAvSpV!dy_QyDzB3F-ErIpFJ9-#cqnK)#ZAzhe7Po>*(!0{IqahCaC=vF%$mV!3c# zaL4<2KUDN+({^Woy;HZCyBR_Q~wwKI-@vt@=yuCHCa^OchBcJp1 zc(XsR6Wg2@h_zIFXKGgm5-03hLAt}^46*Do1^&T3z;y>guZeux$?o83)se4DG_mt} z3A8_JOX#J1i1jgziS3K#5GNcCfqgV_J_dzD`ETFY$CQ_`4|0@?g03kx37^`3F-=-{T}-2DDW#Be|r0F#17x1#7CtdrL9{$h6;lF&V zjS(l_?-A=!8;BhpTNBFr@M9lNd&Yi~9Y+Yc6jf7CK! z+puK#`84RMWuU)S5&z>M^l}5B4{t=eay1Bg&0zSwn-Mz)FM)o(D0u#1V%d=i@v{QZ zzxKyr)V?_t(Z2(6jgoU|8$aX!m6Qn5?W?*HYr*1sYVrS$RW9I_^}!+4;D2?xm7zO- z4a9n;&<87GHacJo5X8#BrJ({1Lxl2x)N= z?2c>vLlVnleRdAl42RTfNi6Ouhy3mbh?PUQ#xkxPhksoby(vy7a()w5lA3_{=t~u7r=I7H%H4lQ-Py+72JRHE_kezrHU z;_N~kw`(%7ydVl3Uzy&2YNgf@hpY@FmcQ?VUU4S4@eN}4yVJlIKf&*{fLL$x6MVZd z^rSL0Ke+eEPpqAs1RnZ<>erhdCyq*L241s?IHcNn+8(Xf0r5R_DNqx@K{!gxYpEzVg9q8g- z8OHu_&r*|Eue=OAvm3F}`Ui2`l3URKy#xMK3H6`;2|hFpd_A%Q?qI3)fAvA#QzI4(6EeO{7ay9vLo z9C7k}o!C9L3bDSsawlVZLiT5;&s+3Oct00%pdt95bfm`xcLg`DNt~3pn%MgJja(=% z^{;)n`T4w?$p^MnH|o)}Jfu4ZZzQ(*i;3%W^bEo{ui>~K?)6)+JV%?4|JMUDi&R8dRTUG_DI1_i~(Bc745)nd-Bh>h53ZMuoDi#}=NLn7%$cTKlZ} zdR{2YXk7oYkujgoNq*t>P`0jJuQ8n`F}Bt4w+>~yn;$rF=)FIS7<}M(NcB)we&W4` ziw)?`;lY2(AU_U*?_Mc#EOgCAY*`Nj6>lb%; z@D;{>@7qrG7LQNOS)c7*;pZ|wO2m(uKjyV}eU|olYyHk0TCghBcc;4*P@gq3uE$&5 zg0%=)U-w{Qede}K)LS~3-N)!z5Vp>^e#a|-WuMeNUGAzOY)_i4S4pMqG+{x;22EAL$$OLiB%&sF$wi76!2B3gW^%*HCt} zwYzJ_;Q?&I+;#OY`-ZY{H8%OR*%ZLOz1+wugoUyeGhG6Y&kSG-9**#H?GnnGv|bg{ z_Iv<4QBO|ybxJ5ZuVrPcMSai0sz_l|g|1#bG~(X?*5~$I{_KdzZ;pMxSyO-2xhXjJ z>th#-HDC*}&L3YvJo~10*9v|aJ_oR$w@Uaw63^QC*==r`mJ-0q{+fGi^j%TE(H~O+ 
z(5_dZtk0t-<0gj(u%lVZ_-|;{k`;}cG`7wmaXgsE+VA5@9AY+6=)3tPZ^rmnuNBVR z&bmuQ_SI$^x3{d{FjE_Lui86bW6f$Dmsv$I{_+{uL5lTlGh;i&`qszT4`Tg$*tqUZ ztbhAFU3lDBAI~$slNamb-?^40^grp{tPZD#dBZXR_K?$u^N8@n7U-TVLR z{@s>p#&x4#YqJ^yR;M%8$A^o(=f?W@Oy%^Fv3{;g*9F#PM}scJ80+ibZMC*fG`YC= zR(+i@|A;?!tV4S)UU}4~rpBxKa!H8sDn-Op=JUjBe^-OVhqj`~qPU6TE=nFz3W!of z6q_g|MDZ4-yePh+)DxwFD2+sEDoU^@twd=fN_$c4qVy2O_)dO+C__XUA<6_%VnvxM z%1lw_h_XPGU83w0<&Y@HL^&zSSy3*Ca!Hh{qTCeat|$*gc`V9vQC^AiPLvc;{uSlB zD8EE;6+1DLC|O0xDN0^Z3X0+(N^wz2i&9RM%A!;krIsl5M5!-ILs6QF5-dt9QQC;o zUX;$FbQ7hQD1Aj4D9U(ICWO5@oI^3q@Hf%1TjIi?UvnO`>cQWtS-XL^&i1 z7v+>Fx+s5(k|fGCQErKHPZXypPee%;<+Uj9MM)LqizsQLNG@XEi6V>QCW^Z#c|<87 zN)b^?i&9RMilS5##aEQNqA*bcMNvd)CQ67XVWPAZrGqGTQM!u~DM~+4qC^=hN{lFf zi84i$8KTSqqEg@4z36eA2~B#Qet#=vM#G)6mUv&(qF>pS&Z#bmj~(TlWm{7QeI`el_6<{DV=S z0g#UMtx=P5_9p~A1?YyjXQ^Ui=VH0OS<&-66q}S zYhqr5gO5IiUimcgS^ZV|g&O&##NF_Jh$2?&ZX%WvBcXrZ4&CCHa^QJfZ24B?KOYPK z{k6nu^D)Hy+gw7<=2aGm|ctl4tpv-tV!HQ2sr{3}sEYp=rZgZ6RzMCca(m#zhA``D#Xq_YmuK+gQi}USS^9=VSnNG&ivp!(s{|I z(BqC1GtUXcJUw)4dwD9JFQRTAYUV$Q{vJ9T@dY!Y{;(C`R{4?d@d)BS;%7OammLE= zZ7Vo62K+h>d~YVP>Utb}Xf*iaTw>|sV6Y`0(=k5The@PMYe&Oh0DiWv3G!_;_lL#L zZk>UD*+9g1#qYUU*f3&N8^8DFjRt`&exA1Of7-|PZ3DkG^Hp99o(s16rHFJbjpJ>* zUnu2Q7k(!{YtHRDxQOB^yh!^{hJZ1>dhOTDFhtn4Pl zFaJ!e#;pQ3K!3A3`{9qti0!KuPpqChi}qOZalfJ1o|zX&=lAeD87b32_@#cNzdwBf zy44?y=i9I^d!g6xMSnH9NvyimAm*!(Puw3F^(}t}|Dl@1>bRWf?|XQD4jc1|bZIp1 zcc6|-hyIg7;E%mSI$O~L`Sv=I&(a=oKW);b#KX`xbVmH-bBJ#>82UT3-|ANbrz8K^ zjpSDs&O&_0b;Qy|JYPpGvJ$$*&j&`pKP(D#={mdQ-gBujXDz`Nj3b(6??wd+yj#--(8(ztd##*k3#MVZ7Z1Tm7txg#2ylz~5OTzZ4LJeBl_csu%XZ>NN%VuGWBlXF9fTm^b35 zOoaZV7`V|OV%DJ%IBgiXP6@;h2u8eDEn+EuTQeT*_3eQ39dFwN{^5AOmed6Ep>%N? z;!7l>|1O0i|A)=s1voy{BHM_iX!CfDj{$eZdWIhwOw4_jB40{Z=(|S|OM9^Y)Rz&& zd{u4I)!5R|Jz9cycoVba?$lnk7tCFH5p(Y{@E;w7?X~Pbw$&W(-P$4Fw@DZe>zfmE zbs+ZFhN{F?KWl~YBYp==x-@Aw;yZbm{`KHRg}@d++lTtutt!yp|AqPv<{{=W(ZnpN zICQJOKaNK+UqHV(8ve575PyCN>VIVF7Qd9Z9eUvcxSI5d`C7rceP5!9;pBvNp!gnR|#eM@@{OT~w z$I`o^h%Xq6`aH~d??J?sj>)Ls;#V8)Bb_Cof251ppVE<+R3C4P?N=ARBDVV3(bE(! 
z^+|zWJ4LJ>zYl-Cqu`bIh^>CsU<&F#`xX9hhnep#vGg3{LA(J)yv5JcFh0~f$?%sM zgM1IJfzO!Zr@=qORzLr-1MMsS0)BV1y+yG9+15j7PuDAmxA@s#nD6<$Gl>6P|AHsP zYTa3A&)M6=RzFYAN$0C}BLCg1W_vCp-YXIDP1ixU`lsOe)Z#gL&=1BT-w62Cl&Oec zum-xtuezXr_>^s^FLo5_?}>P}&g{Q_tDsx_(z^+$PuotqR14#eB`-sLsbfi(>aT=u z@$=w##2;D>J>od(|9FU)UD`>^E-gVmi=XYo^Xb^O1&A+m81XBo!+&otu@pKR`7D0k z7V`t&i2UMyspRLcrXb(_<;drlhoT0wr@1CHJ@5|GwD*MX!r}k&*pa|mfqrc zS8ul?w)(#;MgQ^&k)(6;_-12TA$~WG7w*=Y*y>leZ=m{lo2H02pBMQ4bO14{y%z1W z1rl5RtlSyWSwYi3_6m4Gck)Z0o!|#y$Y=43=LaGGGe6RKXY3EQpr09U&NtW5A8HZY z--2goK)U*HBgLzo>cMZxC)GFSONo)rs^EMg?Ws)6&trV6mn$LO;#Z5DH1p?$J`Lkt zDxMo0@ej6dxevJVIbuGg7_smC+lb$hp7@V^Yt8w@<%UA%lj3;3p}H`ISbg*q@!?OP zTl``kho1Ee>D&!|Hs%m9OZR|uHvb6nS^Zb%AfBIv{}zrn);O{H!U)i*#c;beESDFP&aW%)Z<}{!-(>fA|s4UkyV1 z_$2bPu~UhqHP^u9Mu8=BJih-6oC)>u3L_BjV#(JK`$PJ7ycv&tYR{g;tifsIPcrMb z_}Mlbe=JL9#1F>tDsI0HzG1G%c7!3H)t||1PmA{OKYxM#2<}SEgU~WH`cfA^o-x;f%UJVX7n+k)fK9(8*?)Nl6pZ@;+zE&AtX z1Ng;r(`cL*s7lOQ&4r#^7i{&H#QB>Qss;TR*1!CoAF=dy7TQyx5;3><`D@Jo(&;*+ ztHV)0`%s?v%Tm;@?kItHi(j1A(7tq~Nte1}zL1WTH5un4sZVa=Km6~^^Htm0@b^VN ze&Qc+1lBh!OD4<@4{6ZoD}?j0R6K>4m%@C&hHOOq zapdC}Ho^bF9B+X*-lQg&-=spS%$`j=llf&HQTqP}Px?^3tSh`0Ku z>>|IK^0AT8k9-5x)9M`={+yT5zxmS>bBmvqHQ%rF`+#_#!Q|)hVCgN!6EF3dbgN&A z)=_`LYtq#LR}r7~2>F&km%801w))xc^+25`h`)^El|R0P_{UgJNqz4S|Kaz-`iQMt z0DsyF)W2~m{F&zwv&amEo_@&QLr1K1g;qN*M z{zWdt{7_$FmXsN~#jh@et~y+yM-M^1yT5{He_p`*Cw4Q1*y`sOFkZRa8`7mv%-6h4 zR?4TwV*SFOyfgFtub;d1NBf+w5I@))56Nl7Y{3Z9rRT{1hySyR_Ky4n|E*Z$OMZg- zMod6_zZ^hA#F7sa|Nr}4 z^oRJp|Nr=<#tziyb{h4KUIRXK67iXr5laDkp+XX82HKgXIFu9uO~m#@w{;MU=6X=&tuHvwO%~(d!YZ+mm9&W%=e?Y9AJxI zI*s!sFKO1-0Plyy|C3+5kHhx$SqirJrAos{7vHy$&Mplg=895Y0^BzTl~@y zv;8jPNLPoO?;j#!;6H=*OYa7NEq-aU+27myBmN`iGx3EsF>5v!{n@-b;w^rD9{t7s z8cjMciTOaS77hOq%x^p!;;nuu+MK^f3^U`gK2sgtz$1^N{_uXtXYos$G5)3erv7#y z{Fm+Uzw1fN?{xxO{Wgpb)dl^dZpZjxU*KnBmQuWwzXReee%24iCyzAyr`8hGUp1q+oM{Is;ux zd#rx7C;E%W*M~m~=3n-r4*0z}|C#)UzX8UtRKF4GZ;bk+ZncPcFN|M)p&s&C{h4t5 zNrO?JI{Ei}R1xtb%=sWs4dl1@r2zB$9<#q$|7zIYAK3rWJ3nGJzd6OLuGPWa&HZ&p 
z0Y4am_Pq}!mRgPg7mNVQy}=EdgKPB!AMJ?vu`R()e{edq-`YNLoFKljFZuaEY#)DI z3Grvm=W<=sanGz-Gc%C$+g7AO%U3K%9%SM9F*66-}E+wH%!u83r{*?bPJt3Er{MQjeF2B+jau%TL zKMy{W|1l(&_Pr;p;LFE#{lD^KA^$x;@BBevs7l{M=vFN7fB0QDQ~Ze+iLcZD%CD{v z_SDTsr2>H^1z+~3?~`oShUE3B{ed3glI8tG_)=N8kSqCY3ylxOK4Ikl0hK3P>Mk%u zCg{qKu#(TtwIcaldkWtsim;1{{0|PH_}f|vtmI2KR6-7u>kvQ1p7`n2iNB2G^4ZD) zEBW%KWb)tVWeAhd9?n=Ju>9e*!1CfwBsZq?u!?^Wj&r5>W>fjdUM&Td(?$rn^aUtB zU+VvI9bNer^%8P6A(8mEsl4RTqba=hB!``|1!kqX{^7ZYY66rS)G42Xzv1 zC7->iAj*rCwj_B%C4uEvI_yx9%Cj2?xsoq`>`D92d$Zkn`e9Ls!0b~M4}lvtCu~>K zlVda<)!`w8hpP#@^!MU&Fs?3eR)wgh982{}gngO(jYi?jRj_HUQvFZlKmXJ*I3ja zdq05uuTAvh{$+igz1+|ke~zK~NFj#;bpXfATxXACqYf1EII<6r>n@`9m8r95D98~Q z4Vo!_yT(DBFST1o;T>2_*n#W`*nrwCxt!fSFXYn2KLmz6dcOqrhX-@HoO7GPJ3mrj zDV^-A;LbppfJi3hVQ?h?$({wnL`h(O> zE%xtWe=Ozq53;9_KkT9WoaiCstj#fjL1+IYHKX!`k?%=f-CV@aGLP*Jd0lvy<`MsE3n7;`&LVyXf8w7QOt>1YCz#xe z_0*=5yd=K=LnfBwuYN zupFveuN08I2RQjqd^-CXd6%=m(u1F(y-Df1_B_~u{AW`BrM+ZtBn`Bn@GtA+`i%)2 zJQM!q#gspAqWyaIl-dKU@s#BH?F65hU!?YJOZCMj6bTGYI{%Y(_Ahyz1t0GhCB7Z` zhnsaMzt3s@RqR9dG}263uSs8TQh49SkpCT*2shW^?quJeeg67f^gI zv_62r$pV*Lix=TZe|95zAl08V&{JTk!6eGx>j47G$L9%*?;I0eP4hANJMC{npKTOg zWmED$aJ0a%%2r^wxtQe7e~Ix#T1Wob6GIB`vTl77q1!)QO6g~Bv^4(MP8I&y7}}4; z_l*c2ru|W=h{7-5Z-oM#KJjZm^8b|bBR%gaFsxci>Dv%P@>0?#GxJsg<9QyEzxNVY ze!N*=sg?oB2kP4IXbZwKX+44VPK3+n*PB1m_-hp-{L3~H$tzQPm$UL|JibWlC)QuL z-!Xb8@w4sNlg%y>|G^qcUzObd;a^^zPWX$izGw9z|4*2XuMt>|*6n|^^Cf=y z{^;v?;t#ze_-xu}!mr;7EZyi%{>`Yo@!hpn{Cgl9Oe4PS3&K5g>uXhY3cu@QA(yYz zC;uPj6aT3N$=9zI7`inO7z#;W3k|IVX0E#WGi*rm4Z8K%aA$$#G-^+hk+Hy%&OS^& zZb$qbX%yexFo8?1>DrGblKlI1qx4Os{S@X+jL{O zr0=D4@rl4vnQndJe@I|>??vI)-B13P(ReDQuOoTwE^DT<*5-gpYf;p zf;6Aji?Tn32RE&S97;?DpZ#n`coltr<>*o|p8`D!Ppbck=2t`O2|i4|O7fd^D891? 
zNIsXwYuv9P`653dmmVAy7%!co_>!IxzXGj)*)lEhQ+knn%L#!?mX9WUg8YN7{w2p0 zA(#EBKQfbjG+%TcK=JKeCgzipV-(?Sa|yS*C%&%?k4Q3!xA(V%OHYXTF8raJkF0to z=EEfkkAz%mS0d)i5P68O?`kohhP)L5%RPJ0eB6-EL$JGZ#C#rZlD#vtOBGleNcO<; z*Fj=^Ab%u2H1FcU(*qAvC_Q!o0?R*oQGEPdB2TYky&?@y7x{w%iQ4~^=>ns@DdE*5 zm(q3pSIOT*-?OCLp7i}H(|xa}lD#b(L+zRA=5MlsZ-r?}sfhonyY;WKt@w7-3VtVZ zgePn|xVG$BQG4b*#r*WTJWnt{PIYwrR{3lt4z34BR9fDA&Tda=aNsh|BigeH z{B44Lo}lCx-n(2j9M_kFeCns&-RTJd=MSYh<9k^r{p8TpUH{F4l8;%i#xv~$dH$tI z`PHoSE&uZ#f|i$uqZ8;O#fkIq$Mdgpdobs2FKf_tFuq-RX?0-y{SmMR>2T!ZZT+i5 zEPp2=9ObtQ|N50x9h7_{{$7H;3xLt+&rV|XL~o7&S1|@8jSA(yV1W=UE`m z!*jL^Y$0~`$YG6oxZyr?omP9kW4JS&@^gDTL+tLA?>2t12GvWGHvX@gK>x$NDwede z22G%kSKofFkj&qmC{cm@So%J`%oVie`Mm=hH3m>Et+^8KOwaP_yV1g?0p#%aD@Nmc z3qxC4>ZclmHrFRF-c${mPJ!NM-716feF)|I5z6-^lMgRx!o@LdRe-=u~OuqJ88TsrFk*c4A)z1kY&<=c-u&C>@txAElH!>obf zvmf=|)C5&Q3S@n>1l5e;14okA+by;Rqu{wkFT5H7<9CfWw*~b9e8;1e8E70w zp7q^f2U$OF9vqP05ac!CU;jL055;;eJ&t!YgRB#E#{Qh(040*gh&P2Q(53`ctFpuq z3@hO~7t<}FkiUnrvI!W9cV#e~#%ujPwsrxBA3OiNYKP(Q@vMm!Hyt)%5eV^uU!V7xNicvALf;v9ohuS!!F!__4~#dl6bj% zcZH|6TMkrur3T4(ieBU|H@L_1*VPM*KD_y`9oyx#PG`>)r8b3}I!j_+|8jxq+d3S! 
zlYAid+`1{=lKOYr%F|kR0bajq2ezdQ5xoj2SX;`(d9 z1>*F0HV<+5*oLQuw{V4Ue!e{cvCE}Nha0&-?0{1~b!U5n!=8P|!oeAG`)~WX(gd+a z|3J~~CSa5qH8m#E8#|nJ= z&(F92=>diLRiZY&#fuz6Onz0bg=3d`{7y>;OUN)@-QxHn4=A2<*KKv(|Mp{LKUa|RkPy<i8h@Oy zV+Pp5l;4m(5i4c8qEIj#{G?6Xeh0SgiMzqV>!FalHp##00SC61*XyNFkglpzLed?W zlHZ-*0}{{*GWs;Dvfk~VILp)EY`(4RPt#TqE{`Aas+|K<^8NTd7KcKifxVlD-a~s< zo`)Q_+sQeT?OA$DeBftY=hu6>!Xf-fJ}@xPrSd1yD{AMaS3hbNDUBfM9Su!0Z9Pif8> z_!7Sp_#1eJ@JUaBq2L2Q7+%tudFlJ94;(YwyTspVrziL`z7yW^=7*5ar9A=Z&e}48&sP!Ympm2;xok&g z0HsPj1(sfr4j$iLdN254ZHZrLWiiL{k1)db);<;TtE9u1%w9g`7!J@HLn?MSnaBB$ z5{ch^j+SHapfiQiw*H?54j4~z-I&8V>&|dK^sC8`FSSc2d3?1>9K(otg#D{l;8@yD zI(g~6v!1}ieMNmqCN+e-WI5?rq@p4bpEP4O$xja}7U?^;SDvub6k?3sJ0M1N;L7ZQ$4rS|ZJ-T;?^>}foxq8l%RpOJqj-FRr?O7i`>@h)$S zz_@m$@P_$|@rB8w3I91x$mLw!c;-Ky_ycw0&zKtmms|@X{>o0Z0Mjp@)%AA+a~eNl z+6cMqUsd#XSVZL|+w~M!YT1$GabN0k|B{hz{ODAtp1=cj{s-yR=a^}9Q5&+ zkpF(6rd$s1Y5bHEUQz!%OX-zoKWQxdAENP5e%z4o`DpTgAkTu!<TpHMdmSg`L5x2vnRK2kOiy6x#d_jr z9v6J6Yzgs)r3j2?s|au1EcmjQPRHARvB3WsFP~?N^hus^qW;-c9nK#~?ZtO3$w!YC zSRSIocY)*z|1ffJ9K(FEm>)DA>*fS(X+uYzKZf@faye{p68s)VTpDZ#?yQX{!!QyUp{Mo3{-p!?O!3fK{O#n5u^N$i@}z;Ks8B z@{;dy%+PsdtxE%KVZ{im<8gn-v5e)XW=y_q`#Zh;`;QKHS6RcrPXpyEhvQi1(1Tv7 z@2wzZwXdAIJ&yT)cKP$rJxkbq;?vP72Qh!T{MP0AgX?h&s#nnS7-0#WBbpprb0vr*(0g;$#V%jyz5K@FI?>w%Y8jWo!k>$vyHk_8l0AHJUY~ zaO~EBb>?e@B77S({X@X)HXWG5-;TpG`rANP9)3&*mSxcO&_ov-@Mv)|a8R=jOllM{ zF=?eWyh>~pXd2Oh$@^=KIqzZ(^|Gs`O&N&gvG;{X)63Sdz;Sz??P!dTuMv;fz?2nV z%ZByuz{FdRU<>8-S=gy_Qec5CENOpMiXDXYtE*3C`C^qiU|aCzZJ~U<54!gqV>h1r zIuqY0EMK?iA;$(C|0&lX$~B5|4WnG+DAz#BHIi6kZS?`wui}pn4_iYLPfNNF6!SIE zU`t^4cj~8j`U2x~|6aIeUywI0%-t7&uj!VnAcNnXhj*N-FI;}|HPHgHOx5Avy_$jK zd{~q8(;PPay~kijzvdwGw<#BzLKeR}ZYW}|%WBhr;gnSG`@ zf1mAEOZ`^-{Wha+xaPy|oXWR5mAV_HZbzy6DQ_#~ zx*?_RNVz6ewij$dj<_b;qFWQm2L9Lw@X9asbx***tU8U)S_>~9;Om~5==iS!Izx2j z>!P~&{|(TYp;Mvb?GqccxU0<*|N5Y7fsRiCg3*Pci$vE3U3+w$&~-!C3tew?8R+mQ ze_tTRb5+VDFW|VtN`bQ~E*BWA7LokST#_sJ((olht_fW$wZ`krVl_7cCVEi7SGZJ!Gbc@2<&{$yk_I1K~7UW;Sm-X+E{E{j8e|V3=8=~W%xcxu;|A+rAUVOi` 
zTbBy|()!*4Ylf_*@KWgeuMOHySjm5$O?O|t7b))cZ(tKN6IfAf)565WzpY_^Z$V(D*xMCB5ODQ~9w2kB^Cy{>z zAI4LAR(Y-!a!AQ1|82GqU!UYzF2`$M1PH8}+(_X6@B?L%A8-?L$)PWWcg3Fk*U(USzBL7wr^FCW za3p!FcI1CUW0J29C;9V=0!!%*0&AQpKXA*6@WG#=JWH(BqNwJ{v>FMGPm=9_D2r3^v8eiDZ>4d-a6>@p% z0D-g4k0<`RWRgej5STs5A$g}gB;Ted<}VumS0cXO^BISGRDRheG`~?9kpDLo1Xh`D zCah>bs9zJ|Yi!g$JiTCQOYxbw2rM&efyMlY%N2Yl{I|z^Zl#_4y!ouSW~>3R5h>us7?z?|y}cX70e(bbS8AfB)OuDyeu)NMujn-~Lqr6R;#W2E;G{a-;j*Ek5dDi(Z9upA-=2+pP#Q&gOV>N-aKmZ)|bUy zuXyCb1vL~d!@b7(&6v28_?j9VI@UOGugI4z;PdgZEN5TMl7KD$Zx*C>qE)MYH5NO-PYxGsd}3xH$B{ksb5|2w<&JQSm%fA>E_-n zcIJeUW-lyRE#&y2!GoncBJ?3I39c64!m@%H;c2VHGcGq zrtART%`6}7%`%%AHt+hVDO2)&hPVeF_=^5FdHdE5YRaZkn04`b2M*#P6go^<+wZ_mrwQ zcJ*cpc>XSXvKzNE(-IM%%ggrZ zq*aJ}uIvq?5Igfd$^cKM$eY~uDltbq5s2KIO+{$}25#A;9c)v;$6cRrVb5SLv3Rk<@_CEtnf#U1LyGH##W zfAzF2D{u1-trlI_b<&m%tCW_v1m!Q!^nCY=*)k>nZ@e4+kg5+Woc}HC;XNC+zof3z z+p-UH*fr4T`3)OZwRg3k*@)FHyF5(K+At;m-7HJV*P;*8a(&M^8+MD=KZd8}d#ihG zSOCAn9OVtFKi`>o+=eOn^=9qM+hc|K8@ZuyjEN1~&TV86OWMTu2~}*^O0JJVoUCAZQ?qG%eEq3Bex$0L-mh)UXZkE@ZiS_gsOIDPbAcwGj`n-~lUDz32EFUkN zU(7gK-x^@ID_*P~9t7$99{-eYVKY~y;dN&m!E9`_c$cRo1?l;LBPE;vdV z^K^>7=&S+{-_H}ChDCyg+XtAs!mC<=W&RJMz!*7vuap1&Z^pIqv5b#}tN9*!O9*m88INRYe(e_i=*3p;Auo6x*(Bq;gX zXD{E6tK$H!#W4Z?!!f;FSNGQbclwQluF@X7d45Jmf|3t!P`7yB2K2eT!5`?K@As$K zz;|cFe@23e+YR)zfz+wlusS6Ylzgoz>I%&rAz@6569dLY0$Vb9R7?wp-~J1^PHeRU z9OCa;q+|R_zRqr?l6~}~>@g^R)$vk9nGH z>m3P3+`gfQE%f~Sp+)qFNGLY=^Ha$)Tgv|e3}3-7Zy)nIRVeG0770mQ*Vf$yc60lQ znURpw_Qp`VR2Mk5uu9&w_5ZY|a=x%5daHsvg@MOUZNc!RhRsLisvxHF82=@QF#o)5 zzf!@ClP%c%{V3;lNB`oB`WSBu7x~zJ6x$2e31-^D2Ch?_gXJ~)ed@hmHsI>~Ufw+k z7pw4R*Xe9l-c{9QS_Q#hThz9^%(ne`$z$IiAA!cca}(mNoo(>Hu|5V}7potK9t( z%6a$<<3EM@T4e(h14~Q4UPS-gcJ8wcJf7V7XYE{+Z$bkLe_QYvVsa8L>Du4F^LrlW zE)I_HfR_)3ujT#6+mYUFxsBm*9c(v8_;q@f|KjtA6?~0oAGUDk{gMgiiZDL@Evt#` zVNDHI=`r$!9T=7;A3J_x`6>8v(=q1;T48)}r%Wot@~hSJ@ZNX_NNtM`nP7cr`8zFd z9H4849K2~ArXQd8xNhSBH6LyW{DJ+YxHk5$R6AI8;#Nz%Qyo;i{PJv}qx-_t6Zf(F z`1ml@9yajy`2*9#b+qOV@EGSNSy!=rbGxlvd#K9$FUpna$>Qm;w1?S?UL3#QAqp7B 
zdiKzhx38{a`En@VLsFNv_kPKy$r|OU`N=Sax7x&*kHzP>0+xBzgsoKQRs= ztXC9dMz(0W7( z;KawjJ6OJ3swFRNXbnBN9Tj56?cdf~gRAKuY5o4f_AB;2a4&?He?H1%*ZpXQdmDGs zZj_en!~VtXA~n{Kx^+izQ>+ih?b%*iK?1kg#Qvn>@AXWxhUwX_Pt?Nk7AhV3Qt_!3 ze4jq#=!OBvPcjGyjI)H+^|FJteEc|^95^J<5>jW`1m$A?E&slZ-SI9z2-|D%<`4eX zK`PvfLA#<~=zp%^EX{cp>|WA3@X-gXPhoeZf@AHUjccWk<)4Y){x}sp;PsE?qw=sY zdWm~QqW}9Nj?F)iV`l|58`zBR+Aj*!{JoNOR&a>FtrCpo7gzmh)feo(xhM!2EftaraQ>@?$ zFSj&IZxbVb#}-)r{Jong=%1HICrfaBRORTeWQ>2NW9KK=RFHad{{0IbQ9ihI+BvkF zym3_yy3#2MigPyXb}+MquHs!GEI)2XIZg%hCa7QvwttzA*M62Th1=oH!SE*FW>{<6 z-`{&>`tg(PimpGOE8H`%m}J7=ww|_kR-m38zCCmy@cZ6S*u=-|fsSHMQx$pqS6Yg5 z)ghq}+a5n;P3+-9?&b;4SBFCX+-)Zki|xRf+jS*}{tl}=>`d#A1r`(m->hWDe(~5Cy<@ULA{)xLhEe?Df#_^&%f5$2)6q3YvWe>{qU=rQX ziM_Yv12FON!5Egjs`BW5eO>X61zay=#o%&>e~aBjzWCgS|5CE?gR$cuZn@ubZn@u# zjL_;W-e6b?J_f;;d~bMpE^#gjv;$7yc`s zdzt26*E4$D_O~Cghlw`y>3mOfr8~YuRqUp)RviXyV};C5X;{Wi)qWYz4yLnLEB)4TpxE*J*2V8tzZ^ABobHMplE5F;AUgv6d zzZ|x;Wp5nN=6$FK?dhIs%PM3X9%~Zq0lD$nK~?XzWz~85mbpX7-&gW%@C@m;Ydu5m zR`G!HI@!y021`A_ru~JWLOjEe&h1$9(9V;u0gBtQ5I)ZA^#o;kU{$%J&fpxcU;M%N zR~;GupRQHv^1pEHuXg;e#BFD4;$OZG#((AY`+wS7b3z{Pj2`It9vc2wFVd|_W?_a5 zF*L6sbW>7;^A$N}=SjCL{X9-s!I!pu7P>lVuNBGjN!Kj>Fd%&O3(1chHsIkY`7-NA z`P)0;OCFc2k_eZa61GTk>Hxy?WWq}R#pNXT-6`bk>ZM`O5H05>}Etqx$7QF0ZRAp8~fV9D~&|l5cu*jbkMrjB`kCGx-sh zGgn>uu2mAYc+y^7{zlvpwt7mwq_AC6+O~nh_U-q5u{zs0S^bjwtLnkwJf7b2{%iQO zh-3Ne6Ow1p596U3G zf2HwPMEbx=*S~M+i~bD{i%EXkSmHd^--yCfJr(1D^u?EOTBg2`w`opzNSbIL@LtFF zGJeX-LsF@9{wLoQ<s^cn`&2Zd<6Vzl1|1&%R2wqJ;+u z4|)8V`=VQu9M+O!)vEP@z;*Ngn?BbV z|DEe<@t3JF2m<`8otK8TDyaAASy5n3(_sQuA%U% z>iiqbBHX08&i{CUwe_12j-E;K$>d)do~m|RwE4mE36}*wD?d(@gSM=fz^XH~{kc4= z^OXSpS7TP0{{7fe`uqGf`P(p6{EhWD74d2OTMMj7d@lZKzmOc?ktaMORQxrHH5UA= zQe%N*Kazjdu?T?+|LR8hX=_OQ$k~*iLGFZ|=99dVDe>dy5iY7p@>zQcFQ@u3@;FKU zi_MAe^N_HWmB5*Y8&P{X)sFmw6}6Y4RNjR<|0Ma66j3gjb5$gta){bjoz=SVXZi8+ z%3K;j@psV^$YsS^l0{ZMhaQG3+a;a??Rw#hQ*^1@1Eg`8!#75vP(z5+`R{Tg$5 zW}p$_`4(mZk7-Icp>{)#W6x9i7UdE9;XL6T_QYTJPGG6~b}BFLrxf0gNmQN{>rni< 
zx;AQGpW-jgrt(vFAo-Ao#J3G2eEX4*8x^h*SgPDrgm2`rSI4&y{Mc$mgfkxrY*a#+ zb(&A%%`GASmIlIqW(6}*UT`3g^3##Z4@Nd4d6GGmXX<-VAB8oosl3%4MS3z1Ig|gv z8meE@wj{5-jN}=e$^V>ek{8b+9JWT(XQ2x9n^^ABv4eyYP#=v@?lfPt2hsMqyz(JY zUxjX~h(A|f$YZOZ{+aU??FSA#r1pbv`f)j%t0mlu+86YBDR8X64lDRx=x0Sm?(xke z=W3(sp$>0GgHcPq=4Cy-c2T_#&!i3;v9PG91{fg62p#|Nep!wGkI)DIr9nPDWyqJx#erFTy}jHTSlm|B z{1pdg^7DZEtw3g1z7=o#5U4ySs|;Ujf#-bp^3a(rd@dPx7l<@%y!=Fp;*x56tk52I*JL&r@!X8hey`}#-|9cgzv5N@PVH7jvc!ZjN#AGwmOG4KJ&Ql zAUYZf8a5cyCC~>F$DA#D+94Y6V)&o@Bykqohu%&ehyKS!HtS{P0~cS0!VZK zCP#ykAA1Fr_-C5^u8-JYJfB&?4=xs+&2M=$`geUbA6xoh!^UXH+`QVyWN-^;&iCj0 zM}xMwU-F(SE#PtIA1&^_i-H2af!M1B%*(&R=A4Xz48C7DvH9=%FGz6unI72;c3d_; zd9f;%?=#%xiuHy4&u9NyjQ$JVcjWrc_N6zl9izdJ*TW$n*ueMZ+@isE>Gc-(Qn7sY z_1C-88PmhIUE8#yzAgWw-y1Pt$mE<~?xL@oA&>vczri!PPOw@C6hy`it+Ce)N-wAa za+)HpJ^0KuTQ~3$XLp0)C7N-cXzB(^ev0Ny+Ju@RV8-|SYO2AR>&$-!L)4&3xhexS z+#3^D`p_T*O6%j9{|0JM^3%if#+xSwgEQZ&+2{^^_#WCljF0Qko47(c-_t7&1_yq> z0-m`RIxUkB2L(0Rozs82Pga&KpUTQMM8OrN~6}wG59pwrk)~?d5l$Marb&DZt z+;_%yk%pMJGh=rcF}S^->;9H-#As{a2}5_d@YQtUlc-=QuR~>dVO9E|o5<@wj_Vz# zbYw>SEPSjJ0M{$N#R~lrR``N($NXxxSp}717mk<$5cb? z!EKEYXO7CNT0Hol@XGl^YT?`M9ZozrudkBR>mbkBKpX2+n^8 zfok>z>$F%C@N>+OHf#t2x!Z!C`3_!T(14F~PVTUUkM|S9fw$9! 
z==iU=38sHb6Sy+5{`iW$!=cz@UBelPE|7R`YSzHk;ZVr;C2P3;_Ny^E_rv#t8qOIt z8}HvH{CC(M-0ndy9Qt#+j~edKm49RJgn^+6zbiux9eMk?9R@THQ-iR9ITZ$g+r@h( zs$n8OnZ8kILU4`rZbPVOPz-@RG*ihOr0kn{GXKtVss?*Ois= zre8reYf*mS^?FAuYgklmVVVo3Ps8_oI@`c({!K%94!5WHWCQau`yM}eJ`5y0A@@dY z2bCgUYJo19muoK1b95xr zUT*)?Xs#Z-%zimx4&GD9=EvA9`CdfZCbM= zKF7+d1F`+^V_&f~%d%~z>D0b9RA_(~?YxO#>czv_8>}?`9k%LSC3>BSoS;>G~CJi>guh{*jDvyPBWI5Z7>4N>3A%JMWJO?QH_x z`CU=?CUtq*mFdMS_(@X5t~}rP(;ish1^Cyf>I+`iYZwC98CPCO9j^Y1QMPyPhgdTuZnaP^_}@?4arBK-hjs+u~FcxFgjZo+jKL@6@2X} zD&JUx^~Be-5b+gmCB7yzp3=8@tB@=C+I0^_e6fkoDSYi#!a2_b*7UeY=?g3-{|dg= z0nb}wdJBg6#qz(Z=CcIOSQ;d-_Oqer-x+OM5?1o#MiKu>l8|TJ9V)P4@GuImKqmPo z9aivFBb@w1{PAPk^1oRoE@Dq1d{cjcRe1$Mp3$ncz?!f4>PcP-=+4ExyD#vZBRWS&+P9^ z;YHHfI-}+;B;QJBs|tsik=%1WwWp6X-mAtB7q~D&H@@CpL-9B9Apd_+f6Lt5Uf`^X zTdDu(pC&w)&IZT!h|#r28o!J@Ckd<>l1BVT%cy+fI*RZ!E3PHEjhfOEdRXAB?6x95 zg{M9ce%eX2w}Oe5RQ{KyQ+w?0NqGElQGSM|{sL>~wIRG;BQTy7CI8QAO2%L`TwaW2;uYh2~XKV z;f0ivf7?r?_@TFSjS*O`mA~rr`VI+*RmaFm1RXN~AMN+q_sA z|D`2@%$5^=(?cm zfi50hUvvY|4MsN%-AHs}(2Yko3Efn5GtezUw+x*!qvmag;mo^J$iz zOv&$u=bb(#Cj!H(BY(PkvcB8A%XW@V1TS$O&69oJF_s-cc?w_iT6i)gzhL$(@N1jM zLe_RNSm5f(-tzlg+GBqCxj}nRcA||LdxIbNI==B>N`9TC2iddOL@0C}?Cbou2iwYRo-setgvynS9(%CRH+5ha%CB)9 z5uQg=@*n%HV4vDx{hc>Db7-3fs~HoL=N*gq)a=u1vOU5$0x#yYi}lWT!-Z|-z)9H*hDbn?PtCRYua&n+0E2MfW__g6Q+AGCBN+LYu2nO zmhYtSDQ1g3Se3Zbd6zsBLCtM2r+KiznGtDjUMNp3`SxzS2UGITVEqP$B*GwWGd$3P z%_+l0eQ+Yg){81_+uwt&#ci)-lvk*Pwx3-+n3DgvSvPpuIg!Qv3jY2!(1W$s8&Ec> zYa+{XUQ)7F?ZGPW^K{sM)I2|)n12QTg?dR|%RjLFHvZC5&%%SfTl7@E-!Bo8#rate zwomVtoY6lK-V7REvhJNbQ}U1bOp!F2L_EKSM$5n4S#NCrZV8D@8U*RNMefX-pYQCI z$WpkD`J+2i@@o(3UgjEs^}}^qU)`B4?$=pHCNkjXYoEHa3ML)$u7oABPuw>9l{-`N z=koFzoCum4)5ba9c4sT`KB~EcusnJC&$_cm*uKZ0T-+sl!JR4j)jCy%?gJ9pwjr&@ zY%g$UN&LLvz(nR;K{~VQlsg;l(pC0GoHHZ4a=;;XrsVfP4}o2=eekmZ$J|++ackta zov{A6E+^NWRq@`D)~SoGd{?_OC7)lPJEITwuM-ESG#u;B1{J+bJJ=V;n>wN4_lLN% zHGS^~E<$h>q;aESj@J`r%?#z*&A?kqfS^DJH!1I?M(|e}PbHwuD z=UWj!i}R4;)QQmD>|*J_iSA74f4SYDz@^rSc>ZqZ=(JdOHlOzo#25JZ8|u#9XOGI8 
zZiC^AGZXGi$**94mziUFz0T?#UKZ}oj?@^K*A(-k-t{fl8{C=wx7vA+dt-Z03wcjmKvwloCWXBQTsucf@a?A@7?zvo8>d8h;CpSO1lclL3* zHf^gdjt?#Fk9q9q&Klvq#xImJK3>*vXG(s*b1{LNS|@_doetsOuB+M4M=$a!5e!*DgTsDZ1Coz8L9u^a{dSW9{XYWYzoQxUK#yo z^>56w5li{J|EZZDj@LU=5?S*1Ma_B)Q8Oj~{N_H=CQM%nzfayn&GP@;FUJnW`r&84 z_NZAX&NpsgesE`?;#^ET34t$z#8tME`uhw?BT*YdVJ&V0mxmx?&t(wt-8fS_baS0LRC6_`Tc~ zJY-6v0yR^nuN^=0hWX1F=RE$UlbT8VhjYd1>)p~<*sGa7?|-R@OuOu?&s~+8Dfwdl zfc>ADw;$cNvLjpCvFe0-D#WulAKq=?$m;O#9m>n+no2Eo@#D)Vy`n_`$nmzb7+ zH)+iF~1#b=Z??(JA!%Pz15SK$FQ6eiE_2)5lqRi&@BE$|MVEP{_Q}~Wl(!*42#RE zH)-na2r3Uu&zEoG%bwqlVE6ca*c)Tm4L;V~ieO593IFCUi)J}`t%GOn#rzmP8*k94<@KuDnqA{#>-ZSvHg=Ax1iuKhJL)vFz*=h4WY6SqgOsBXLR-WbI&H9u3H7s0Y*yeL~A{crjqt;>mE z%JejthQ?N1qS>xf8>I$YBUsX-?)lZtqnY}ndFe@fi^+liwT@;R#_JtT+NUe;Pf^UB z*9(T1&hL7B6vb{f7}4_O$p~h{_uq}9*=BwQ|1{=T*FN<^`{pmh^2rh99nG>TUkKiS zA?Q-OX!bP(<%X?UI-jGjk7k{CyOdh9D83((9?fF;x62sYBaiQ3G;@{lvV6p% zT(?F4?hg*UU)+smn|>MQ%e7iFW&W~qw5jZCG`qpmyDfsn&CZw;Q_{HfjeJP<8|>w~xRjS)=A zFR{k|@qM4os*J|iLYJu^@4o9)S@Uv;ypOt*V3>>inYohJx!$1S`^1Qq{LLHZFpC1LuQr*tii#RRouMn!4xhpL;{1|Eu#eB3 zP#&9&HYk!ADEXsX?hKsvBZ1z%Xk`v{k~Pw!?+H-+?cf^6apq8)hlg^xPVPvbq2{3E z&n-R9{@joNnih{UYm6`leiuk{#IgKt*Wu<6p{*5|hL{aDz5B-mb5Qcl@h%w)l&hw; zX+BprhtFG^LC*RF$kOk1dHYmzxQ~mb3MkK+XO^>dnmH)>fk$4j;XmUc>yHl$uBV&B z>&T&Lt{m(92lBZVVwJEDH3uc%o!^!63DbAaDPuy6IdtLe;RBY3xKq>|cHZ}g(H|2) zJ)FNQVGc_EMs6qa6Xgb&-ZSR#%-fW8!}2oX`QQCd`DO9*yLZh&$ro$kR#@Iovb%;< zXiV)t6x%PiJ*(Cj+)Qm;)RQJ66R1aprT0M!lK*bY*bES1nj@!I(0O2O5@i zZJg7_Sr$x%92p(|{r@$7^6|J_=hqqkzdJe&x<2UoqZ@=S3Ec>E$>>th8TzT^q<8VZ z)7cx#EW9|L^^w|Fs`<19i^64c3d+~@`a320c?+iG4>maoH>)JDD88Oq*Mcqf{Q!Nc zCNPH!En94y)q#{3pc$^U@sALCI8Oe(^;$Ri)N%(_Wl_0b6|nV&tr z@58=u{oaTKc8%-1?)WezKl+z_+UT7L%!c2+k?6yQd)|{Tdik(Ce&5uN z1Qv_QU)0@)Df!`5o#oAsH2AGun)b=jhgIMX-cE|s$9^~5wx$v z*OzW=%9MPe&uN^%wuw7FJX!N4(efk2pGFu|eDKzjiL=~@&G`LU&pb*0^Dq8vTz?g-C2IbvcL@}FFb~CYOQleJInvc7y9N639Rn5t6|#1?#yq_ z^s>Y26WFGJ#@~w`xU*ceA@<*!z`VG<`b>AyPydTA?vFa0z%DEvHadr-Ff@@t|t9B;!I)trDjUL@rv*8;b8(Br0H<9 
z^a|<^P`?xQD1nupPpKS;`U|1oLwQt{dQ&~~)lA73{SDh&GS?qIQL_O4jeD2C)S|xB zZ0w-(vL|BEpK<-HO zYbNge=@ZXhey&=r=#n+l9RDd#jgMz4r=;CAu36Lf<6rz`Tp!#sp82*+-a9PCnq}hl z!V4UKgg$G$H9HxvPMh2p#5jNzCVUoBg)@`#`6X7EYr-=V9X5*rsPlK?Q0(9m*1zj+k(bl9FM`M z%erYBEttf=XDBb9cPRNd1qhNPgL!<;l7;4ypQhDiRr&mFM+ju`d&nB$JwQ$Bq;8%O z0bKvX_YL`v&sqMz)_Y96L#VrVCMFbQ{vO^>E11Xcb&3uJ=^*Z1y|RL?{GOq9p`hez zxSi%xE4o*!W9aYnJm&XI)eHrVxNpe{%F|oS&ulEfduzDv`%DO^#TyQIZw;S^x?}u& z4zmdF?cskhykb5k?zMuBdSCt0WDHLcpH$@EeYn}jya%a*rtj`qRw&nwvhf9tcZ+#b0L^0$XQ8e7+ZP2u+M-BGUKOHY!X zcbc!q#JzYEk&j0>K0D~){SWwU`V;vpKDIXt(`SdC51+J#%N2b2bm+c}3`6$+ko6s4 zQ6yd04HyO^CR9v_iU^9RND`$4)G^^05X_jfV#b6mx;o}DVipr3Yj({Ft>`-Hs;oKZ z95HA7Z{0KQ`~5A?b9X#@&b<}7s=7K*c==I8oS&JN2 z+kYuHRb#Cgn!d7r28F_IsR+EF|zzAPU2j?dk%Ff z(n4qB`YZn~b`pcC(cu1r{F^LG9owdeNS5E}D%M!U#L9g!iwM65P3q=-WE6Kg=X)4l zr?@yNpJ^X63XPiQjiDul-@Lk^r}q<t>#$`kO=FS^HQ@w9m8u(7JT0-&M4(<7RQuO}>BbLhpB{9J|xivy_OJ z&!+twiKLgil0LbY7D4akq&5CT?eR^a8kK665+BDd%YJdcq0r=a)e=ezPx)P^q=q7C z^sG*6#<&Q3`D}7_L*crR_L+1oEq2Ry>H87BK;L(w{kV3>b7#edLL7CQQYN`9zH?TL zc==ts(q)Br9Y6aw4;qL&Luq5p!(}MF36nnFq4f|lZfY5^L$;rL)PF4R`xcv3MqHEM zjk?`Hh(_h!+088@&Ll4kb-hX0bx)Ui8%u~(*?*=`e|(#>T1D5Q!dtel6;DJwj>+GCAsFuOyk3`QQa=}cNE_8KAb;|AN@7gr}Mvs@OQ@QlYgO9 zx+DBOq~W)dlinK%uis!T@AfGqmjk_rEytUzdq4L6xSKv*_QxKCz2@JIi~pW}OMWMB z3*kO8URHYQwLuAk+fsj?O7}_a-lx};bH6r(AAS0wbE8(}Yf8R~#{xg&? 
zJ?kER?MXVXM>y~czf0dDzx#8QaP79;Ru+4c?kC?je^1A6HGP=v;+yn_qS=u@Jm`3@ z4czfIubmcD^l5xVe`*gG2Ry8l{xdy6KBJgS?WIGs|E*4TTD)8zU?Kg=rUZHoLZ zTz%>vN77b=A9d93$?_{W;h04DxBUkT>I6%+(cGG^Yk35BPUWiQXEozxn1Vk^+{0xmeLhE5UOLPZ1~Q$~_6o@91r$ zc=8#K#Yf}sE_!>3B>Ao23f}+a5vaGsxCsp1@@)SPuU6_^CHN)pY@BVJ-DfD&m zTX}h_7Q5lH-sn&K(Z9Tq|M=9aphiDps$(@lzeovI#G5vo|gMPHQMO1I}SEPlV{1t1u0Ef=O zH(++hRF(f(L1~Z{VGlaPH=j^%FryH)7Ns`fId7toB8wr z;Nx!`G;-7GPl12l-3nvxPe`A3H)niI0nW~{XZ%ky(z(IEo$-as`|B8gpTytnA>NJSE%(DWwbB;@$DFv#<$1mxI6C-p zh(h^Yb{X+}_8ezCbQ$>4Pwq%8YSjX7K6#6Ah4R3@BVMy!uAe8JBG=E;k0#gnODxyd zODy-HmbfMDLrp*W_95=jyqEsJVz%%%ro>@q;JCWyIK6R2fTJV9>-6E|%Ilyujp|;% zs}e4@EIw%0Yf89i)9PGCCe7ts*1U4CoNU13gnWJ~|Nc3;IibYg;i3TDlgpy-5>}LJ zqj!gkx8Hq2<2pB>b$p0b??~UnGqw>fWCajzCN5WWSj ziv8~!h-^dcu+%0E#PSVxlh%!DD0&Usci7amftdL&*8Y58Bk}g!l#Fw4!^MrD``_-B zXe8RlK65y6`G0whzg;8vsghY^bhql?PHMbsA+uPsIXAgMr~5O--Ym= z&JEqUUdzZzm4ENs(N`EBgt$JY-LJbA7fMCv8ZUevhTP%%+=TrK#?0-K8IrYb&GwJE z33my4DWJgKyFRxICpVUFN(LXXw?o(kc8%AdIj2o znsTz#e{~=G<N>J2ih0k{Mj^)tQr2zvEiZ&C~r( zNG45R=LL|ys8h~b$tOcBuYcsf)`9TbMCT8UZ-*p(OYN9am9YN!VTsl_*$%QE)12&f zg!WnPN0?n%G>fdBetvg^W|I5$6{y1J6M3B~|8kHS$n-wE$KII;Z7=WT<{|Rk_8mVm zlfKm{pIepi5PJP4`EJO`2rWs@>l0}lmgnop z=|}tbewJlOC%ybez16dcc z?qt2m`jZVN8%8#YY%JLXvPoo9$!3tvA)8N@NVbe@71>&{4P<|kZ6(`DwwLT6*_JQmxSvHxy9qn^TmWQkWSz$6~G8eM4WNu{c zWL{*|$!d}Llhq*$AqyjGMAnQfnye#P3|SAdeq@8lhLVjW8%s8UY!ca2vKeG^$mWwJ zk}V@!MYfh~1KFQsJIHpE?I%kkJ3{t1*-5f9War2(kzFIZMfQm78QCkcw`3p5zLI5= z+1pcD$?}jDAS+DfOy)vXo~$yN7g=?(T4erYb;v@14CX=8-KTTS~T)Yzgun|n|l8MuhVNs)-#`Z^8@qR`;EZsu#S_yKgrnX z?Kki`y;{wK)3Ypl4PKOEY>Kyw5R?yL_Y?kwfc1K%)=uUv)3Gi=t$G;vAFM;ra_wiF zc`FCU`%NFY8T^?%NY83v>{KK-V=dc>^OKni>vB|`UhU({yv2lduF94W#!fECueQt| z{w3fqGD`wK%E|Hd{_2qxh!<_s_eus%1y-Ea!2c)KRqFMzd$)k!4PG?c4F3;4oSrgn zK4Vig;NSGVU%{WmIzMIkPhdY2@_XtBaGAD@wF2LOr$sRq;SU&RE^Gs=kC*layi+^o zT}x#%&RQCcct`U#(iB>k*RG|_--WU1S96MkxTaNaJ-eooGGiMA$e$Ow0|FXf1^?Ehm zEbykj(63$0*z~<0@XqPrKf+(HS4YM$@8lSR_*Dil&TQ9}v6`E!!+(A<{I_?6f3x-Ab$T_m3G{{(@T>ble>WBW-$TI%YysBme*|KD 
zoUj-E^FkP#>h6dBUJZ<=^V7iV^vbvyNUzYp(BEIo*pzXBu~L3MW6Pgcf%SUd9q=E2 z4gO`)ps#j^vG&g<#4CLpyiTtScSHUip;yl4L4JR}hF&cQ{@Fdozv+AIMS34?`l|mT zy-J_pzipQ-ePBhWSDst|U!-^oLF0(x^9b=`N--9}=fMvy3tq1uSAp|qy4M(bs}p03 z(Fk1FhNp$VU#C~qpWzxM>r|^d^le@-Hs$TiSTo`HTB^k`*6Y>l$Duzr7<|+z#+fZu z_^0ki{E`Xab$Z&zq%o&IVjA?PB7uXku2fkU!`Sk2D)YbT52hmh!BgNLlY#X6O^5!( zR`5kkzCMBv?XfOi}Ke!Y;fUaxlQh5Qajddlq{j7<*`Suc7D5w76R9^=GnfSZsuc~44de4L7H{d@ zZ~Fgc=X~IGdeO8k=g+#nF!Rc18*Y~$`omM0H$TL>Y`tC-w!nWF{Iv%--p0DXs(mu^ z#^yzlAH9C5mGful;l#XV+zVW*IP?`ZBLDZ{uj%w+>q_RWhmc?GD7{-O%j<`DqShqv zRVtu7*ZP1@bwhlezZih@3{R21l11-+Q#|W17v%5g493RrityLzwbgl0|5q!4_bSZT zTHl?qQlbd>iXOln3j&7&|K^{kIM<&cu@vHM9fafI9>iE1-2gb3A9$T!9TkH5|0c^M_{F}Z>dKlN=m^Q56 z7Gh;=bZCiqpC18tjs(`}#q71jqow^v3F0_i&{v}e7Zld z9eC@l0l+%FGJFU78ww9+UM-r2{NJ#pzaks)x5HnrSA2TG-x1%r&^E=B3j3@Q%)8o;y<%KxWVI<^3QI27@lc4lmh7zY0nUEm)u zhOzcSKwob>^g4eLoPqk^o4~xVyUf^J?+^I9?`CZ54XoEIpVI4df2lMB@lPKG9y}ZV zjkYs3_niY?rx$Hr32skkc7U(*9r(~L#%k?1z?b$g*6T%~X~_Q|8}DXo{{!&d4z2G2 z=B=*y4uf8=dGF!)2A2~^|II<**{6Wd{AuG)GyYBAIEi`l!*k%@Z)9vuxCne7ytU93 z#M9~3wii&};yL(DcTwN_-Y^!yr-0wRVyxF|LpGs&Yn7Hlp}$!N85;)XV60tS4y-u< z>-96&+s+5M;QtZlU&9ml$2P?I+HfK_c%5FGeyOfJentmJ#Q$zzPhxYU{EWrb?aW)- z0qgamkR#{M9E12`pa;rxCJ+4G_T&6C!3kKWR~E#>|DiMUO8I4sjoV5>KP3V2)|F$d zMh=AjKq=^T{=%&l{GXSH{zWwKJA5}r4G92#Pyz9EdgW6W#LsCiI{^8!CDre|r7o$;_Ky*!WW`fKxqfdJE$3@Iri@ zep|gl$nUMH%!@&Dkl#|a^sBZ69)a(Q==DnX#{Tm7TXXkCdgl@uo7W6PdL^3xi^0G; zy|P(w{;ao$A^nxjfzOO&tPSx+{!5R7zfP}JAIZF-!f58z9j)O%V+8zD$AAy+4*chK zn;!Yo`zyO~{0)2YT_QF08R8YfcZpQX3Dn2d2;^tDTT?Fop>~Kr2L03E(i41>3CtTt z4qz;<)?jS*$9GoLbE6m=%b>iXa~a^lw)APwRT!n2S z>k;2-Sd8>%PlErmmB2c^w&pPE=julAO=&GBrEd+`!dP2NYXK#mZ^L^1psjV7{{&v7 z9As=5xfk)aHU$5353o+J=Ia?K{ms$)kzPzw#@2vT#>(>CK^(u`YR2k!w_u44E-QeW zx+A@hh_8<)24cQ$jI!|=MR7dSkw5jFJ!5lf8pr!hU!(={_d)|7z5sao-;9;VBY;zZ zRh?dGMf;mme$7Lytk-gn23~Ul*tsKPLxt0br_(DX&U5~(GY&Ja9?oWL?1Om9tbc&> zUjSci8)HMqQz+y%-CEk zCu8m64e*5wjP-iD-UAmKNmY@aVk^&=YQS3tBi;}n`0Mn-?Hb}AvFYEGZYJBe)dyHy{g-*e@fz%}*Na)} 
zz$e=BxBmswbFGZ@$Bjk&m$l%p(~GKK;J@1!`W26W_tu2o4)tk{wqd>g%rcyxVrxUc z_f>7q|BRBr#}E5SYz!&QSc`UJZ0%MEd|?Y?^L?~0HOIIB)?c&bM;Bi!aEkM1{2G9G zWjCXI)BPB$+mAA~u7$r|e`m%+j{mDZl`pk1eoTbFHu@Oz)+Rw5Pp=n?W^#UwBL(wf z!%Z%)aT|`8dUz50n;X>yuh$Q}gZ%D?zk2TtW5b5}(En@eKLta9b$VsfH29xFeAN$n zLt+T)wXR1{{#WqV>y_c{IlqRh4Op+njRa0?g!GzMWo%w*WUSYVt0lR8n%_5uez+%a z&Zda>-HoxqJ_5W>uT0Lzym2bh7ryqutD#qVJVE)o+OSTqUcZd;m1xEBwRx`@8^5*z zK7jGSdZsP%qth!Tzd~QBHS(8K1nG|g)}o(*e-OpIUa#dx`!aMudTNe;SZ~gS^pxDG zz>V4=KRUfqr6by3r}oI-k=opUjPFohZM8kx<7r!dj9Gr%A6DUbsByc&zeIhD;Y}F- zSN?!{j16CqzCJx=9Qu#7do=QQz5tF_Qg_B8{2=Gwx~>Cby-FB%q0c>pd98&D;(Zwi{L&wM zW$>y_uMEfWHw+lTdZl|6_`3}S9@m|*F@GGeUhkU2R~`@g7CF*$4rXj^JQlcE8OFvP zz=}>Un!RM+aCa%|#h!BTzcmwh!w>Yo>9c|L`nO@6KlA(5;DhTRzu`%YHE|rgSOs3E zS5J*+-Y{q_^GcnWjIEo2)u{f!H`g=&n|@Cs_*?6cp7T1y3xZx-JsmiB1LEuSV(X=v zvi_{Y*1*4IKrO~T@E5x^=FLAgGOyQ*EmPsYU^DnP%@`Xh04q-27#nvW{onLo4VD3@@!f9ZXJs)Q|F4^X3va{m ze!CO#1LA?d9A>OM{($mNP6w}xue?4}gY%bo9`Q>1qx}8O0AHo=Qc?b`-L3HN`V8d{ z*MQHXzKsqS8SCRIo?#~GZ@#az7WAvRM=&-#vuCV$)Mad)>;SCOi`fqFKW*dNI|9GY z$$IfdLA)Zlf%W?G?;>RShUNxa{BOX6@-kMhWdXkhR&;vxTomf_i4)>i-H7_%R*bQ> z)C1Vp8S!*_H7>rTOy3$<5`6xdj18qq!@u5W#^#Tu!0Yu<-EHYZuh>mMdcDfBUd##y z-cb(zI=$8v=VN1Co4)gS?*GOpoBnzT@Hb$+p7uTF_~s8Kkl%6@F@CH?epK^b@b_Jr z*XzZVekkAj^2l$S1jfev6`@b-1in}$@H#!6k5Ik`6~Ir(Vr+;%snF)&cv>i zpTgL@zYk+gr>DIm`FI;w_k+KEJ;v6_{Tbhxv9Jc`Z{-NY)9bBSoImT$@!(T~P#zcP zl@))3uM8|UjK}eGfnMnR#o6g7@9HrePbt5XvH1w}TCo*~?;Fp&Uat;n!Trm+DGvS% z2QoI69?Mu8QX2iysxsE=wR+P~{|*!3e|$A?(FEYz9f2bzA)ZdJoo|TzPn!z8TQ}sl zq8WH{Gn7ZnfWKZ}A}{zClbIJ4=)1YpK8&__TAvFzK8@k8==93)XPiH4yJ_s7lrt~J zn`O`^<-87l{&eI=r`OI;VcyUNylS?hydg7LuXb(@e#0!rdcCr6Jn}zu4&t|3jQpRo z;W`5tTmPC1Ua#Nm0sk1Aet3Jv2Df=gZ|6NOzxg4sqSw2v`Tydpv6~TZ|Kk5IJ!|EK zz&gFiYz)2g0_H`Y-oW*i0DmpY*btlutkY{PKO=u-p;rtg;qSkU^{VAA_@Tghy=HVp z{0b`&zfgU~M)Pv`pa01I#$GEK>-9?idCXh0*MYBm2>O%Bz;jK&rT=97n?3;j-<*FN z^v;-n8c*&9jw%BGwtoTZ^y zsr{QzodzB}8+hVL;9ujBO;0a~BRyv;_(~HP8_oYPj+tOW`e)8C{!MRw!o1;xLmNTq zs!0yq-p$o>F&4*d(+JP^?K#;LbQi&ec}JA9?Exo2xF1eoOyHCeu$^j 
ztEG>l{&tLFUaf}lz&zKCcoQQyKi2lZdcAhwI>$G3iiiLEzNr7ovlxrBCy~Fg6A(|Y z&v6jX8@`MOe`GV?KQ{K7jd*hxgHMFNs?*c^y389p+tOQE3I0dsBHpQocwX@r_}}!k zO7Z(~#>geC7s-PdnU3`%6IEj2T?-n8Yf zQ)lpQQ=reB#@IY>67&yl^ZTCGpBupI@Q=uOdjQ3w&{>fN%s06%e6a000 z;rkWkYqTByOATB<<~6{|tQ#orb6~w*Sgo8tW6>S(_o)qi!`;AZ_cCuDx)bT^^y1$D z@QrpszXb0So97*1tW@zu`4hnF^@`yH-d{P94*g@iKV--}0sM#UeVf|HfOUFd^??5@ zEA!frHhBMK`boxWRXfJUBZ#NhE0^Cv{}Sn|rw(y_8|T1ZDc>LUA7{fly;8F%^bV&v zJ>k}bv9;hCBG_Q^79)uh-N4OMbu8sGdc7lWH?I zhhzeu{mTBvQZ}s9Ydg=O{;c5Ddo58PeUG!hRvqu(8M6Kb*6EcdtB}8b=fE$!jQkhA zfb{$0eNV$18`kN?>|r=Qi+}l_1-4vdf042r`RRQXSf^J9yYu@Rync)NsWv?s_ajpg zPZT+e_gDVhhy3-a$h@_m?RYlUaDP0%hWJ|_;r`_j^cU4V+h3!&^!yV1KojPp&yo>u zO(C0p8e?_v$`IB!ngxE@V}5_a@Z}HqFPe$`P94Hnxm*kRX@&7n#INS(oyO8#z`M2q z?h+0BxHjk47&(x!Hdg^&r7~9A=J1p08}^Ka-m?yHW+&*sC$QeS+zt9`xE?UvePsI9bx8l$`a0|KUeGtPW8S#h zo3YXn_Yo3gltyOUT*81Z5!rG)T^TK{M&S%GRLEq>w&c7W$@q9ts z(Hi`wEX*HDHo^I9IrQrHp(xKh%pbHht8u<<^NjWC;9HE%r%o^ySLVUr=OW_O#Qf5F z5%U%Gnr%L_BhokE;9fBY$VQF)t!}Vm|OX66xJ23;p(BVD|{dhD6*?P+ayQJrM%` z#ru)JSpmSd`!&YrwSn(01s_%g@hTtU@|lZOV;nOf6Z~=8{g2}l!EY)J{myyd-{O9m zlDvhnF`^*iH$B7UHLflI{m^gBTgU!vC8!;XTIZQJSO3abc^nD-^Q(;MoA|(yC%`ur zezN_WCq4sq?1OlfZ!%V&B`|OJa+k5VJPF6c0sg8w82tE?;AgKzeSOSetTxDi-WT_G z70+dityQ)%uOye{^vuzz(6^v{Ph@`1uLR$5E9cixYYuRYVt$f0G)-nKE@!a6;R>Dy zsHtO^H-DTD{lfgvd(C6)S~MPf#$?87PFwxmp1@d(K8O7L*%x?SI`Dx>(2q2td@XIy zHyW!b-`9@JD}}oHasC`l@DDLyy#6qNvDi=z{I}kW)rh>z8!8Tme)>p^cTH^1hpuns z@z(rd9Q+q;V%~TyhOx5Fgz5Ezp{Jx4&ZNa?St1aGtncM~OYISGc zJQTc`vLElSG`Bs!>^K4XDs|xheJI|4>D~?Mf4mFyx$(S?)^0PV`vc6IkLCuy0?&_)@1WP_ z<9Hh?d}%4D-KZ5^{pIno7J9;1L_Wmv?faUsw#L>Uz3+psI@*u*7c&_vfoGs!dI$Uz z^mpqfyq}^~{|NrRO|NWYY@K0yzh-nK$1@DdfPW|SZ*!NUjJ2CDS#ONm34ZEm#)dg- zz|W0=fB16nw(-L_csKB%Rmkr`@TxhQ%WEiQ!?A}@zhM^<@90{7f64q2^(Shz#_=EU zl6iI7DCVslP#>D9FX}rQ@9T)gWB7O)MjT?jlHLdUVR(NjDHnYom+IT7+TOSN>WlX? 
z1D7Db<6EJ9+)hM#@1`R^Hxd|Y)1uhlyktE1)F8BnDNErWUKI7w+zfrdD=x3GZLE!7 zfbu`>4W9OClld`h?ZsGKT^Q|eac5wMPAJb_yuV2A6XAW+{n)op-G76zInIUkO8=Vt z{;73xLHKLA85=9%{ZcJ@9?!3=#vIIRuhJMB;`2jabsG3q@Yh}_c)vFT@1Lqg>*4+0 z&6SZ}p+dkjkslFU3-hxK8$R(9@k-=^o<6xH%WHo4(In_s8L$NYh8MuM{{esWGGp!h z7VxLff}e%=qYXXN84F)Gyl;JN9k880=Bq=u!r$4Kv32lFM8Oa&hAiTBA1;(c=Q)C=#Em&Na{}<_@xV()Fjk*WgZ|qj z#>#H2AFu{uUr6=(W9Z*c1iwC!vGpeMt1Qfq`FzG=#4}%E-nhyFeR3|$|9c_7%Hz7g zDJt{gNj}Dgo_Jqg`CJz9jC~NVZ9ZJ@G{^b{vHK4Er?r6p)A``{)dkL~hU=B4&Wvfj z7GoL@8EXe-Kp#~E{H~4AUo8gyz6SnUQ5&{0wq|3!kFwba{(COw>3KF|OYSc%Wc$%t zgfceOyULiZSHSnizMtxeF5s&kW?sFrka6ZA8@@gP*yAt6n>ZW(rtOSr{R3mCt$*A2 zmEe!=1dcw**fcop|MSmmg?&%8kGP(5s+ECwRo=5cb1c@6DX(|KznX=4(Z-ptKb>sr z)zrgh!1sy=zxxhj%h3M7SLU%kGuyVFFrz5^E7SlVIi9iQeSP2o<-zAE2OQJ@*!nqw z>+^g8#-?iu^GcJ!!0&A9HP`zy&TMs@^}=mB_;05eD<1`8%jnIFRdp3`-#-z5t%eO|Jm zM(xDuZBxd|!iCH`-K`1zlDWWMc^FgwV7;ko4e%}U0H{W6Sr#%&U>j!5_i* z8^rAS!10cV7aq(wvsEL+Yq=P_*$I3_6Zo5*z~@>DyxX6#x_u^NOKK~|%I7r3PJ8RY zzf=nNCARhVK|jEcXpDH@FEY-W+ZH(E1!Geq=tYkt_zxY1cx_Ab{U+1M7{;n)3HX+a z87n;u%x69t0e<6o#Pgj8|0(yN---3k+v?dtKXW_Wy`*$<4JX3v0s=Q2n? 
zV;u0eTJS$M9sZkr7(0di0sp1Ae`a|wfwB7CwqE@GMEDPH0{vTC`JVMd{32*iu?^dU z{}=1o)qMlO7qzvoowoJx69*xFz7gPU?NNP~z&JAt`PG&^25(y*uiZ~!>_mI_2!+Ng z(?004@qK~+zIX5fSPi)Xeyxi1a`^K7M9ai@`0sp%_?=OHHAiXYGdCwf-}?jj_>GL| z{VwKBFVMeKOH1ImmCTEgKJcH9_2^=x8T`8)h?l=AV@vF5;G)TlojRibD?6KkANDWv zS^=ytclvgnv9hTv;^oZ-?tt~(nWvvK7K#1AUv=d1M%{K1@dy0`zjD3J-x=f4W)HqU z>(ny@`bJ;C-|Nd*+^GQGw>@KeUxcx#bQkC&iU3y{1^w}>NUtN-L%U8Y!Mslbcx|l@ zV@sbNtXFEm-|~JU{B7$qEY35*Z_Lg5%tp5L!l!1ze<=2?R-=$VOAO8*qV00#O(A>P zUt5axAEw?HftMXeykpmpUQ?{^aO(I8>80;wp7uRvtmZq-*cxEZ=S#&W7w~|5jKzq% z!0YmY-!+=C`F(!yPx~`AhU5IJcK!qWnnK`L4g|m0g|Sww8e_u*+xgzz8T|Ir&~Jd= zItlAx#mSRg--hKl|I>KFdh5;dtXHr91fS#&f2|St$~71(VdH@dMu4}icQ@4Zf&X^Y zr@;!W-B+4u)bFghQ9jinC-97th&L%eV}lj>7aemm?h}Rey`pD9=u3D*zkC&^Z!TM( zG3}oK|D;;L8Tl9+BQRgmrZr-0{?LYb5q^XH4PiYQt5LS~+%J0r-?yzN*@g9&nm^Jr zu8L(|xzzya*Bi(<>loIj8Q)<&q?$Jk^>Yz=t=$)-=hct(+OkjJKXqd)_Jkn+)@a6R z{$S+ap)X@4xfl5H(5uyK>w#ut{-{N-VcvRrFyf8J`XG9K1AkAfKQvcQV5}Xay`kuM z80>dL@8*g8n07(mAM2Bh=MI41gZ5xOmd03pVPd_};V}3s@Ha0#3O)qS$IJ)!gLgQD z{I{6`z7p1RS)0sbth7GMym{Un;6neRJjNvWH^cff>!&rqSNDPcCk6UuWf&WeApauq z8)JjjmLKyI#^#}?Sg) zc@0mGGZt^EqJC}bH`NH}jd7O{ukU>3&Bd=XRvIU9`qoK^N8ig}Z2XG+(fcaM-~CLa zr`p;}7}jUf`3}b;{}aS-n2Pwe^_<$D?ZA(^4gcDXjIC?$Kwk{wgCX-FW2HRCN8`y? 
z&_DRZ8unPWKY;Ev)ZP%aOZ24XG ziS^2ssmz79)YhuhH=UU@Y=^j#@1u@xW3(j@x$5=yl~lw`a7K$*Vp;b zo{Twg{Vlele_3toJ+)IWIeo)c#1j?Ko(+X<*X#F<;5U5X{D?%HkIVrXpu>8pw0rY{pEsFg6PDgn+0V^fyFgDzE zL%j2Y5I+j{2h<#G7+Wv;f{(>|Tl0hJz%R_;yBirRXI+q=_`1N3(TF##5pY@|=sPq8 zPRIG#xGfTR&m88>_nR>mzKCb+-4(oH2>7907^~(Ih&KlJXVi)wjExUQF|YM)3SJp& z!=-?`O+-9Ltk*Su$NeF-nyINA-;Dcg>jg_07#m!6!GB0c)aS;n@L$jt>7^`Ste&g~ zyebL$4BL9vdCP!}aY+C53iwx{z5ir-;p@QrFQN+xTzb>z;)F%b)P?8O7K-v^ertEfajBGRS{cCHU_x3tTe6 z#^ZjpW<+~4oTvqo3Xw&xdSC)OJ>=Yn6dk+J#V6ySN4fxF`QhZ@nrreDas zu)^OwcQIo%Z!hMJU6#XtE!LwLBh~_&XwOe-2L`W8jFm$!jOlm**YO2^{3T;CGM3*z zHy&|pE!&S6xVMG$H$EuNSlijzB(c@UjF7SQ3ga2fE-c@f1WAmCqh?jUC z`n`D=EA3Z6AMh3FJG!>y^ej0M-`oedO9{lkkq!M=FYr08FmFf)jGq%wgW}v;}bWRNyJL_l0LqLcF22_jiYH0iM?%@tvHJ{!{R3MpfYP z{SYsQiLo`iBV#dV9{9p|UaG$93|z;>?*q5t*0S!==R z;DgGvlJ#e3gZCjtCqLj)cz=G`TG{|#iG!cz6^cs#=zTpGByw(?sz{zt&RCGJ>NrmC5B-7ERsxe&4V6&y?KvpntKA`PhbqnKz^Z zi=McDYh1h%>A!o+_g~Eqw*U|Ro3XLdR>bdMyI(#P@6TzbLg4pphW_~wzCUA4I0SwC zcBD52@5j*mp7UcoXM2C>Lp1Y73*H}6r#8m@`y^X>!Sfkg52kTEtz|5pFAO+=_$vqE z{(aJQq_2D5M|CLUt0~l@9ySC#N(O#pxd_$>d)SG6PvYGAFX4WF^Q-lev;rB=aDvPF9P|pR5j9 z2w50eQ?eFht;yz*Eh1Y=Mt@o|_Iq%uaTR%p?NDCO|F@)HVJuol0q499UZ+>n0!y*J z*y$$nSH1n+MgBU`Jc6;?8pf8Kr-1ePABWlBwCy7J0TU7LlnoDWUs}e=oOc<#Uf=RF z^G;sZz?YZ-eODW9(7lZGH#NTjUZ)pVyShr=k^x?et;^Wd;5zGtOJl~Fk8T3%^s2}B zvg|+i4&vM0=JYZzK(EHX0sdg)^?Db_a?(Gu-Cg+iNa1*DiX;+c| zlOLJaj-3N;@fG;P6I*${G1lwFfV*BYeWzLuP35na>l$NAv_0_PTi|Dd|4m=62e%KW zu$-(HnytOOcZC1rNKL8qkP*ZGoLx8DALo#*G7M+B>kOs`yf3RM|X)cH`%bojKSg+UWo#66h zy4FGbMypZ2QUdt&bJWN1P+*;2R7^s8&b~;$fdg=H#1l@eWBw*I`LA=20Dw^j1w@=mV@eS8hZro^VuCo~6NxjA^9UMqK$d8gwop|3vD zOUAQARDga_F#HdCg8yI!{$Q)mdQBO}UTOgSLOb@)EMqHAv+>Z+YXpB?dSdZ@wC{-# z@ISl=?K?6OdY@g4GmCcsuhVNg-B4e9qQD2ALcAE8-phq~(}Wnr)9Ll?K@?xh?aB1= zO%quU+Nu0i@vD1XJ-RL-9zXLsQ6$;?}>(;jQY zYi5I|{B^2HdsIohG>&;1-+^EFpg-)~0i5x^qSR#`J;hk0M>GEKoLR-Kg5)jLj)A{g z9{r>7WB9vP#_`?(fAzx&9$zzOK(BVEJ>SURB=>xyUsdi2N54+3ZTZ!Par!U% zd&&`Sl6%V0uWBfX^PBn@c-^!T%y)9)`cX3yS)bVn^`#ce2mVzIc-tHx^U)u`v+sjX 
zIsm-nD$fy2Js$&GO*mgBegiHahVxyw8hm`!0f%w^ZQ2m{iLdQ^*b2DQMaHJ0tAJk? zWNaz63;4A+pRb(u;Cw5>bD)3S#rafN?9e|VH?v;LXoLFh_m;7!o65(-lyVd}1ATZQ>;&VyO6y_|#oy{rpxw-e~Ua?Yes z`wKySOm~I@=X${LGL!p4U+^vG&uKtc@Cy$Cdj)~logcK~&$&HXa@x+HUSBajzr*=a z3`$3PosRR*Z~7I^m1Vvx*K;GiIf0Cwy5|SpmXmR2KrUdte&vWN(%%$W8~nN6j4kIe zo@kA`FwUG|<8}JR7hRnov9X5Bf6N zMN+A52Om8Q5^ZZNj6brgzR+^~_@ivjIwF^2hFWP)eWBN@#qJz9ecfN2k^2@MA-&w^ zuSt;jTAlXIwA2?d-fgmXX#rwPug2=qxcb6r(V3~e|L_;{Wj}?ht~^U<}c!7 zj!&3)roPa}`|R(tucb#FF|R@V52t;UKUrRx(-wy`Vbc*|2lp{(egF zbb2QT+BY)NU)Wb}K4}f{v9f#({6&P^@9G_;C-=N=O<3;V`I_>l(#zqlm# zcrFww3==+odHvE)?C#lr{9K<m+#|SZs3;+( zetEk;P<(th@Nlc@6i?R2u|TnA*^h}sYlRBE-gKt;*y_Ckg?-uS_Ty`Y3a5NI>}vK1 z#Qvwgp~9zUtFTgi0)?N9Pxv?emqO{?PSg=K>y1kI@ut2o%6%{Q))CiUoyZ6+7W!X5 z^OJla#ns3c2Ki%oDh4QQJ=(abgSsk%M=C@9$SiI@Zp7KNMh?Vn#?3&B` z==9qPydC2+B2Z*Ed7QD+GgPGHnJ~q_OQ2|VXT`WN!J$IwbbIQgasMliUaxp*qn1Sl zi7E4w>;EgLeDb)31&NPU&rUkfl+vr)aP*@QK`4JSil@_GiQIqS%r(mI&8_z1!bm@_ z{P9Z}R3Gkx)g_HXg|IFh<9wrzaBqIlA-Hv@(ChE`z7A;cEc~`j`V&|q%MqLT-^>=L%A@_f47b=W$-(Ib@=qBq$ z`cs&lM`{cEleZ78jR_Tc|ExCVM)&##i2dn_M9=L|j@#KSP_&o(Z&H0dp)JOjHw+YyZch%)(IZqu%l1X>(X;sS!?!6v zdVR|D``Vuy14OE9Pn6!3$PtHjfbuLr`3@ za+RQcW@G92{M_{A&9Xod(Yc3x{1C$5`;UH@94MwJGroTvM*T^r&yx8s5-8$F%(lBa zI#gulto}7mF3SH$*BlQfP<+t|0>8I(U0&2y}j-_cJFbeKc9 z=<$hBbbQ<@BpoSaq59M5x5@r0==j|`cEo8SwI5@NTvbcb@&7sG$mB_+pWnJpMtGpO z^(x9PZ7!8Zr}tU*pwahxL88E?E%m$1r1EzRKKS@(khmxJ^Ib&e2icyk2Z>uU|A`b& zr#H#{oks?W&i(p0Y#Tv*T%LWcqNqG_Jfrfv%Jb2TK(S@v*op0yP<`n1ynnViK*Y=a z%PGD|?%SR~{Xy=RJe%?>$ESV);?a&>X+|plZ+bayQ+_=U4IcNB{I3`%H{5nCNNlNR zIXG-79si8o4VIh;LVqA^(&_np<>4>d%Kd#;QhIV+FX1n?bo2Z%c`c2faz5hjFJ>?C z2z|DO$|uin|408sj)#5y#dJAeNDdWy2YF}j?&2@nkNjG{6XC2~w7=s*iZ929Ei^vq z^j9{%ty`fh&6nhSW&_pN@Wi9-hS2yf=TC>|d@SpKEcO4gYlGkIpz`VTn(Qwf=y>O) z^YCA!pFb;i2NPkrzh@e?H_zD)Tj}_0*)T^dcaX*do!)iy$S;@Ze9}+u|Gu60x;ysu zyGQ-uXs1KdDZg=+o`;X#4-kIxcpj$w>GXHxKFbsQ#Yj1y(x|*6YL7lSh~_W6A2QXq zoVU^Z%JcDzL&Yinf1*WGp*t z8&Cg_UJDF4Hl$gA$X-I@GsRDmdv_AP^!}>d<@2GUPf+yF$2|gQ8^u~_)hV7{A9MNI 
zw>CM0M1mYYsC+*0Gxzl`6eP~c{lPP-JTiXAAW=__f0wEK>Gj2^e@>$IBKNnxLVWtU zFIfZpaX!CG{`2T~So}pUIo~7fqthq7uASp}p*kY_abL~#I<;5X{^!wrcfz3~2{&mz zFs01j>(Z(JMSrV9YX*c~Z{hi|uh_pXLKGoBbI9QMj0?WPIkRip!E@A}-<;p;zseWo ze?{jLy%Ty zKlT;77dob2Al@p={m56evilS~!AS9Bf6SueApsoe=_|Zz|7CZjALUP{w~l%fe`8C4Xy15+>fVd$=bz9z8)gNFwz3^|qxzTQ*9huA z^15g&`Rnwt^8881f670(a|8_x6It?jRSv-QS1ZcTiiGb2ECC`yj)&A9o}}$dA9KM^ zT+0>X;70k;`zx~k5BiH#xj+0g@|W|4M1OH^#rnIiGm%FRnS1n^bH);eB*{_|;!@lk?@( z6i=u3nA##}ZVa8zT)TwkTpK2gtzzt&t)TWP$KQ!zqJ(VUG+&@r8LTFT3BBIb_n3dt zY`Pw(`bOPxfbuWL9lJm=s6n>`7i*aKd^%g~%O5Bb#yZ)*%?J~Eeb&?iV~V}3CG1bT z*4y?lOvLOw^EvHGE#WQ4@AJ0)QQB7|%kg;&#h2HA>uQOb@8+a6zD@O^^Uo6R0!Q_x z`zW`PL%uyHf7^YPdvZSUI!xTDv1M<-pjslR{$Be$g!OtM_m3Y!^E-KdQNo30h#ga$ z&Ud}cYFNqxs*lBMkH4n!M9B90it1CRS3+q1x-?LnIq=NxUCnTjlF@1cQUc~G4PL^gG5hnr~t|7epLjBI0 z!*TtdH%NFl89S+Z=WyXF&leSFJ{#Zf;P8or=f-~vqW)2UucyLAl1#5B_1ElO2Zt63 z$9#J~-M^6cGpdFQv%F9CismEozQ<=8KPI}0Deb9!9#|UOvmB+z_f`DG?icST?SD_l zTOOY*KhaC-sl3~GKVm;IQjSA)slMd>jKzLpi|n6v;le6$w7+=tX7R+6)L%3?za;;7 zd7jxECNgEY3se7-@h67~i_GsIe&U>LhcsR!$$3>_sy}&sSCaat9B&OY9?AIwoxhb1 zd5qua{2C`!(@XLcV|hwA(g+*ykr|%IjGx&6l`;(s&`; zr$3!f<+!%SPfVBkUWDcRVm#eH>6r4pe+@c6uCE$ay0V|RB>fU-{$abG3o7pUvwa^r zf6MFMuq_Rt5ScE*QvzM3qCOJ`DoHR>Z(!jrU_^867-^U-^c({rDo_R?|A zkq<^1KV&_v3Kt&o{zZi#v7=6l;NH}JHNGE4*AE3NOm1{FT&UcKZK8N4;u3fE?DJXKoUNsBk$d7r$Z~hv#{_ z^JyiK)vQVVJzc74W)~mh*Bq5a*6BO1i}&}^;;PU(-Xt87=x-G#+6(V-tqd6zaX$Co8baWM_6uPj;a@8!mihVm( zudGOFoHfO_Re3G;?wF!mj+Yh6oBi>J{oMX{>=#*c_WtaqIpv#IZ}ZSHBB{UQm!wZ_ zTGs3JSxMVnh3kVWW2TrZYFC0U&CK`JRopq}GcLSGB~6p{v)EOrNzDtas#r-&DnIn= zn6hpnX@S?lQv=;Kr$wE7_GVW=d7pY{%D1h<&i`FOBpq5;qkW>M7M)ls-v}o+p>JQ* z5<-6In2ZI95@!;XI#KEDm=Sfj$hy1~t~EN*M%g*|Cp$>d)S`JcW&9X16v z*L-4VuCb_;u$CAe`h?!|jFokHrh%`|jf%9+F<31Gl(} zHg2XLlY*OTs<}ypUcJhQO`BI`?`+jv%glE4Zu?hBVUlBnM{~_0=QImSiRemm$6J;( z(_)Ub`RbThLioJR?QnKgGcES<_oBz&6cgiQdFZ{@q_CNn54sc;#u+W_d_FeQGUt`v z_GjZFBIQUgHE%9T?`f5xW`mQsae8LF?>W*3-yD7KY90~2Hp*c^ax+alZ@IT@&b;DR zeTc1xzonM`8Df;_Q+&-W#pTe_pCKv#{%O~wH^sBnUsXBAoFkm|=g`$VzJ)B#{ZCkO 
zc5_X5O?xqEW=^qbp0%FU+FVQellpSaf;6|ee`MmN=9($!@rHoeg@n(jUI}*#M`&Vi z**~5aDlC$w9Id-3Uxem!f5VW$w~C4$KUQUo>lC5IIMltc``@DC*YPub^q{jqap5Y@ zopXtAbEikCD=y;7idVsJDiU50QE_olX>p^xe@6a}5t`*v%G5p;%Zm4X=y*PDu5FWT zEzVWE&!Vg7G zJMwv)Gnoq+{fPlR;XA(o`>K2UF?LM`R%#4nobq%d>wCA=^bQGWqwbos^@0S0Iz; zNH4PLWO7<|`EjJ?)aue;_jpg?+BW}@8_yy&^@gW;;%rZ`p!Rkxig;J~9BiJa(CeM1 z)=FRcAX3YEUv1d_gPy{(x>hgfVWiea&fk`KiYa|d)N`YF+b+&&{`OB#q1P|{QXu}` zh)B(srjtAEeZ<}luJzwIMQVGaqGLl+s*BIE?|f{dS>=7o=icJnr4~Qi<)`?mdE=Ip ztS%0Y8*TS7N2F$z_pf()|5p!6oyfG_->ZqGGXH1WXp808_NkiKv2X3qhh-x*lf0i2 zTuq20pR#iW5$=*1YQM6Y*mU_o$gM$*UyVo_O1FsdqU|?R+&-TioedZ1g!# z;oba7e9iZfTI{6H+U&EQbpN8AmLn@ti(75{G;F_@m@Mn7Y+EfwURUh$5`*MiF|w_u z@bz%j-{w#H`PHwv<*)gdzJ9f^5+SqpMr!Zn{TrGehTaqtN2Nz<`Q&_`aH}4}MPcG| z$osVAe1u-#=W6-vuE~+wIQjg!sgLk1l#*S4X{6TXmNocoe;<+kF?hl?OQiOfygxa> zN9gshmiL-C^8c~+-ceB{&-?JGBaVuQnDdB;ND`4G0#gG63_%MDy0Si-`f^)z z_UIhZxuSDN=Y`G}T>!d3bb53+5vHH9@1Jgff34pnia6EX3I)#U*h}E-d9N9FVJznx z_vf}D|5@|8FuzNTz}ENsa2~YDEN`4Fu$*61{jvz(@L;@|1Qv%u;CiNhbVjpZ`F@mp`^FJ^wx33tXjcC-J?P6Atx+$pe}FO?>>7WE^q zP{?70m%xVP>jFzwomlRBjj@a`b=~1A>c@;fS$<1m9Q}yHYdb<g3X;Xmi$%juKz`)n2d-IP%rzUvr)3%hpZ@CK)G{O?;amhr)_s^H@t zQ}*w0*qPE}of*OST#mqo=Lzgz#)sS~!vD|*$;_{KNMP%c%$H_F3;g;(U-mEOmpxe0vnPS3M|ROlZq#E{?f)Wf6;CBZ!wek z)oO8i*RNzO;~VGB7xlOB8upP`F4-%Vas8$~ldy3M_XiK23;#xkOWYsac`tDG$FBkx zj@~YCS)XhDY`MT{rz)ZkF4PWY|7pPfPt6lpIx(NyLwU}hag`m%KfONl8~n=gUmPpy zN72BCEMK&X>x;!#;XiAh#Mt!{m)GSD9RAY<0%tAR#5n1k7$XYRy&0E{85@rX41cv1 z<&`z$I?H3aiuz@kw3p>UeL22rFIn#0QQ*QljYN6L%QG6wa664pS<~?@MI$#@zc^e& zWtkCMe80vMFL0!@UAN`7vx&!VVl{!jCXpa%?F3|p-A3FNIrhN!ISILR?d1n~#9R5q_eLok;zu?8u zb!cNv?e}q4mz^$BCn}f0h<&EyCFM9vKUut))EI4IsCCn&d1@Ca&uMb$)+F?=%l=xa zos;xrdW~aijfk~LodJh)o_+t(Y|gZ`ZuTTNy2W$&pHESwZ1ua zdqs?Y%CwToZBgzxbSJi7Nn^r`e`r%g%4m6L$xRojGo3G68+acj-&N{&rG|9At^saT z-*NQV9`_%4*XJ}yC!4UYw14`!BaEG}QuFh6FDaGwQPHN$4O%Pj_L4SjE(GU_2FPCb zll}cfPif2IDcNh$X32J%e>Ffko#WRsz-Ln>3+u_AQXHKlwK72esxM9+a`KW&%j5Y_ zV~pRWeZHo@m(=@y?W3ibe#yGsME~kO(x1&b1t{AYz~NS${>wE_>4l$T$v;}el>fP2 
zQolP_;a#`^3TYp3r>EpZ`{;=np7~qGdDl(5jpeJiKJ(4zg0~b$=es>Iy(cy(>NWI{ zuF(CO5e6t|xaZjWcHYvF61#vgLomF_za2E)H{&sgE&j&w*IQ~u=USuidsG~7a?K@g zsUPb1-ZG)Q$;ZHQN#0Va3g_GS{ktscRKC|{Z)w4wUjrIVNB%2ge!w_ysRgBPHh!P8 z)~5~*@{(G;jtcM}ZGen~D~BI__L7d!zHuhvKkprHfi`1r(*22fn10&par2fg(YfhD zgIE_TUVBM(7mkISD-7^@cGXIO$)3^`x;Hb)9RE^JX~BcB0Tm`1z>V5Nyr)!p8~5q4 zKE9r!pVsJWegTLog3~ywGiuYVRq(A81O9rNQJywJfZKYOpPLAH}84- zyaIRWuc+;%zEmC;qC-#BaOe6ozyJygAK$!q5-}vrbr!}?^<%M{w1wI^#-~ch^iOyD z(KfdqnI4gO+*P_p<%R9huzO3^+n-#dR&+0K5vEVpUa<*xQYfm1Q@h7s!A8qyz`v&9 z-x|X~7l$(Ty5E{*>rwO>T+K8eJ`P_0Tm!r$%Ro&HIyZD4=zP#o zzX9lU=*rdr2K;v{x-RGv(HYV8LDwJMAap~~jYKyVT?#rAx^#3I=w_qKLia1W#psr! zTZ3*ry3OdequY(HGpD;?;P}pf&=lFY5&hL49Ld<6U;Vp*j5{6_aycL0rC|B|RgH!J zhXWW--6C+tB(pqcBV!pq<7I#5xBSlG_a4M}&{4rJSU#BXlt}h3 zaAE)Xodj0pxv~87G?vTw1?T#U{7MxY+6n*hX58OS;0y!Xaex>bXRK+(@~tgdF6W=-cR-=vZ_MvGkjww_27&drSuTCp$o^&gg7m7)pSy^XsPOiXmzJ5kiF0Wpdncun&;{m;yzm?_C^l zjCVa`|6weLjA!g$#?ScJn$stF+l%sV!B~3Y#rU#We!yQ~IbZd_g8g6aC*%b^8B3p% z1y(&b%exO|EaR&R+i-gK7=>IPXU1m#8I#;O{LCrrU&b#O#&$uV+It}{(3|le6nswKU8A>70mUy-wBq>`1&3-m_Nmm`46jccxi3|t2UVB z^&D9)<7X^!;r!1tviz1a=XY=t`#;5UNKO`5&d)f+?HxX!WdBKi9KY))#;XGud)B}+ zM;sz#eAQk(r+-e6kQb;J-(GKNs$7}`TED^@>$%S!~br^!@r60(q}c}_~Z6) zdSrZ6`4E=ZcH#IQoBOXVVFDM-3g-0H8_4m?_^Q5R*#C}nj=$q*=C_+;4$myFmC62P zd{u3C=7;|xdHgZUxm{%@L{z;eF+ zg_&P`Ovv@Oz1jciiyU6KFNgp41(wVBs=@7ta%?ehW`)G*7>TwuAJ ze_$TRmw!dbGahAecpdIAZatsn10Jz|IlrlSd|NV3$W_%n*#F_PoPQ0l|NeGjjh-Rn z7ql|BpT40&uF5o*--#H;v4QO0AyHsCKO@I4mhuhGE9Vo&@gV8A+VgU zKg@PbvhDAQJ&24$W_;*2<7JH1LUF$n;;G!`_$uXU!8>9$vjXQ1&qJ9;PBd?6IfM)arRC#zc%yL zZ(nfu^6<7)WdEaXvVXry0vG-9miZeweMQmF1kRB03$CSiqH-@J9XkB0I{k~l2LBkw z)_IK2J2QT}i*eL(5d7B<1kTDZ3Y=k|E3h?g@sYeBXS=|KU89*lir+0P^a)|yV26;a z6FX{%uYYku;G)tY0%xq!iu#vr@CYDzM$>wX*X8*OY<>sFSoOLe;b)q3Ay?zO1B45H z<9C&^$Hoh+8dN!s{@?g~vcUSIqXo|XKF^o_U-d>I%Fo!NyTBRED|317xy$~2&Gm0U z1reX>FxS5#MGS}M$@RI2bWt$8f|djL`!w3ec)U`yN8`pj0_%e-30&lWx^u*@_nhO) zM_n|;Xy;GJ)f-V4jpRc&37p+MPUK&I=?43+RGG`;gHo*TvOkUySQS}c;G(Wt=3lA8 
zIOb1IzikudPqJeF16nffct*(e2iU*yVOQpx*N@qiPzRUNm*>ObwOYpM&C?64{!*LM zJIGRCV_dMnddt(C{&UMk{#AX?3T&)doy$kHk>ywWwiEeP{4TIM7Ild*zKkt1nE&pU zw~%j*WBj(JkH8Be1_y?Jx^a9(F@K8k({B%EenGat1>;$sy<`c? zk8=N={e3l;w}(52_e&<@AQ#3ZSpsJqb`&@}zpa-@|GMUky$XfAz}G=w_31o;^{Y7j z>O>w7R8mux2ft&v&YJxXo6Y#RCCjCjULyQ@U*qVn`t45w>yK6ua^sk~EdTVD<b&Ij$DMmTU%$PJ@!^AF{-o-@RA3mxXCSJL zi2|!5tVDSg{N}~D!6tVKUzHgoaFJDowuCc6d45pz(;FercyGp~9fiE0BJU1^y}+uPl{mbeZP|bS-C{pbJ$I)_PsT7mfsIiS?0@l0 zfwPll3tTXa?V_rk8i@WuRoH>~jsD>9_cdf(Yp{@K3~w#4y5SAR^J=kt$FBmH?MG$@ z=L@XA%>3-=&&+S^F8D=maonH2F$tVK+==_M6DK*m#i)yj<)vD`nfWJna{stys=(MD z1y)J3*nhwb?oay|*?;q+0_&UiWt`+J#;1(={0@;ZuA&&Er9;1_8> zG9KEMzem?MV*Gq|cmjWqOSWRZqWWVD%kTCU*kNQ2e~-dyVt%vn$a;bCx~{;gem4X* z3~DImM;Vrr1h%$Y%=4wb{7zYx_kDp2{+`eLCf^y47|;9y70=gnDFPQB=J{N~R_{(! zZie_+mhTT_ykoY&`a>fH7W>1_Kcb$iG6XDJ4k%gd-VQ8xhNb4C&3;GHzkL)W&H?%&kA6yYDs-0UtkKm* z*8p7;bh8tJ)A}nTAd>X;Q+#0^>5se%heo7_jo*Ctf!fCgqOFzL3;#TN7spgNe-P?}FUa{KAI_0rNB7Sz`@$X4 zcSjsc`VFYhnrkv#Q)463PCBcO=XvJ*ztHL3t+H(p*$gk$mAV?wo znD3!5na(=z1VJR}fxGJA;9KRfs2xF2tnv)nTwc$5NT~PrKH`CP*DyW&eRA@#Y@ZcsUT3uM8-kRxb>4NgsAcAn4kAm47ok46cvRvJGjigWG`toL&w5Dt?bk7E?fBt862f#7j^ z#Kfk*MnMdnLnDqnupW}vMnRNegWo~4gCgo%VidTO-eI5?c2j~`rp@o#*Dn({c6znCv?aKhW&n`Q@XKWPfZ65=Oom>()Hyr)1FqV$W0Iu(Y zqM-P<;ED6oG@zt&^xIL;2$R>Wmj*njyk16ulI$r|)IcAnso;ulv#gsj#?&BmtJ0RB4`F&vfAN@q1EBz`ZHf;}&@!0*Bg&jfY zhpfi-yYJP6tFNLUk@^EatnXBg7ovXjGkrvRkAyp8^emE>&jfw?(uzmYj9Zv8Ghghm->##kL z{_nmp$QSxo?P2%U8b^IB!k`x2kMsQ069XRRS%77bh4fC5!(xuhf)ve7Y~MIZlwWGynALz zA==)&O6^>s0cX;u{W}x}b!$F(*f|Y_xx&x0B_dKv=0E44wTBel@z_D|Y>YKNj;?;uFe#QH7y{I37 zkmJxoxusDEcr6I-S@&HaG-}-B*k}I`)T28o`8s2JHjg9sNY4q|_c_teguu4O-o=CW z2ElInPI#}VY+P|B`wjg=;O|%6r{op}fxLggAteso8XBY4WxXrW*lJ-={`TXGQ~bfX zSB>LVSF~Vr`)-Ais{qO)rv^Iu(HK>W#+=odC$b52S_8AAw5eX%TCf}O{__#Uedx@7 zl@^q(UU=8Mr~x@&7vIYN%Ef=fFN;s;V0Q*sPP%(1wJ@;Z{tBb;+oe-oKwC+-NGCTC z+ai_0FfCXLTRrC$(%2TLfykN@1I`3!;c-aUyd{e@5KDJ1aNb~bA9YG+0m##j zMG}vesp?Q!W{;cGea-j=;6?hY*iWekS?pI&afeJQBfO7O4>?QG(a!^&4>~_|fG!wa 
z7`hJVqS4U@>Vd8ox@2?%(G5m79NlPiT6von)DIpCf!Aa&CdL6ClHID+Ay7p6V6|GoJsK}2gn&ftbt1~?en&UlL=$?~ z4lp^rr!>E52nhY904H$2J^k>Llfh8*jC5X{VW7}k48{hGx@hBF*p5Y&U{F&(i}%Z> z(YX0R2S%Fr+-(i6bpPjw4pgLfUEUq4P#uUty;t*h6MEG(bkODIUzJ96b!WX<)KfOl zJ?q$Qyf#M(3(Q&tj%-G|p9o>V$MECB!?I3{s;r&nbVX~vq4C;{Hrx)T6 zs5j843AoZY_87}g*fD4VnKYlQjqO`p`(TMz?b^rPZ*vyZFT6N zrZ9Tbht#zn5L5c@tHGDbr+p|CQX93gg?ps``U3qgJNYpo%N7nEjt?B82?fbwneW^{ zJ9tWZ?~OyDXyBpBx%KVf7QU$31m{f>=?ngC2bamt8)CQ7XxQM59pu{PrDov0IeB|2 zycct!?h6MfPj(sbeyUqFuTf1ZwgL;fpI?IYiSEr`b%3ZD3;kAp4F);ibl3mE!$%I# zLwhe_$vf0j4cd@+@h=C6e>wM6PsB!Ta6`?X4v>7JMzPO(9mx46MSb7VHyrq0+#W2y z=p&xne{+DDf&LSfIoMw4o>HCz>p|oFL^&VGF4y+vkV)%?^k5iDx>GBfLx~mk-IFl= zWanjBbC`Vkxd`7qwcmit!}R_k!{0hTY>s~QvxSr2nEmfo;~l8b2f zBOw@!)E9f$!Ct!K*ee+1e7@&r2jyw(2*dC_jalzo*+D$njq8Zp%#!y`WY_4@S6Iiqb*=ErN0e^FXI&!jUDT; zJ0Smdji{;%?ZB1nAvk0EAiHDh?ZBPdj}xXx#TjW%?S70$z3mN; z+Ei%V406fd1=E2779cBT=te7QZ6Da^!}sYRDw}C)2S$34K`PlU;)})YB&& zxOHmyNcJS>Vt*jxs|mlt{+8}r4MV>E>+YFPTR{n`;Efxl1Gnx+y&j@Ild$VI2KzG^ zAGY;%Z}g`V+a2tR`6s<-)RVEN@dND$NN851-xDWrrSv3WdCBAhy^& zvf?r|Y~J2xEjux*FUslv5zF}oF`l`=G-}T}jQ?`f z`6mlq-~z3`(Tq@V7Jb}%kd`*wjU&jv{9wmAOKI{LW_T&DbcWBz1(p|5P! 
z8ZHaFkXT=7TyS<}y97tjZjWd`u5gR$*Q7wSOM^P9tz3W1i%cDQPjJEhP1tRCf_4=I zALk#`{u*Nbi9f@M?K~Evob;prC%z(Kx4;3W(fapmAQ)+#RNaB?sFVcq{Gf>g=MS-* zFUHF~$R|4??@&(po#_C*Xzuzc5VDU``@;U4>_8!w^R4N=vkK#<@wx}bPv7TGORy*V zE&~I>MD?Sd1Gtku8?Kk-e4L_n#(T?&G-ouf!HKF@Gf5G5|4ZfqEuaE&9MI8^=HDuGQV+b2c}XP=Jw7SRd1x(Axcuc%>558%pGxhE1dgA6i@5PW9krEhxHv=Td3;T9WCIc6WR_`Y+%8Tloe}{>@|aYhzM( zG?sw+@osgYtPHm2oYlb}rRU9J}| z*PEB?)ywtnr3OxIF3xNs8L3@Au|qo;2jAttYbX_wK4gR)WH;TH6!4|qk8cJu;6wG{ z^`x=qpI8L-MSh2OnVo9Yk@PVRRg>=9fhlHJrG!g1QU>*teVc=#&h2$;YE+Y~&tm(3 z)*K)TD|AXlsf*$GzU?1vf7FRA5uO@M|E$(UlvI}7NY{FJ1GMv4tMh`!7LxVlPLHjl z>i|?3nRH=LB}tJuZ*g`q)d&A7e>ATwscGCEi{JNc{)oNCDpFzk>;8|he!#KZ6*~Oh z##IfM?yP|FL617dj<%BAl2QNCRQo^QAFE&&tiKLaf7i`+JiZ_MOv{c5`*Rxj|5(f0 z&{&UqN#&4Zi;jNsb+vrmEnk<**X?5e<$u=o|5^9T_66ko2J(Fc`M$$CT1N-#!DD#F z{&uhYVIA!Y`RJjz0j?JikNl;cGQJk>V~`!j$^qbPKzj?HLt*&?ixixsI=UzRGdld0m63XCgeWtm4VRg*LI=hZ%{V0H|CDJNKjYA+|5N4}!(RxT{n8-# z1&VJ1tIH(`Eaw-T&tv}yLxuduHw;9(ZkWD|rIL^vXN=+SWPIhw{zl?uj~^-I`6v4c zY*eNQtUJo`qKo4Mmh)A4)=9#D6LWkPR*bLmS)2ZrQefl3LBfB=@mh?N7YZ!*pZ<7L zPYTay$`kVZFvi)grv%mw-6G`K?@tRX=NDXv=lHLj6!K^H5(QTGI3loqw1MN_e30=i z#@SuaE+)n+_pj`0j=wJ2-9@=_;YN=CJK7l|Y?iAtNBke0ZRV$+=kiq7xWw{}$y~m1 zeD@;X!d$*#_k~=}&p5uF^JCT9m*UIlxs$OnQeahw9L`UNM3&3=y6q=Kdew=mgj|<( zoU!@tQ2vvA&d;mgSuW$}*HMY|sx#Mee5d{r@u`DqbA0nYGk&mH;C#2r%zto;vCO}2 z=qZjbLnY2S^LKOljKMVp){QaCui&{5=2y;Fwm-?~r+sylEBBf6*AVTt6W@#F#+Pm^ zm+|xKEf@ZaF1BO&hGha9U3?h-#W>q1K;ZxIQ)_bmin!@V;NuhyPxoHJbj=2n>hXIo0kO64{XozJKbXcGQRRs z0QGNiC z{Yf4z|a^vRlESK@~OA+)ZGkW~{)@amv5c>~Wa9MR`J4TJOB2`_`bOaNj%IlmoWIfcmGO0! z!6JO?*VRDC4+IIEb%^gqUS0$kTT*zi^Ka^W&q0yf6nC%lRL; zerJ8CC*=8+%;{Nx+h~~H{2X)q4;%$fm+{lDG%^w|t7D9iEB&}XGPrgWSl6O9%j@(J zSkBM?d%P&G!tv?MKRZU?tS$2dR&JOeu%W^nf#v*k@8(8|zc6qj^G`Js*pSv=;PiZ! 
z8&=O}Ea%VR_G+lTK*)6&uEM|d$jt&PZQKPmOy9v+#?QZ&!E(Ti#`UZLWXTiT9a5g8Ku*_E#*I@s+1Xi1M+X_MG{@4d?Ru_CjE# zAIr1WykRWk>xRrUQu$d=d&~Zxa{3D$-ZLJNDddL69|e~4b+?j*ywLX?%Lk7V*!sa` z#?yuhob`RR!0C@t7(dEp`KC0+9R~`m{5nuz^}Zx-Pv-u^*aFXeF`deyu0oD?KUv;@ z`yX}R>D<2JV_AOIh2@c<0vGMoFm4+xaMmLQi16CSFkZ{;CH)*@!y|)`>q5BwSP$m< zoNnwS_|}nqS$@>q-tKb!QPzwQa)V`@kmuj${>5-PRAA*w{=Qis@LmDtN9V=;uXU9) zmQUmQW*xMZal?2c@vKjOV!o`r(>MDHxna{5=4Wz$Z*7C?LG-7rY?eRYDX^TMe{HC! zFNF?Dt>6zHB5>9RmB9J`3}gAP)fmh9qvnbD3w^9uex0%5awW#gx&9QYs|uX{GE4BS z6YC1Bl=)Y7H`li_9G)(Iw%}*EHs$bsn#=f5bM`OeD{HRe@K1C2x_8-(gIX|u}gY~p3LZ1G?UhoZ`tpwJ&Itg5Oo#X!xzvU?*&#J?^1j_n1n1AFi5anC` zB+DnQ6ga=tF2*jB!09`eFt+a@utL z(v$wOe@{w>bpvyL-uGktjMJMwZh*js1Gs;HJUKtV-B}KQnYWNDoy_?O)e5XT%W}hz z5P{|V{IHwCe_?Jt4!`p?feq#Dng2@>%kM`pmhqJj9GQQBzi0lkR?H6z6@29lSC(tK zFqZMtCk^gN`7+ds=kTxpByiSR=Ib7takV}|{vW<}4TpcUM97tUxqf9mVVvHDvEgR9 zK#C_{#?NoHmeYH^0?T`w!}G5qaJtVrmY=g^xs0#0;qa~7D}`K_wvyA6regjHb9lF_ z3oPfScitrUSxZ{6|He6tx724`Ya_>>UQ=K>KmU3&F0YRBnV->w@$gv!=eKOkxC_hW z{Pb5HgxpZeA_x@N{}qQYf6`M9Z<;~i!V(7|m-Ev%^Ln@NI@$|H|H}5fzOepsg85gr z3;)*jZ3WJsvXbL_XTx&uRg8zEium&D&FA>GHs$!wEM<9%`&tS=e-F<;tuNtwUWk?J z#|n9t#dv|!OHu_cTyT};8%D9*FpuTq`m)?-CF3UD7+*Kn@9Uiz@0}~;x?5Zxh8>Xt z=eIPMw_Y!BdJ`wX&l;A<=eRavg* z_Nd(Znd_7PL7@DmTRi0YknmdI{G8_kTfgll{HN!0|5W(YMPTL1Ni2WxQ1H|5juANP zjs?pXCbIm?2bSx5FplK@M)!4)zy{jCr1HMR_1Svl8OE{Yd(!`BRm3|=_&0r5iQpG5 zC}w=F4$EDPfkN(DUtsI*+XUA2GnXH|AELnUdzi8JKE`diybHq?37kKH>$4$X3Cpv& ze69V=^?hqAj&DX1^M8q9TyYoUUw>vipeEP9gS;NF9=)CGXS+#4ZaDLl<>N;STp0FA zVBJRMXU%FX>U)0J1R=M66De@|(hLqC`Y~=fi{%p9%T{3i+Ad;jJCJd~I>y}x3!ERb zL14qs6^sWiX54t8z`A}bm|t{UV0rt~DS5tG7}z+N;>+J{&hO(&jPEkGzFA-3|L~() z^&)?TZ}VARrKP|_)#n9PW;zLMXmpChlk+W0MfwV>+k{X!>2-=Z{m&WezI|c&Bvi+3dd~mrs^)nZWsXH*t6_i`l=7pFiuUkY}C8`+gWdwjbtKvt<0HUN8Fp z!av`O^rwGt6xdMBnZxt9WqGSdoL_eb#?c#@-_e(G*zb(DaDENdMhcw2^-mF>b;L*U zeUy<07_aQW;bn4vmvyWK^G{?7d0~#8!)r2w`F+#^>yp;9y!BHKul;0}Z`i}%|2VJL z3`e8b|H~|v+n6}~Pv-CyoZkFOTUj3Y8_Oqi`U*?&{-Xl>|FoS#ZaB6=;Qy3o&}#PY 
zv_{C&TXKHH836iE@41HM74`_ZQpQg}j-JjF2H|*^yP%U)Om|>zwt?pZrsvsPb&_U_ zy9Vy!oXP@u^qWTZEA^cu54yip)CLw8cP#mP@V|b`!t$g0BZ=;C^}dZM&F|l^%ECTJ z?@|2O2IS#k6y4BeP`bqZ#|e1x(m|Nq=|l;4Gv z-=&n_wY)?3seTFsdwNID;lJle^7AI4A7H}!*>qoGk_J-f{*VIipVK=Slkj|t?n7X| znnQM5alhye-5b=;)_JC+_mk!@DS>@eWOiLQ>Kl^Km+W3tw7g_AV#Q z7I&Rcp4P{AWP(DH^NZ`^n{c1CKf=FCcS)uO0*Euv8sI^;h^hyHacQC5j>nczOm8S1 z$3A*ZP>~+0!k{ZtCdc%_Hw%Iqy?AI}1MbjWvjv#mKh_TQpIIHs(qGhPcJ*CnzTPh?=x?4Q~^J_EddDA}SB}=;Fh;J-pQ~n>7lTzqzR5Z$y>8F$| zZSZbk6TI6)Hin}rN-0gIA1%hS#j^6Yq&FWf1cLEY@SY5nrQ}R!iRS~s?0XK@TStv~Z=w(@me+4oY|Z9Gxb^Pt9(&VMIq!K8JusNShW<00XQ~dY zCagBSLVcn6m+J;zn52Wh^{%DO>U4sKZ6nHGbw-@LzuYR+zq5Hi!!OKU2My4qUdiJa zNKUC@dV_aGcTV|`X03z!8#5Yx<{)o)InHvo!ue`0$xv37GcK=`+_bM8y zkWDTd%rEgGV!%CqS!w41Iv9L(R_K z`%4SjCqXLRlSM4&e_8%McUY_u+)3{R^HvsL$&H2O`r&xoesKp=Pu$-V_nxolAdT!C zB6g*FTR2~PN_{|4627;I?{lG^PvGXBc`a`yK@Yrkup8Uz1brJ=Q`HFT%U>?3WC((5 zqz6?V{nPt|_&(X!2kK^LP9{MhjZ=0(uJu?e3cd_NBk% zO{fqA*B9L@NgRy%q5B8#10kNuDem9$vKoG`^pr6PZqdE3H-WG%uSd_L0ZDu>7Vjws zQ2L^i-~!#nuN(x!*P`)#$0XQB??++ZnN00glLRwKA1VUdE!{gEkOc1dRDvGU9}`)( z*9UzPJf*%9@029a{XaY}75BtX2l2bAp%~wLd_VY95O`5rYn=o!bnos_5M0O^)bwr3 zB$!M(1LrWm=I`yQo^a$_Rcy<_c)@8BeoxXP8leNP;uy`4gP0z=NBTqux0|G!ij_v# zIdMeVhwVD3aMMonu$+R|{P|bdAud(b=JMK6I^7AM zw3=Uf{m4%|-K|6HvjB~UKedT30`y`;syTPl7mu$ScgMr-h~WXs;tGrF7RX(#O{ z!ne??^_CQ$2fsVn#!i}#CI~m8Tw3*AUjwn6zmfX+Q@y0uq#N+8xm2b53>bwt+jeb* z`9;ko_bHzxE5ttdEQsnRzTqO{({S&J@_Nm-wt057xir647eyz;V}k}yYgo`+Qa_lG zR)MfI52l<)EaNLI@a}I)Z)xbU5BPvZ9qDM~?Yx~6drJ_89rU<5(zGRe)A~;6Et$3j zp6NKgjwI(}5#mI&uVjjz^vlm`8>y^*sPU=VIZ7Mp%)ASdp5&+&BYmnN$@!lTT9yvz z*H=1jT+L-BL-=7|Gtt$cSldxkW_&?O*2p~(OgZcA<6lvs3T|9qqpQnHsM{Z zr8@M+0OIZRz4}{Ak>v)IPBHeDvR>m|v;b>K&c8u(;$E0OaVNbTS))R zX{PWQgXy2uzx{ew3rWr|Yd_NQ7hfxOt0>)lW1n}TUNR`I;G@lVl+vk(a%xy{o}6FRYg=092PmiH|M5Am6Y1_1 zVR&>`d(`K=Cp0f0IlaO2{$rk;KNZ(NZ*HUf+u(zJr+>^FMR&Ono9Mln86WeGj@WB5 zkUV(vKgGR1=E?bwK5fjqXO#@n?*px$e*BmhQB+-%W|@p{15EXQ<@O=3IgL5h@OwPm zvFgP94|#I_isnvvv$1>?@81-T8S_4`a_~FUZ9@Ln1FKuL|CraN>z%a6->Lj4fA90; 
z{AIfIw0So$edj84d0h8>UY_qM)646~uR8hPt+ zo{Mg}RO1}_pGpt#Jj(07_6f{IY&t$|@W)Dz^5lFnHG6Uw@_X&*;Mm}(#-Hx^5%z0~ zzq6W0E}b=AyAki6J3aT%S&f{(h-}wxMfvP%cVAXNqq$ArBL{IRU7kL#SrvMsbOp)1 zP{%OytVYh?cXfPTH1XFw(AcD|ySb^NkaJs0u9mA2C}u4?4`YotSZ5aq93o}}!! zqv@U&p-3hFOCFoLcetTBV!cIk>;U4qDT`tj-PXwY6KF1#jq>Eyx1$=q)&#w*2|Lzc zd4C!a)M@nJnqR({px-LQDeD*2i+HY)^UwDgR$5~v)*qiaul5)}X#BV1$L z&zeQ8TT8i^UpV2sd&I5Jnu>NyVIGyw{@KeKJo}=N^T)KkW9qmZ!=Gw<_u_!>nlLK= zrO2nZ%PxJ_{OO17Vj0RkR)_UE{aqvHU!k{D>HEzJwrn5yjlVaRzv&|BE|hS6ru@0= z4GV7mT_fl3!qrR$#;-`BM}A8*r#m(R4Pq&&^Di4-mS~z@`wQX;-&ZF%H7M1{`Dp38 zbn0e|ugA_e#;T>7PNX}&3B&7nZhV{AQcZ*t-u0#Ska)5~wUMP7Ilp@9L&aC}?_y^h zd7xBtx7i6rWyFei7gk#?DAi1$??YJV(kVdB_ZqoNQ@9o9J3jei&pRmqyU)@izv2AB z<<-^1!QV6mH2&{Ie2t7of7i(QQx=XYy@BOhL~qGWQ9;dLyrpxCF~1vVy{-bMfQr)R zMVP*RE8a|OqXId<6F%`+jNez0)6Ku-_sTGUbo1!@KdG9ahbpjXE;b;nA4M%`JzoXn zd|zDKJa~@jTZQi}cCQYDvAyhg)>|r~?;mXi3ABEG*;`Ui!}{@;70CId8noK7kCZjf z{$9CU8@P(~;Va^@`S45HzeDV{Ep^cR`!*ow<9x`7b~w^$ZoR($lO9G~7ku5OD%pqU zuD>+6r5M!x-*r<>hprJiH99+V4(MFawL#~Jt{pmmbXs)5 z=)%xlrgiRyf8(_GIjpvwnFLK}t+Nq$U48)9R#eW%0OWkJ&wzI=Q%RS+xfU#Fe`He< zuXi1^;NEJ?szo9y7 zU#!JuCBbIen{n2{)81ny&7YeDuJopIm=-SNk8BpdI0?$rT26zQ)^7j9U$gA{q~Y%{ zy_65+?``^8Qv8FMbZ(-w5Ke1NjNj^LW@-62Ey(#R$$wfBMA2Fu<-_yO1YwVbAE#_tlFh zXfL9=k>8rY^ve0=AL3rbG+gVG4#YGq+hUlK1eZz2dzKagXfFiYM>3Vud@acNqX!)C zIba#)KSEK`s%;?b#O>kK1xYZPY{&TE-ow^48jpqey~s9BL?Fod!Zrwo-;4H61_Z)1 z(v7C}co`oN|0xj8#MmWO#`-s!-grPP=WEF32%cR!lI@&>fe`xPx^%s(5qeR2j|Kkd zQ(S3Z^h6-Y`N#S^I$9??2^?uJ=v*K)p?$kWxR@Nn)RBZSji2HW`m*1w>Dn*qVMwm(Y zb_{|%?`eu67b8@mx7()&!W^29;oXVL^j7s@EnK6w3ha#Vl=g%&vAy`K%6)l)->>{N zO~zO)ETp#y@b2|>sxR0c8!X(hdf9zUKk3-x_fL&zm-O;R609WKK-hmYx#)WTEA~GJ z=}ma-FTOrNUHNPHJxS*?NeiQizaR1N4wKG#V0hi@rVibK->cF%|Ha{Ge}TeZi{oeIP*^GSKV|l9nTDVO83HCQuq~nh1w;}&nf1;?roveko z)PAu4j2SvstIOAd7uCO&Nf1E&(pK!qM9iLfVcXV1HETqRW@~}P9-rC4ONTu(9 z=}o2Y_lvo{VtnH5`861xY_CqfmnYxrlkWw}zc&s^J<;8x@vuw}>0A#~w431vO6r3> z^-#+{WZct=0YGCQjd}Dd8xO1Dzn3pQTrL^y488BL)FOC}wb;amC?V5Usq<7W7qkrlfF#HsIlbNpd8&7p6F!bsAuH)8&%ufcb5UK(ysi|*q8qa5#`me-`7 
zW^h3}HD%?jJLp1Ouu7~wx%VB&7I8#4@ z@(VZ12fmmQ0&@O28u!t*l^5Cm7#9M%fH9N5Hw%Lvv{##s{;6L>TUQ53w;JD~lJjF} z?{!KzT%djoZ8glM`N;Ti5Vj+1(WVdW{Z0)BchVhi9SV^}D^kYK4u{KW*NZ>52nD%+ z(T_&p`Gh&&$Ecq|dv-g+e?9Ws`Un_5Dzn^8JfGM}HjKJN!0;>hW)7aOT&8)9Qv{!% zoW=7I(uJ!Q0ZXyD&c^c)k-r*<(L7F6g%~JL=ZS41Ab{TFX^e8RDe^uXw#~(PFurHd zgZe3(2>83*UlSh)%Xc3Loa7yyxki!v6z2IiEg zdZG<42%A*VITo&yZKRE15Krrt?y+dW6#KEHFj)91MdNul7WR^EyeSL<=FTi_UpEde zlWn4gVQ`H0NgxiAsclyd16SH7F~q^P4%hs5p4GEWqSKO+Mw=GH8H63l9(frK~>p9tu8Gts@PHfgLJ{AkNgpILKu&3|xYb;#O zt{GVAbO_|qymeD74DLPtn9aTr7)|r(wXyIvc*5bVtPnV8+eTY$P%M`7lu3W)guvt> zxKKfRpSHL@ygniXu2X$)77N>I{C4DMceLR|-=S_Stf4a(>ktUsWd|M59%%zw@BM=3 zPt&{<4bgt{b9x(RN-&hrIM*T;V%FWywHh7_xipWh8Vd&>bx55R``_~_`FWQ7yi0x_ zCOzkXF1x14e*`&iWkiZ$LQrpJSKl&MQ-uTS`>uKCwq=PDC z+iRNv;>k7~-We)S>y*_7C|?iH(-pxmndZCrrcoId=dJjrQZLf2PYZ@r`ko^VK+k3C zhX%uCiod-99H}3=7YvPtI34k-&=FAM-c&6&819fwH+=_)p}rr#uU*x}($>-laHl&A zChUL6R-7Uds;nKIS7oyfYEip87y-%DKdsY2V%Q3keVqudFHXT|^9*hBObX}vR0Hv- zMw(c(b@tqAZcNd!H@d8)!453JT1Qnzqin-{V5FAldf<*>@TU` zX+pO1uzrSN|44U83c}zXVJ)VQ{ErI*8?q&XZ{yiff4M6RlJ6!K|NMIh%%=6k3he(n z9IoKEB?LUkR$BLPK4Uu>0)1%yG86kxDhEX|UG;9E&IgMmFv z6GuJ_gIGFGULOW~X>31&;R)M};owO1=5ZJlzrt~#1CB?c-k^UPFZYIn4b3BSkx%n; zhyW|9=Rb$RxfYpz)QGMoIogsu zNcozH=^a?_-gw?`qB&A3jr|q7 z!j0|gcRJj0g4`m@lGK+8kdH@;-f3P?4mq@sL%*d|PaY<~{=KPRFAj78dsSyxy*B}F zh;hZ4`=vDr(0^0zyIJloP;z`>lMbc?w0(jHRe8?fyJ3F0>?z1M#}%;hqG@~w~(RpSx+EStUKxHm7tce6bp7#dwwZR9mYae3K~o~b`+C7b znulTj63LblzGGpdDfJ(l+ZC>ojnX5Y@D`Dt`>j<1yd)c#_=e?R+y)(159`Jk$1gXAd2o9PNC3-U^CpCrL;3Fk zO};uLK#FDK_8XVCg|GYkwY|zGz~K+APW9Q*7N$+=x?2Ak%bVVQk7x@|+jWEfkqMAU z^Q3j|P-n=Bv_laIu$%Vd;@tTj(AosZp}JV=2DUT~Y)OE}G!GnwZ-CQ2&D{hjc5>d| zw3ZtL(md@f*3V>2=)BfH>hfaIkGO)o{$iExgHGIil-n>&q_*D?*O+8)0&ON!J1D#R zD7SAY_nSlOjbdDHoBig{U*ABsr5UY-vap5ip$6LjFMEl7F}jV^KC^f9_ibOz6YQ6S zmKd)`K$jlZ^UFK=!M00{^Dk8CAkt-nJo*)djPlK@;R)lZo};baB4N|h6E@TRtDhp_ zvoqQ>JLC!OH$wbwo<@w#*>|}Yx3i*1xZ(PyD zP&~mIDB3vK`!us;(La$O-nx%y_aiMCH2zQag~?u90y-Z@xvOWOCaFEl$9(pm6`~(s9|>LlGJapRt3AZGr~tJ`AvWhNb6Zu7Pfd(}`Hi4&_&!8g-g&*<4(w^X 
zZxIP5vR&NV3-;2!hF2t%mDTgkPXeD*#{A351B)n`iiyhA=h(CjogKYlcp)zEv&~}| z>Jyd4P2amdI{L}eAy4Oi<@c#gUIoCklT#}+J0AgyE8jV|tda&2ls-phO^pE6)bU?c z1p&BUf#VX|klj!7-y0g}LG2FLKMuHWxbCb;g!Q-ac04oR&xS=RdUsqk$s2P2xqsLP$E(-WUz~0S>vc9qX4i^IB_aqw@*+2|n3TuIdSY(_C{?1Z-dDexgya2b{b$^Lt;6yDYp^w`Gds zXg}LPb|LFx-+3DCUEe{u%)h)&%Ik_eFJ&U%T5&vm3@-`*U9~NVO zbmCsNVny|fHpTY;u~~J)@A_=ZuWHdZSs|Hzg&lnk?OEPJdTrlEm9xA)l=(GnzSg<( zo`#@)d}I8y3_GdJZ}v}>zhr;1m0thz+w1HdYA9M*P_f@hq52p&r>39@q9q*luL!uBHcUzBPFAkcE2KPk-^vDsi`LpZ-T$ z#2)HQJ;>jqmcLgmf6rQ;Ud+NkbUu{tp>JNVtnkhj7Sg>_+=m!Sk2||J2b+y0TD!HO zqCU_*8~vo{gYR3{_*YiOow?;qt`n|(CEG z`Ivn>$gMwE`f~w@c$-tC_)`n}WX;VsoZEw%*5Dbp z%^_v-#%~I=-!N3zIkW>E*)Kj607feBU+m!5>OBta-l2gkvP)^(lJ{VI0K>PNm{6_d zkF>mQgLXP&(e5AF@okH8C4RuEt-+4$nxmcc!p^5p{oSJtMADgIEe+cFL!AdjTW}U* zKme$g8!}AQJs^eb@!bl5jWk!P?ZNkN>jFG;K%3I5Ji(PN+l>M6p*q*d8yrcmzyMIN z>vuoB#v9(^6l!ZjfU$Hp>x&PZqp|+8=11A(P`JNopg*N!w-5B8Ih#`e+&F}GXG?t` zhwPtg{J}W-tG2tM9i)@J`%+IZ(7rF?mluPBPL{TXp#uAW$3#5LS9m~1s|uBND|~)@ zE9SdFG1&`-uUkKr_<46ogs(bL0{p8cN6|f8C%B*5=DBv zK_WZrUs}SiGtQsdGu0akqpuIKZ_*Op)0zQKd{Q+70+^oV(h?f?3b0nfd2<87Pgb{Z~Zamf2H?-xe+exT26!oxd#}__=QcE zhZ0}$%Y9b~{}>i86<8W_PvF8WO9VDOxFxXlNR~sVKUiLe{hP*^u`E36Kb%B*pz3u; z3O}nO?qT3xz&8;ER@w<{+R;Z~Ip45-E9ZyG0{vO%;a&s&1@~I)-&|ITj}EO#p4GSx z=Xd1?#&^u+G4w8{zriIAZ)js7w|;|rrxf0tt6V;n=862mf>%Bio^{Hy5Fs~z8-quA z#_fxRyzJX4V)R~uZzKIb%H9L2itCI29TgQj76dyg5>P<8h~;h+1r!C1C@Lr+SYt1- z>!`#YgEe-qy~Y-M+EhK086KA}A$9`?kU_M?MheFt|1y|?vQxl6z`I|aR{vGQ`ewIHIZdy(J z*G$572Qh0GI>`Q=9?Y!oko`+tD8~op-(&vg&_$DSWL*pArYik1 zcIZ%Ed_S-w9sX=So7NWArqzcmFm)f;R1KW<+pkTBczuoQtDdKv&tax?b7p@0eDV!` z2cYjn$HJgIeV=`w02k>yaj{06F7=0@-&uMm-`5hxq`0@whzf^w^sb-M5+)>!`O@!Y zI4q|&AYAh?iJp<2B4Ct%*S(EC$3i`NcZ-a`J|U9U{!Xs-D&u{=>#?S$l_grqH;c}Z zGVeUDNqY5N=}~tgBB2lYaZSC?G$t!oB&??RW8+{GwISSyfF3jsVQQT8O~7wgJVvBn z9E5P2aU_)ddG_$}yW+m?%W?`gxwU*_Ssf|w{?l8-4Q_jmgts)VVrOeuN8gm(Bf*WH zLsz$f;q-2Z-@b;|TW=cwxeYu%H*a@3UMDuu_>$~4GM43Hgna98Z3`3V8_wzoY0KEu zR_5QG76I!h{|(}y@TxJN7GS+7>>seeetJCY#?$-Fdba2I%BT^P8!2K3}T8l5? 
z@R|BWJxh?jG&oF+x+2_w1L(1J+>CkXG!BS z@XNieu5x|@TSIed`)%0(4(I5&ceHaHc;_yurYGPSB^m>Dp%v_BLXQ=;pO7ukI^LH%{ zxoliMS}lFE&CasM2Zkn;sPn=xT5&sVhR^?;zc!yuYYRJ+)+yTR11q;Cjkr?kTUetW z)9i&7N9S)GwZjb$fX^vUcmIy#0`=4m`wr&9_7FNJaZYt=6R=04Uoh>**#M|Y&t?5^ z{R3;*vvjWGHx=jBen(!-41kUwp6X9tguys^zD&Tj7HeF#xEJrgTrc%j7-Ud+KM#UF z#YU@RIB&zimmb>TkKgiQOb@fr+xA>yrswTp?qBEHh07Q2dVF+%wY2sOK8sPlYfz@~ z=kVc&?QumbDiu5G(5-AYFelf{w=BAnO2yOvPL?~Q(mchTaIDA3`+c7kb_e~`(k2&{ zxvInt<4^hZbq5Wb2m_i{QCT6M3)mg~0a86$4gB+36*X?txgwAIxI;$SxXG!D+*DSF z#rCTTxPv)&^gFx&@KEIXTa9-5g&i_0d3Q~l(U^ms`&lI2Y(vr1M2h1iN&K! ztIVc(!jIx}v5rkI{c9FiwzAi3B3%#>rS{GIPTwb>-m~kY)Z)=LGhjn9r%QFY-&x3cL<~lj57x{3# zlnp<8H*d6YUg_`Voi@x%ZJf8-IIs1K>s5NUobIBWsgH0jH%N7v{;qdCoRzy$dc=NsUQ){;3eQ1-ek5`1WNKUte;Df_X-jkmGBe)ZtTOvi0{_zC;bX?MlsSmUs#T?|@z3#V~mRDV*G199;L0gS; zs&~w*ZV)dr#{Ual|2D@A&rUonAV)eePZ7z5QjKaClhSXlL=`RX^} zd++tv*6F^EvT&G>047D6Bv&d9>Tt8L+Q8|q%~>qduG`3S)XmFupwAfYUA`Ke-=(&{cF?XpCEw<6y!o*YGloe5sk{G)uyH9{xJ9lE* zzo*epodUA`l~Hd?!!>wxTKnDpY7&*(>S)NA|E%^)> z{noZoEuc2q%4qbX{ZD+Ke5hA_9dHx?Kwp@?%UtrN420ehyL~lAI}kcz2baS zzpX_E9skE4jCwO2yXB=-GR?VAG32{zvhkX2yrvtk`Nn&K@t$G4r+jzMG2D}k_pI;k zX@+~A@t)YLuxEJs22Zt{`UWoXfy*?GKHNv~+H$}7;I-n9EXT1v>NxOG2>2#gR z@1rv5n!eK)5~&W2@==@VJ6{`raKLd?4gGu+_Z6%e0I6|-_I;oDs4)5t?-2;Lt&0U4@vr`T5p?-PDa_tKNOaJ8=gYaDCwez$PNTm9MY^yKEXvyY2C?B-{Q0Bf!fu^dBPZauFFvmZ0Xz5(`jtzI!}2{w5tb3J!auoJWmoS z@2)fl{ulffH8ZZ%?@N6kp2h<|XabvCU#WZdrw|~GI~x6L*9p3A(77{oUYGrEpErhp z6ZLAX80HH+_r>GJu=K;snhC>w!ISz7JZlU?GfUJAOYsGx-jmwLaz#OHT3=*{FT~UO zjvsJ6Rcil3eZiZ!4oCr9UzDDSM`L^j{oO$=vTy#<#M-Yl047l%^dFl5&!18a+bfCt zHh~j#f7loRMtuOaQ`$vA_Ec=|d=V_`G8T%0Oj=KAcQ9lZ%{p+pI$r-8#UI)JAsFt` zx>Ylx;GY2k9tj@Hr44a0dRuuW2KQP z-FX3UoxXKF#`IJ0TU~Gfc+$FQ_oLuEt;3S#2fMKgQE2BVD7g1sjm~L)FzG~IyAUtb z+v7S@&wYWf$J-i%E3NZ%-dFO6-~N`;d%Yc&7p=cI8NcmWkFD)kx9C?T=Y=nR*hdO~ zBN~I;2Ld34=6tx&7^XWtKk_fOtCpa9$ZT8>mal<9kWA|`?rRL0^sViE4JmItew(?! 
za4<}w<8%k}L+!_@*#2rL-?jR`i3hz=Z>?J?bgiv~`*Xnl)|tgMA#=^$p9eTa;dAG+ zwtua_b$Od^9a$@X6eQzoZMp-lcWDTZRd4{_xw5H_-lu-!p3OmmLkOXP_@AG83pV6@t3fEI=pffj`pgVq8q4lN!n z5iJ?5Gg>#aUTAnz`u{yneaR}0^?qp$@~8bC zjPHbgI)?w@&(v2e9OK(YV;S$J8TGvGS5pjcO2eo5p?nv5&8o zyCHt%zmsOvH`@I`^`ZD*&+M}|?rxg2?;wAk@+hOrowQ%6pHc%1PwTWSxt(Uzb024n zUtf6hUC(xR(&ijqG~9{Ot4sO0oi>>Iv=VRr7_0ZZoo3W?AIo_3zt~&Vsp`43Q`Gkv z`yguvX)d9&X>Qafko-Rs#jjas(~Nqa8>S)pw>PTv(dD>_`ydk+KA-f$=cMT{jr}6d zv&Fe~)Jc<3&ts07pugdhNzW&qG;yD1Vow?jeah71-kNkaf}N2Ee>}A*Pk^R^?9iMDiQtHM$DfZblKGKZ|w6+e4$viB~vb&CRd(6 zatbkz?Y?3%>Zej0Ms4(;Pbk)@!A(H|s4W20}HxKDU3p_@cSb>)M7H~xz9E2@97z#TSjJ^a?(0q$a-k<6t@N0rmy~0 z)4KKF@gKJ8>+bKeZz8S%4});sqrbCFqv&}6S*Q1^``L2*+B63>o94#KHW~HY_p3Rk zzcR<@rO}^ETd9xwkI4U~ooZ6%lW74RPxQl^+S_YY`(!feU(q}`Sib5>X|KE?pG;M$ z9{}#JP2VHeh8#t44k5kPgB8J(KbegBSJXE?2K{~3xa0}>Z1Sc)Ab7qgJ8ZY|`fS=t zb1osndK^pI{R@%_BjXVX=hXD126pP}RV*>t~g&k^3_ABf+=KVv^N zgT5uTd!k>X)4SIfQ%USf?u4wC-T1L)_b;ZyR36BXN_{H7n2h?1)F-9|>fhl|?)_g( zjm8`qUXpmt)bzUPUrf)&7Tv!DS%27}#PNT>n2dTJJBo8Exujx)#dS^QEu!=%U2JyY zpeE06$<4untvBJfG#P99AAQLoI7b#=AF&hQm>jT<`dlG{Tg}8)({jMoO+So0`5!jw zdF&_FZ!?cY&IyI-d=A0`iI|LFJAm;$V?iIiUPTryUr zC6;G4uERJc7qp{!h0qVQTzhyfFzP$*ZB+MHET2UEY|N4u=;!uh+}|yDy?mYvR#Tr< ztdEJb?p!`Q9Ha5S`k)Ry_7)As_?=C^59ym5>eBVR2I^b=y3aEB8>`k!$}gUh8;ttN zID~mYD29KP+soCGTgG4@>o=$~S8nS%eaLYb--p~_)K@CeHuN0suXzYIw0q`}c6(%( zZrIjRH4mJkF?z^)kJ7kObsjM48&MyAN{^3MUYO}L8%MH!Qwv}GQWF`9Ki8r^y}=f0=$0?)_Q>+-Qty0>;O4!4!c!8Q5-ye)Bduzl(rTznJHtPPuj`5T z2l(xW%nt_~q%S@43!3|8vjZIVJ}_b@hIg6X)OFTQjE~xz|D)%=)^(ag23_}x7ln%l zac&ZQrTmdy!X{nXJ*PMr_53`9_1mXsc(9kVBVKQDY!{w?J{z%Lds#;r(@ph1>_F1_ zD&N?qua}0C1>yD2A>XYthkH8$&7I|rY$??FQT|^YA!KU@)jAfBAIRm~>-q7>J4;}Y7KQ0MI{c0M^Z^H6jJouNy0#2|upn-ml%8%x<%J+>^ z_bi)Ny`&Qu^?W~Si{(%2lt(+s_q_k`)rNt^~i^_`GGN@4XM5W&e>k;OqI}m(DV#1KFm( zPh-9DeZf)?*WqqZ3LewkNBF*{=zNPR1udx^2RZfbm3wWYOMy{;mB#GV$Nj5`^?O?> znX3Zd7c9_aWwY(2V5#P=iEtaV>=v^4M-)dsk~EyJ2*B44EEN%T8V``-1^VALyudM{3vk+B-cnme8UWx%NCIZN^UPvq-sS;)U0 z`zzr2=-?%gP+5|Iz!=SRTCJnyDY$k7ePoubpx 
z8g&lB59ZuZJP5bL&_<%!+Gp@jHGX}(!mW8xb3eeX-r;X%i%%>wv{=XN(ytA(QLnCD z;(as^>?rz>3(PL*y6B&t7k{CC!UnxwCPVaoOT|CO%3ICM+FL7Ib}<|En*E!Wl>bEY z64AGL#B6@5L_hWxvrGL$;y38^2D{o)c=Md??6*ug$eg(GAhQZm;_vkjvr(@GZW90X zrv5#u|@F_FyQ=~E`PQ5ek9I+XpEH>twa>6-@rmWtnHy_F4mNSnoa z^R8bc{+(v##0GsOzCkmXUAB#2HtH>JL~m|2n*Dn0Dv2*>1GD+bB8l&+@OS#0tJ!Z^ zqFAp5uaWS_aqJgnUw>@%f6fp#=vC)_aSC;b+t&rL-m^cm%dkz%nol2QOZCIdM!g!; zTf)yeCgF36KC#3p;g!84{KN|q-k?tm=p*&xk4nLmzIJE;v*k;5W_{LRsZTxIF&p)I ziQDXVNl#?IWz|j5XSSC3?mw3JYP6U540_%3UkU%P1^c0Bj(COHurwRToS6EC{fT2H zNO*(Z@_D+1pWjjZJ7!CIgQqit|1@T^b)DAl^a+b4{wudwugd==@#S-_378L6aGAun z)}C26==Ik*<0-!`4^t$(mxFNpX=eD{p4sv?oWon*6clcDnf=D_@FGz1v!lDDKhjUq zk8IBjTY=de)Jxc)ha7*4zS%7H>u2Qrw+vp&Y~Ce)mr9!?yg{!mJz))hnf=NnY|hLV zLgk@NIwbzW#f1%e{k4>*OMEo;wZQscXF2N=3$$d0BT}B`=&s^7>Pt-K@=46Khy7~j zbY@%^R>H6QL;NE$B)mbdRXi^G{CR6r`iT$EGh4b;X0~)W&1}Bjj@hW!mimbP?Vs$| zHwH5&7GKW{S$@ne50^@KgC0Vvi2lzt>{pB2L|<*Egr8rD+1!4&uu(s9YJ$RiTQ=Wf zzwMrGIbWp??GId<#(wiBUuI>{s{_{h_x=!v*UPv{ey&<1y-FpSEhlS9c!OU1D?;*D zFgN=z-|rydb76lBOkZ0oZ0|6#3-uz4Jg%|o5OVm4oYU5Y7YwYa8Sj~7j?%l-l~PEk3m3^c$R$`+cjwna%ruV}|7mm@N~l2(SCn zlGBgg$*j9(GFyry33t4~oH(KvGnAe${x$FU{;$2A&g_!Up6^HMPFZI2;wj8h2v$zwQp?>Brs%{m{RpfFsR|&4?5h#?u*FA`-9p!l*`8wnn%(< zJzUCbvaFMy7%+-Cv4a!)je6~VM?U{tX6}~jo#hv1i~DtERYlkweN3)@2K^v>o96z8 zn74>rkK!ngT8ST@m~dmi;!Faup3$Wo6vsctq0x`tzYCxhL8JdWp_M_afJX1>KcH1Z ztB&T6Rs*dT8oh7Fpy_BSX!x_=l(!AgA5OpLq^<`fofe*x`MdaE z=a>9+S<2y8Y6XPfx#y*D?$rwm54y;HdxO67$R1Lr$NzF5{ljyEgo8IRk9+%zaN=5K zqdtA<8u4%YQk23+oY^3}=PL8K3TuTo9bo=WztAjxRihY%A6I^n@CR4nw+n^;DaRaP z&_|3|Dt_q1e*2s=g_o^kPG7b_c=J99Z_r0PJ{3>t>Y4c*Y3p9;nDE=Y%;`f;Tm9pi zjrx^Ci?!zPp?SoAuz>LFE6nyIN(dL&%52n+%l$;+_d8#b;)(EjD4g+#+5Pq-;o&ct zjrw$LqU_%`lM{s>*T>5BJ1|E)o-F>2$;{vBPv#PT#azxD-Z6(&@5*d%w))?=i{GH{ zoPA&N_x6bRpX2x>%!j`CjPO0Hzw2#gqu&1RyExW6uP(*u+h+=QH?!V$4bYGO#eSpS z{)vZ_uhVh%uS^aXzW$lRuY3_9Tz0IK-(Z}JhVr+1HJ68dyZ*wS@1^{m#|WQ#$>Htm zToqo{qd0}Lzk7{Yn>&!tuk__b<@{=X<&BB_oj!^GcIyws4-Z>}KSyV>zYfloga6lN z@8t78{gRc}yXB|U(!F16aR|ex%?wqEfFr$ 
zS*|yCI|$#C>sPvmwZ6KQ7ymWu`BiKKU%%3~-sJkJ{oBl*%kO2n@HP`4&j@ch-!(^X zJMu^9e~W*|ryR_7ONEQQcxytxd!??zzs`D1oE}&`2OPlhgCmdYfhm2p1CgmC$E_8f3@=XadN*|3$njw zBJz)w<^x%Z|}WNxbX}*p8r_+UbviJ?=OhIhe?iKL+kaqSup#@ zwKyvNC-Wry>)XODf=W>O5vI$+zs{2DZ;Q9W-9ov(rJwLgAf4{zT$Z-R`tDxITE83C zE64t@Fwy_qnf3N*R_^G=expA9Vhz!U-(vl^AP}Z!7D_KLR5;y%*{C0kBX9@jMxp!g z;D~WSbpf)L4Zm?YM8(s*Z{6^r4u{m7%XC$FP}3NVes*Z~Xa&*gqlKfLpuWI^aT$s7 zI3Z_|5S2{xmiED}Gqf&ZevFUiOCN@xs=~}4^1iRBjC!6Aac?NN@7!H8etn?Iriu5r zA*U5QJT5&@9i;in_k@DyngQF(jR{mneL8=|M5g?i%^p{`Tk_5k#k zaw?qi)Z=UUWUR^2sD1}8b(hw>Y#as-G=}Dlx8gq7lj=il9&6&GJZb#OFZJ+S{8%-` z;;VSRzk&52^EHl1F5{jbU^9lTTXMmM77I7{f4;q^i3D}1kCkVsi6Hk53e!)-26o2CoHJ#?O90bs9 z_|3FIjv?x9?ZTtq_r`~MYkk>Fb4nBi?DM{V=gW&BD(=dF1~2fP?Cy!FTHu?SGDayU zfIF2>|6pa`<3_MYlnJg+uJPWhE9Uopg<;dM`nm8pl|Xfu`XXTT)MdKv>;Scy#)ppr zFwQ#fevrJut22X?m}5Ac-7!uhEZ^zCps&ehRsYVAw)eN|cND}ePc zp2jF|@lh*jzLSO~2%s@lc0P*d_3Bm^(*5@wjLV7say7Dhbgc^xDTQ~Y)UU2==SSR= z1A~{m^-?Lcmg_v+AIHxHd*<;}Lut&`8LaO#_N|1M;`tiu;h3*ZgU4@l^-?@1HLlIw znZ}`{daGpb7N57cgn|Q&H9ApUP0!bKn9H(I(B7>JZC?rVJ8jjdW>YYHw+A0r{qCcp zXx=Lvn~|6UC&t43@;aMuL%}k%^3G0&yi{_IceO`E;5wJ5X&slU>biC#XFeYs^A^~4 z(9vjjRh!17$6$O{j&D8Z_JeAHhj|aK4{M*WY}XUl9~7^niDTY4|L5Ig+^};!aNzUl z2UV8lquEstV*VL8Ah{ogr}VPw!Sr)jx7?uy54VEv9*-`4JMF?VHm?B<>&nBQ@+nPWSm|2zFSy*{lk zFp$dA!Cmp#&;%@>n@zSo^>J4zG;e7ljy);zgU|eu7{5V}m#iV!nY;vD`vPd(LQ8x< zwU)iDY~AVm@99b^7iub?(a(16Gp1*Z-#wnX8|%OE{4kzB#`DW|jGXJ{ zRs(~oDP9NnHyqk~4!ep6dMKXn^$;Gzwh2x1ZFg7KX{^d(oXf}c*~ZLBuY}g8T8y; z4lmvqBx}fZ?Et%dhK5Cz^H(KM^CKGlZ1siil@4g+b4DwR=7Q#m=8oox=7Sc1RtwF9 zX1ll2eK#8IN17Y!W%Ac_r@h#d7WyI?faVL@5Tshuct`Z7@L0YeWz@H(xsFkvLF0qg z2B{U)KlR^a=tbA~i9zZN&3%aB%kuahjL)FIl;;_gyP6DPG=}ZhATQ~U5@aIrZ`G;e9RVzMxRkZ*$KeYarlwnw32bVh32l?nvJ|l7?Z{YF` zP)7Z~vB)g=SAGxEsu9UBkmmL-8K8{u^V)J_@p!4aA^kl9l!?|B`Yjoxkt`^OFrqhV%SP9iR?$ef^~>`fY5~ zm)jVvHNgCxaPC*{VF^s{F3zcr<>yJ|Uo=3~q`7mEUFn%r3s6S=6dKRcKN+G5jUVE9 z)L#|vUqL_L7n!a%W`Fgd4R$>H73&i{vmW$UM*T2a+Y$9Xjq>eVJ;z^tq`AENAk(~U zQ~lMn?0u>)a@J#9Lwb_GGU{#D+3Pe`x0S!tcf9V}uCsX^W7C>{`B~0~YW~Wo$4llg 
zG@c)Y*D_=KI`H0CE8YI#%tSJuQLjAro ztKvAQ<+Gn%C6h5F?mcL(@@i=^kt1*z_{d~*-Md0I+#ZSvYX7wE&{ zdY(Uq^JMhKfi;zLK-|q=>?i;10a{gD;SJuKbegZQ9&)}ezlW}PZdys>U7az#Z5w** z-u2w%%=6<_1)YYSrDmFZa>b{5{f&Me*K^18!U^a1>0Ir*c|>%s1>fcwG0sC`oTtQC zUoiLxG_H5uDyE0#OKGNFl*4rj&bq_nc^yaoIJ!0og9&}elcBBkMm27dpO?*C2e zX@756*%8rdEsePk@dKkC(;SJmlAg(62F+Dr2ghjr-$OVD+N2W4_WxN5a`Lf=GPsP^CZbKxfm4_@pRW0!~tjiTQ$KSDi`y#Cc zGd&W_bp1;$3F~OwFs{*{FHSsMqM$QeoHHf#Sz_eZv2dtR*@Cxo)pA9@wJh4veR3l5%twEYdzT+RZ|*Vr z>Ux~3xOwtli!vQx@`bqlSJ$H7P##!BqtKR9xsIVe_5I>u-dfAHihe<`j-HWl&ai@w zukCy>DG2DASrmA;VxZU8U@MvDF4y(dVYhxrEAN_TI#c33-du^CPv3YKbp&=RgkRz_aD72pPOH= z%t}K)j|G|<4`H;^CR~_80PP_Zsz2 z=Xtqot``K0-<@}?Gz$63-u0l_YB||G&IHOvH7=b9p4eKz@etgvd9zo?PW`X|=*- z1^)aY31WDRfhQRCUudp7-z0cQ$9bg}{7iGS`6R*RgsZ*cHhDq&UU^2C{F1=ZGqKN- z4PIc>^E04-5>6oN*Y9bD7Yw5HT2cRk`q=0827YG=NCGp>MTK)E8TBJ+F18v;ka}_D z*v?_zP=v-91SWyb*L-jIM9&Z*n15=o8|Mv1{XgYX^lXg(E?sY8eZWLxYLQ!3ZT5i?w0;Hp=h0Y$ zzkR@{U$eiXJ`s=Q30glh(ifsjeW`l~^E;H*Lulsv^;plNIqsVHf>Ced09wx!*H5TH z&uADvjE>hdU*PqekzIqP+TEJ-Exb){E06Qk9$5gdx3s6HZ6LweV2_m@8y^;>Y6g<$t2$fRe5oB=R| z=Hju64*srX@XfxK@ zM+0v4k*9=z$|-E!&(zGK!Unze=$z#5%R7nx_`e)JrbJ26-@PR9^>$_cPLJbRD1C=Z z1w_B2j)Zr07are9_9w8Euu*?&F#A(Gm6P}z@0IxCD+pJ*&79G-s)RS_^_6S+xTTJF z5`UdWqIW7I;eDcn(>=v+)LV~xYG>>Ioq8him2(q)f;Wecp?X2(Yta9O|KsN}ec6ii zik04(hST*Rv;jT~ug2DhY*U_Gp?bk4GG8LqCHmQX4RM;e%!me{{i0kRP5byQtTirM z*zfds`^IQET{!Xg)Rx^G)oq$fu5dkUD7#gEd`m~Qx^(Bd4@%VgpXV~0ZxgMTfj(8b z*t@tk{|Rf2%NEwC$27;<(zO(+RH{$e;%Ycu5Axz%`}8i@wW#_~xnQoD0geAB-T#}` zg14qe(~7Eg)OLz%_!#w=<~XznI(Kc)R6$eg)H}H|0#t0VD|t(m{`$;C&%g9di+;xE zG2`=^EiBIu{s7~`OW*ZArN9=qEiK!#r!hWEVmuljht)KvPEtH!&2Om3RO>Af=91jg`_Xs#Y}D7ezP0!K~2ZO4m3 zA(Nf~kdvvtF2wwM;5_rEO<>gHE>1#Ar2S1Di}QWqSf@&BPJUVQ6z5Z=x#n93z$RJ; zY@Zybe5j+JP5T*}y1U`^A&$P;{q2WqDPnt)xdEi(bW77h{J(~^#l`b%^lkuV?@~u1 ziUmMBYzI9aj&*|8BwXeXe6B_}fQ+sq>$tV^|C$$vKKoP7=zegD_N!(CFs6@L`W=nu z6tkl`F`*$`o`Qpdv+(;Vy_+m;2+sca`_V(?L?PWj=|}Z_C?AuC(5t|c!@K5sC{G&u zyr>~W^bUT#X8gDC!TkpH-98-K`wjZA7ftKdnbQzb!7ERZzpdYECw)ubmLFUCMm=U> 
zGFmzv18$!hnt++toyB=&FW`JFo=y3Dazq{djQ9PFk2Q0@4-EJk_65&z z>JN*lZ6$pwm|?)ZDseRE$SU6*BRnn1J{kBHr}F5 zA(6gIHwu7L^nK5vDH!#N=KRO!sNtU&OH(lF zv*@|GLjbtZJEe0|$f7pTga8Pl^#v<7{aO}6>kdI;sfGAMyp; zkDLFUw@r>Z>c4FoMFu8nw?2Q<+QQnjwy-Pt`y$3=&|{jZXk3@?(>yT&ao{lEa#+WX z?uzIA4vB-#w0=dX`*(Gj*OEtn#Dcs7I(2YYTj~9-=J(+Uo@SjW|%7 z4<75B?5=F-UtTsMvT9p*#q&^Mer)L*^_cG|npTD9k;K382G>#1M$=o;)(#nn3N&o0 z=ms&;dc&dfu^<|~&r|&_jaDA55}H3+4YXQlCN%2v(HN~6S}a;?v;?%D&^n-XLF9fd{@C6q3ma8uD{qRl~DfVKo}1=>2aO=w%u_Mq)UJA`%=Ed%X5+GVus zXcn})Xb;hzqP;I2Ud#QXl2n{&|J~n(Y(-n(E`zGqSZkQMQeZ-iPi+I8Con_ zYqSKkpU^s>bwTTn)*G!q+Dx=LXbaF5qb)~UjkXSL6WUg^ooIW}_M;s^JC1f5?IPM$ zv>Rx*(e9%?Mtg?#3hgc02ei*(X zT@YTiOZ=s7FzfC9V%8qK6n}Um>$L~DrM```7ak(@Nnh#840|1=zLm|}NNJefk($CS z`!yz3pS+l%erzPMw)URHuRAfT#j7O!$CV<;pEY+hvv%m0hQu&zjIhHi$#2y^nT`7o zZD%EsUT+dF`ihH~VSO9niF3rip|ylJ>X#lC{km4{*8}b|t4_@%e4qWo$76*JdNpV+ z`?bw2CH%q-65ssh!k)_{{FE4Gqdx5ATRz_KUhVtOwSK+Z| zL?6RNEa6ka*bnP}6?Tade%OZ@LagO$h)*@_C;s^j zS+8~umhd&}NqVovum2(ZoxXjlgfA5$>CGR*tk0?|Tx+?+cd@p_Z_ulz>qS4jCi}JK zizU8(k{>#@{E2tOZ_w)nEI(1Y+S*&<&wj|P54b73=r%JP z`G?u4*B(qvz^0S%0(+}*%em!8O_%E~){=*vHqzez2EAbii z_E*|d`g-MM;;)ih^kJ#O(fipCrLE~3^-X0=URKf!*6SOi5;(qGRxaZseD)8C&!C5O z0|F#{(QDJoGpn9kCB89p0@;6c9B_Wj2Oa<3Gl;K6RMrM;GPvv(Ar~_#0mA;rnDq}m zn2qrt{3(Rb7p;6p_8;u(Y2_s5l;gFS)yU@JH|X`^@|=M8d-f~0nS9JRC|Ji8v!^{X~mDyeN`wEM{OgD)y?wRaw&oRPYG3?izQiRJ(ebV-{VK&CE z*$?9XgMVAz>SZLHcCm%%pS9<0*7-8O#djTIBzqkvV0cUsdrN^jg>FtcOQGh#%J&<8WoGN%+@6t(j*@c%xnq zyukT`99F&OEoME{L*n~$t>iyi;y3DpXI$Xpp%-W*di{YMpUrN}S#$qph8IPFcU_Z{>!3^mICBBd1 zhwj2geZ-aW%|N5~qoS;ji1%bx8*&Q|31HTy=NC5W3-lF#`JCb(+*`tz{mA)_2pcB; z;~#_#dYm&%>T_Ba`yu}{X6@8FVapQnC);!UFmsYsFZ%;K#tBz>&wA~djQQ4<6=1!d z&{F*V5}#V^!L06D`Cmt7%~fN)mJ%aev#^BU@I~rpc_|N=nJM+tMbcMpZW3QyF$o{~ zg#CJPCkfwUh4?#L>$BZbIX^$<75|vh%&N0>|6k=|hAUEj+Fvfr+UOjthx|oE-(Jc? 
zYqrH2zqNeUowIT_mxr-@HT!4RO?bSuUWcW;95$UIR&M)+Qx4x|zso^kgB~V@N&T98 zfc-BH4QAFyNIQo1E?Vm2((UXw>b2ifDkw}3vet>eMM)Q8?c`cXZ~gu9#L)jQVS`@V zUOpH3HJ1(I@3-Si8lp077S7Q=k3!ZaTKDJDk^;og)*8Pdyt?)am#Z4ThQsTBPLt!Z zQb~ME-R1nfCi`pDYiXS{p!n0R@$ZfO%;nc+m4v^!D+l@2&XvLjy}Hv`$}4F#`&Ikt zQl3|rGV5nz#P7UB{06;NVO3s=AA7=C<9}dB49(5L?HA-HRyh|48}xcz4`)UCqRYg; zV`M2}ZJ_0t0*q{fe;Hs3qwt6)C!PHy$xce9XTARC0cJQdj9K+sCjO&C zgbjM_!4|3iLH*dTy_M(stRzWKtN6}(e#rhA^)ZXe)u;UF9e&~PF}`_)(>e=V{aHy} z{)cUPZBI##UmMbk{rU*&b6CHQ%y3!mE6}zZvr(@)<>vNey+jWQf7?U&Rzu-Gukrq? zPYKLM{lUy3KU4m-wUO*Uc&-<-o)4JA!iO@$;rik?=)<~q6#eUZ>`xifPV}Wig>QFZ z){e`4-Kf_KNWD_gL9EY8`i-}6GlsXitFvrV<$9u~*a=^KD7vDhWSb@TD{ZMYTp8>0 zM=^PB(5B1trq(Bl%R6gs89A<<7D@Q1Ld^OtDQ~Th+~4$ta$SdE@_$uco)fe-k0c)h zYDl=Fa=+5U1~O|Yne5kmJ4iaYbIJb?j}m=5NiXaCKjI(wmi5ZopBbtyWY#|fSi}FXUJu;C^&f^weS)?b!a1zrA8uw2o)IYFjrvDz z<$BS?N7CyzL9Qo5tBF3KtDH~KRmE@6YeOvZed5w<-YTz5-s)wZ@-{2!ko>>%L*5$o zF^?{U)ANoP2G%B?xfs_Zppqu#;PT;g|1m-JTGm-Htb5q=db{&m9N z>D_yX{*+bk{4=xu`Y?xgnASzYPZZ8D=p9;nM$-OkeXa5Dz&_`QdfGt=pUZ_=wY0KP zzo
ytOIKIWRA@Q3XZKDr9CHe|D~K_B6r63gL>DfUNPZ^f*Rt`|NvK>TAjGTR@T zD4c(n_zmIhCoE)t@C>UydKxo~{!8>%)>-|+M!jx6%{p~W(uei8g;#CkEp+>bxB9}h zvNhR_4plZ-%fbpQr4#&6+c)c ze@1;olUOcKsI`*y5j!J=k4SnEKFyi+s72y8=i8nkyW*6E(Q&hr#r=J=i z{v9Jl|EsU?Rb9BL)qiP(uu-2-OVVp=)qln{(y2Tj4;Ov=I^usnOxUPzCh2Jv2Z+C! zm!u!nU-*_u!XN5u_5ZIvVwYztN)Mc^{-C154||DzeFbJM&o9hIJ*@A_|JOf?Uqw$~ z&YCOv(GCx0)*k%CTciGAGi&+wknr}Wig13lHXT@R-#vy|)k$RjPCwza=$|(if1Mk` zC*zsj$8BWRKSW4;2EBfETN6d)-9Co>Q0$@b^@hT=4l+ZIaOUsy{ie$Eb51|;k6k64 z5F+98A7R$qYYH3mYS)q|NFq#s#={ZRBL;ou^oUmGdxV2#(H*E_m# z{MsqWpB~VLIqQ5m2|v1y_z&k~)`#a1-cm@|5FWZel=O;xWSt(C!VLKfNPHJFne`E6 znPGW(K7QIs$-j!rFUPZ6ZuYC|LE;b1D|$oxn*C8p@8Vs_*M;ANn>}JyPcK{3dnkT` zUX5%g&wo{(vmY{klIO$U&r5haX&=(Mo)b3c^<1C0|AqF;G4V&2=JZsiJh$qzu1f!m zCssDv7j`+pdTr?j+5gu& znc-oSlyCp{%=+q|nN_1oT)ukQH1XGxy zlJi^daY_6Glf-{HMDq1uknpzWQrm!CTYtIHSYfI(+qsMNM`0U29UnR=> zu$FfL^NSizlKyy!Pixj&_VbK{hxA~pU+$;yq6st1m;00&cv1NM?~i51W z{-QhN{|l_=?Va7sS~FkX>ehZuTBAy&2JH)e~zMJ3_abHtzEC;EO@#s6D%@t0jM>@V#La73@71{;65dbR)AiKx%xa)~KTzen3*V7; z2|eMk#8<@H90$nlKY!3@hkGixbI z^kI`k-#%Sx+3wUNo`ko;j-Pm|@386@O1!G` zS}Es+o}zz!R_+If=1RJ?l9^TbD)9$Qkouglhgsp8+mwH(v|scOuS=QJPmunH z!+vGfrtjtcrfOWA_>VNJP2u$&Ma8c<2se@ZYYzhJl3%TU%=y)!E z<|ZkpGo7okK5{5C4E@oE+18(-=@Vu(a5($5Ek&8(Qcq@8?!Nd3^_KVw#=hEP(d#|>HjL*PvU!ax+<~umwdm6^q3!r)zcCZ|GDLE%=dJmq_o(pC`9y#Es_;FhoUDH?_XjOfX~cTjo8s?x^#y$|(YDC`v{Qbn*PO7aP z|F8J(i9R_0IrdkR^3ZN}%L5c&z*hF_(I-kXkCpFJ`udRa#Bg{5`@!k03-d(lbt~Mv z5;1;96aT`5ip1K%6yZN6xDngFFT!&tcVhKKzBfXVAs)oq=P9B;-NKVtuVziJXP4^4 zS?9%{*eahNv6ffbdlRdfY7pyLi#fbYTSpV~*xt-BBl-gKj%LE|whPy`F6aFM`z%q}g2t2;0&wvKG`x_C~ zg}+~F@Pgxu7(0g9R-O?Xnu~wBeD8|*qbIYf-HOXQVqXnrh?M($#DQ4hoAP}w!nKKT zqXD9K$Ya$H6mIhWCc34e+|ERtCdoFP0fnpr(w zExhMG#}CcqdJwVpJ+o$d#Qt$}E=c-??D+Z{@vN4l=P%ET5pB0g{A1<&Q^e8h!mn>j z{I^a^dQUG4M_Pmn?h&q2qdD*2rn$l$e_+<^e-~a`Q~aGq3ZLuwBkA>_;lh2#GDEj= z!mCHOAitJJ-d7?V+A*t^7CCdd$n9kYGvYT>{u;y-hRS<5T+(|%2#Sc*?2O8txQ zurTZOrM}qb%-fRu+JRLP-^9Yq&|;MEgF4J==LX?cS}W4))fNl8By)SD-oe`5d@z++ 
zvzPW=`|{Ert=dcbzkMNTztcT`lkmk?O8eB>LCoVuzHUSDtKqZ7uL9ZIx&u^r5Mu@njK-6Ed!`ikzt^X3WggL?^sy(sxN#0L8AD=1wV2b!d)0p+V*7oO~yO>oc$$!e$(uo|t!co?z z)NjlT{U!@vIU;;h+K=ttofLMG^FJ*3fUrf{-wx(H#jG0C;_zWxA2Mst+6uS0A^uD$ zZ#$Q>KT-O+S@zHFs4QY%*Gla7aqS)dKJ2lUHI|| z@j-9YGqT$tynxP+83nVE<)ldR zYd0aYXiww-oo0z5Kh{G_@h#$!CRh8Zw_jeg9?y8JJg*!;*V~0hLdoyj5YMNDkztfx3wr)BV@4a|-hrrRWLlE{1l?y3 z`1}^jD;IY|U2pzwD*ukj#FmA$eJwL7{cqXGQXz51AAgX)-d0L~K?brsY!h)6m~&9n zk9If3e>|DE$}!RdqsU)=3R$mTBJsPg$N?9_kfnX}zC)p2Pl|uL9$v5LUG0U;qFNyr zexUn7Y$LsY5#Z5}^i%Y{NkFcW-eY=yCL?D6>ACbikKVp1#P@np`g38~2fE%Z45s>%TxE)1`X}l2my>_Zb4u^?7E15(0%SgJ1L?)}h~wz} z$AC^civJAhs~Ztl+#xQ1NXLWK58@&N(sOlVIh0%PY60mV=z39ajv4AwTNBdV%Sj*C z5;WowgSuS%VU8f1@ZRq{GfX0-c{KXUX!aJs@vx)%n*Dxf1vk2sj_CciA zi$T`QsEy2f_oet(Ldf55FtMz(msWF#gN9N3k6Pk3#!CA>gRH0j26ef~9nz~&`+N4_Uk-lvx08^&S%RkHHFHJquF#oyt(i}IZ;zWHqO%j0+U7tHqe z#Q1#6>3)KxZM~4$&6zojaQ}j8QE#&{V<&v)LA&a56N%S`=XM5UrOJyGwO172&I3M&R3F!Cvy2piqC`T zeLs0Y7^PR<5p~`<7+KmFO7Y+MP&R5uxf3xyF;MHEFzcQor zo#=g3=`R{r5YiH+Ee^K6ki%m<&oy-qAnjglYqxZ<8}CZASZ7| zmfUHak*(_88};csQJ1Q(>m^tYq~9Mle>gE$#%l+3LuQ}XqCa8tww}Td_fL@7bz0+0 z+DGHP%tJ<1L zME-omF4W~-uaVuBZY17)Lt#pf`;;Qf0drBm@pKll| zIb`V#JrBv_XCSj7)ZdZK4&Zs3Jy+t-i$#{l`(yg>x-seV+90$28f5;8SWYn_-J?C~ z<^61lr-YNg_)j_y&s6R=+?$Hba{WlR?2Rn->w+xxOGf5L`jeiOiRVq;M;RZ@^&>t< z_aoTqC_2w>RPJANY)<;Zo|ImOG7jxY+avcrM#q&o_2;BHdywIITk4SSji`N0HF+F$5ztMGx6s5eM(eEMYrgt$OTYVE*UT_?l*QVq6M z^z9PG@2cGAw4`!Muj#r-9;V!nG*R+^ zxe7fez^|mx-`%b8nzvpebKhrjWIH?MLGab4fAo>F8S*p ziV(*ik9)9)-xV*NRT0EB8TF_ur+qN9lflJUw^9ddSFf7rK8BYkp#Umb-(> zt@87i%~6+k!Mwvz4t669S(=8bq8(vS3~bRU*4r00Fus?HSOFbj3QJQP{&mmII+{*uCi$shNf_)m9aKHwy> zeAx$C?dL8BQQzgQ_^(`}_)F|4{>v@Me3m`2%Fk-1l0UNYj4ZdL`{a_^e_;-_k9l(`{;_D{(mBLo`N-TwORVzC zy4A=(mF`zUqP{;Y8B>3IVUW6b!Rtj&C_ox}fYelDi={`oadepB>pbw!o z1Oo_$5KJIgL9m8k2f+z~3xtjkydn5O2!ap>0X~^}W1<**^Rw45ugnu8kmW8V$RQ6# zA+sBoNmu*RoJbGfLArNaivM*hac5^_srnzpYX6mF@@KCgy=gyUo8`zwgMK4kwwQF4 zU;nKJb-rZ|>LE92{9n2-nc}bPiaHxR5?Srn&!h2txc^Fix6b5WI+Xm=X?!2n!X{Sv zLx%N1zf>^-^^il|DE`I4>>8h 
zqAriSL4F&$f5UA#a#3B{zdzfmYI{N+PNe*dj#2zeN_$J&kDU2oEaiXkFzG6P<_Ved z?{J3n_mhyNZ3Ps6BE^^V6;}H_+mZjbX!4)6qVyg{DNOfwSfYl~SNSWqZ^XgQs2BC6 z_AWmRMb4R{OOJEyBd{B`@fR=N)b5TEuJG6Tc#zB{~qR{g0l9iulm4LA`j? z6=ay74B6oNBjPRXD89;XaKjRH`H}(obB!rINuScI+k)b!RU=*H4|&)Z&pmv2Z+{NO z2>IFvnJw*wte==b+_*dGYX5jTcd_~J{X~5I^4{o|U%Vhz&TTySZ{!e_zxYQ2o(q|C zLq`$6^4y5$LSXG9(xtF^sH^=;z=P-WXtADDWHhVz=$U#eJB(iz^L~zuX5GA69~+($ z_Fw1oO3md*Ej=A=7>^cf9rXTB+{&~nBiWQ##n2a zgzHg^&1{(68Rq0x`+w-dyv~=RSdN&Za$#d$xosLB*Y-&>8*|-<&+@B-u5mjYHhz3# zuJ&h%HNGH!ocs8S34c21bJ|wOf7kQ2qcV$3`216AQyM}3nsymi@SU@|U=3KjgU#jt9M}@Smg5<|f)Cb7`_zT;*myt^ojZrfr)6Yc#ldvg+Uw?OlBS zujAp{i$Bjo@tawFe1K&$};eDCDfkzcRptNp{PrRHbXP3BUm z)3Cug=X`M9e!=J6eNK)%?_&yUfz1$H+j36(x95G-{->RHRdh5==5k)s2m2EK_A%W& zH+2oLG$`q~{n)>KzQNj5ON4IFR5w`nh0ibk9k5WJLlWP&f$L@;c3^Eye@h+Un8b(3 zeLufH;Pb}m zGY2_)GPPfvN`HfWSh)M+`2F^tta6{mu9#4L0>q!v3l?&0m&_~osoF0|Qnn;j`5@W@ zZc8z&qb2qkYftND1t+l)E#4P3YiZ4!fM+ZO@%g{nL*@BVoJZS2u!Z0V!4-lB1Rn_g z5P~6uLx_XW4MHyneIX=47zAM`gpm-MZVazz0QVUy%XG-tmfdzuV&CS%0cX{oFK)D4 z?&F#CbL}>@f8*5AssF-#&Ixa?9CTga%&W9q!4?895%crTP}T;6ZR|NGQpW*U?Vkzj zvA(s3wF7-&{o6UtyxDmx?$9oYnf0rGqV9TUK3VMV)=9LsUN>j1_S^ScBh_?2YwMl3+CLo57yTPRdBwUPyB+zzVr@RXB&NZ&{2X~#ujQ#9 zYC!$;&+@8~>B!aoyJAh5+DRM*Cd`ES z#g{nn|J7eTi6ztw`PwYQ0mc?<$|I{Ku?^xJCp++CVh&@_$B1=HhAQd*4}Y9k=LGy2 zx^058Y#cbQ3j*b}!L?N#_@V1Ze0sy&%`H#DFP5JJSNm^@xyNBG%5O7fRcmYJKx<3F zeesrJ9_ma7o+jqQ2EEu87Ow+-ztx@XdXo5a{akf?Tnq7OB744Ui06%I_S|FjMS0T0 zM3%7U_l(QI_PnuJ8}C^n<5tGyvxDqu?Zp4#UtGVD90d1WwW-tQ7}?s>{sg~8q2-LG zR`z^NmzrsJKra`+V~y;&+V3jzgY@ES+&R3`)t(0*|MX+VqePY+{9uxy9r*v5;?n`j z@8Lk*uU|4In(vN+3^E9H{?wHBzADM#qPVu?77zfrM~w&TTZj%{%>OWF`?gn z|7}pZ9asC)MLT>9^(or>TU#C}&c9GU6Fw)N9C6o{f8O_vUj@BXtPl0VmaF|2S2Rl< z1nt{ItS52Sj@ydusglG>*N=cj+ibagQmJeV^^-X+zh>D`JFfO)`?(2gMcQPSuh-b| z3bA%4tbM5!(I46J)T|u|D`9_2af9akR@XFE#EsF5ZT#{a#CJOL1^t zJ6o(@{-hn>wBWPMfXk!B`snR=b)S*aREYol)3EoacDLhd|8-cxb1}^OTAbPX$ntLO z_%XAcuvR(Lk5~({Upu~D-dCXm`VG-P_HM`3{$Y1wK`WUnKrChJK8^gL;erO zexFvS9Um>$g@SpCCy4b8K3j9OzqMEk^(y$^!kUE7ta(BEK=}&%9?REeyPUpc&7awJ 
z^>Mid_0fF0<E{FbKiTJ&jYbBiJ1yxx>(7vWIjq20d zSz)$6*FCHP9`U;^K9#@sIk(FjF1_27S!`Ul`{U{QjBjnW@^ROuOpZF%Vb{s}OtQSc zv1DUYwkvz}uEO__rpnLepKp;`Y{t0oPp{8pN3XuKo|&`jNcGy0>a``+YfnlC+U)hUvSFoS-GsK*Y!|MhVMA+D z!rG5zxbB=SD_as;XJ|s3pLN>i>(0@zJ~}JzzPv1aRXe8N64t3Pvx0Tj;rb}MEz_?i z*0pHPa*j8g^gXQ&E0@GNXBJErYhxa6&B}CadlhtndTKlNjN1+?mLslfPBdoR5Gt-? zE9M}|6=cRryI6gTUew(y|*D`-PgZ5&ivLzgB6;WY2o9FNB#f3$d>HG1$N5 za9rJN!ep`T)zb!yiTzmBjA@IlW=2ke_8``UY|w<2iF#=a={Io*`I6X#C9EiJxjPYb zSb}rgb5mF=5Ej+^tpSroy__;-4q}bXC`f|Ume$Q z%3ebzJGq7Rjxc5kqJB*qFfALjIm)dulU}D7l{Trx_m$yOV8Z+6!K(9zsjqU35_mm07mW+LbwMLKYvk70a+*0t;9;x={HGHH@e#*s z9SHRy7(r+Zfk6m>5CTDbsx+<3;LE@X)@p?B%Cxm&Ek{6EtmO#b#bPZ-_*VO6x4MnQ z--}y!2ovA(#W1LUD4Ur*@i$B2fR@Bh_Ztb_pt!l>pKU0(c$))py~XG^c-x7%(GW~u zvnlYu*r28lu{!^h-IV^=DAH?m!1PCqav~nt23gxg(bfKj^`N~%`3q9WU-fna!CJeS z#F?3>7cW{ytoA#VpspRhnsoE&d(ZoXvOwk`6MBLn;^nQ-SMH);`v+OJF&*e9%K9imi&wpeh;%jn_5KkOS`}65^ zVk204ROpp$O{`4{Y#`Pqg>PA`T?*eCv39B8e%82s2H|NjqYetIjlKYpF`Ew8Y>7Dv%ItUR_0?Vko7w7qNh;QrTK zZbm#EKH|{Ws#B5yT})6h9)CcrG!Fa}qDR zPkO*_#A7E?d+afSIClqf@uKYrR{Z?s8 z+(bEkUPe)U^zkJBnK#6dlZe&rtKBWi@BB8>-)=+J-rh<4cMNV%vE~r5+W+nxZLjAM z((7HJ`l@7~Ow_fR2T51^y=G(kDz0~!^l{n920QYJJN1I|F|-$PZi8=GE6T4J?-rxhOKTD{*% z*I8pI{00{n5$hdA)*jwRtoCoJgYk>qo{~N;8~w$7&k`FBqx5oL6RZ8@xs<<4Dd|<4 zPSsSnXflnCd6~9qB0}$*)sEd~+@7>uPis+a;_0QyWlz>m||~Dd!+P zJ>plpXnPjbBUby@U84FPRg3iaJ;<8u2E^U#Q~jS*SnapENA*)~PX6@s$eNb0W|%1N zo7c$N6$-2UYe$j)Oe6AdJwttf$IXb%Payt^W_Sbty_1k+M z={4MlcTFcQw?ZyHx`y1s1?g&k%rdI~3T6FR-)eNc95$o)qdllSzqce-`w#ad{h&GNQEtcv<#xm$ zpOe0(J+a#VwF>>lS~`)wvJoCH8tzKGI*#<`N_@3H{vGx~wFBbFuUU>nMBKCs6S;^rqvH&$5fPjNIC^<3O)#|j zN+wqO+juoY|F>zR^8@IIK0mSd7Ss(aXA`UZ{EoSWh;LxIi1ao4(67-h-yNvKGo`dW@Ai`awH@l(BS(l;epcl>=~won&U!AO{HG}X z$nKV6JBmY!NLTs!-BnmVg9FbFYnY(1>sLup4sG zNMc^t5INJQ4eG!6|K3ab^0uV6T0`;etth>A-AVT*ovHln^K{fh*0_;BITx<^p&x4K zLLBQt`Xx_dwLjktb^T&5)cJ&Q7{92lE3w;rWc`aBDZa|jJ_oiI`DNM$klw(6__065 z-!ca|BrurtU;LUjB7RXQ`Pr(E6#tQuo^Ef{L*gPR{*blE`ZXy%{&FF5=72yY|Nl;p z&$goYGbuiAmO=6FCo1Xbw8iopbwmEefASrcr>NOL(g$WB>rWX#{NW6h*JvoQ+JD@L 
z(jPF9^hqU{f02X2C9O~|+9spUaxWn3&m-o)#NS8TJ0gwzLoLW(Hjg+}L+S0Eq2%}9 ze*SnF>LG{7&(nrc{Ob$IfA<)2=JAEZDnGONO#W5#DZi{c#8VZnA7zd8sl@-q|25qP z_jh^`r61If_+vitsITo%uYQu!SNWMX4gDeSuaW=tFpB?L;fj3Z%&NC3zS=))fSpKR zfAwe5tK7B4{4BmwdZSjNUi9P(vC7ZWF4$xI+#eKwSOxiAE688Cm-PEml<@uHZ@3fd zD}=wN_-7)qzKR^GlK*M|@xR|lSNVBJDe11&(GPRVP<KfmmPqF$()WrSzlpD1DWm9Xdw!HKz{g<+;Qm2E^J2#I5TStNh$&Anl*dN`9Kg zwEqS*B>$$N$RUBmzxd7n;iMuNCt*QO!UsvL*{Crv+++QIFjWIv&n@roE zr_|SkF0?EGJ83S89F@QHt{3W=6X;pQoB>A2MZO&|zD0u(q$ju{tNnc2bKF0fw@GK4 zw$uI^>4tvhf0FbgG&jXB{)eBjeHS&O^0Vwo#9NzV{G4;PwEbyR{?heYyF2R8pQHM&r}&??!tK+KrDuQD{?hVw znEvx>eKCH{w+2qw{ucEhekCDi+IAwIaEj`ySpafrx{V{2Ka+He@Ykqkns*~zm43qI z*4~(YwNa=iM3@oxB`yuIK-RB0nsk-l;`$>hU&s(j-?hR~q!+R&3E83tr}FNhXTvRq z>?U3@fpk^;9IGAeF+QjKa)!@G&YUog{2h-W>z7SLR{Ilju6W`0j2uDwtn0+S^el2h zL~Yu>f9cs~wLjtg;SQMpvdNU6^;YDfkSWNeD-R&+7ZI!d2^|(Xi};yoN_kh0C4M=U z;+H%@E^_*Pql8W(j zRvg6o4}mo~L^*%)r#{B|eO|7__cEpWh@kv(z86sa_oVa^RQ?G(PUTVgOaCpR_GB@W%C}#l?SZu^M0rbHcPizl{MG)_Q735Ke=FFn>RDO%;E2(~G%s@S1W;d$8ftkoT!-o+k%t8Le|D}ZTA2^rN-&REV ztyH#mZ9ioFSS5Xx-{QU*`S+5ZV}6qIv(2FVy=M_WUO@S){0YPCQP;mt+ncjOpZt-t zsXTvqAQy#bk=6c$T~}R1J?r}~C4Wg(SHYPp6o0SYsAmo!{a61<+W++zV*H%7dytDp zW>bDi*NF#bDe0^Hnc0|M(T@!jzp4kaelew&@bD&;@AGofRelSd`{XamAw7N_m4D$j zViyZH+@6F@$ZCH=&S>&m?IAtCNU4vt$fXwVsr=oxAgld3x9eg0`t1&rp7H?G3t5(r zoHN{;@_%)dSnc0@9`%r!1*Gp;PyWXzD7_iCNI!m(SmjUndc_^LFY*+{_i^ip3~OwO z?;P(?)I(rx5y5J|tl!N@{N14RTxanuPonW&t-r!+;NCj?y{xvvYX3ssQ0T>ruTlK6 zIygR~F?&P2a+#sfHSOOKcUpz%71z^^6n~c=E=RrCq6zW2p*X&zX;_`O(Y8hyKS$B4 zbws^*kv-|S&l zb*A+C>?3`FuB9ly!F6Rkrs{2EtX$H=Ug9kRjDGUAAD$eG?c8j%kRT#sB- zPnUT70pyT|)rqryCw-K{5j~01CDK2)qVm+F@g)BG0`Zu4RGwL_(Qn}Vk~rug>e>nS zs669n`?PXC>2;Q1eH5QBA&z&V?VtI8c!DwN26@D+MPJg*K9T<9Bh}~GAH-STQ7>xN zgvxvRE#|L3RT*ze8Hwf7hdIDSeee@wXuQL;75Rsj()LcXC4MoA##g?2kl!|&;?GmY zqnzK<_=pRQ=kO0TX#8WVFZrWPh!-j2KOUWEd?U?|^xHl(zA!I@@@w$`IrB&i@vv0# zukB2{Gl={byA$UhL|uREAY@*+nBpH+#&eFxk$>kT(wBFm>*@G$#KXObZKn{ozKi~l z

D)`4YKk?=;e@cOwqTAZ`>+*OysK$X`~6uFwCTL;BhVc>NkOY%%frGD_cOE%~R6 zCVfUO=_kfed6sV{?o8LanO1wqU&Dm1hdUo3{U}|Z>DN3*y3HXfPx%GX;~UtC{alp( znDiaN_JT8OUn0G>5pu|kd&GUtBNr7uB5syJ*Nd~>Q+(Y#(i?px-S`#pRSAC|_$w{u zml>ps%uc^WUH<@G50rX zMB@(@;TMoI2kO)BQ->&I{V9#ef3dqG=4Vk0xpa^-a;086Z60z6%=syN7OTf%{GxOV zWVl`+{UKf7mX7o!hBc>zKjFby96)7y=VmW zzsAtJsiphgQh(KR0O~nG$EiPdo33w4?`|jm6uQ1Ey&FOO!_TyKY3bm1q$g@HerexZ zsAqPj{1QAT6F(YA`R8`V{)xWQ|1Pa&iv5?OLBlD&@p;rUt)`Jb--L9g^pDSXLcQoO z>R;w`eNTGpc@%$ZDD{V?Qv96TkW{HuG2{4(QLQ+`ESVtKEUza?@=9F;HUN-nW@ZUo{bRGH@~ zzV$yV`tgmZ7ah%|_|wBlU%efZn|(x-8x_n_^|`A`G7=p$V(Sk#<}oSC)* zcep=`8Y$xkH;yA`cJ{&aO1r$F{cqEX(*N+8 z+GB7B%Kvi`wclyB#Fo)?eB6(v_y?BI@pgyGn{delxoEE+>07(e@fGhx@f(!U@w?oJ zbk>ND?^Nad693Rqw1=W}8V|N;e3;HZb7=h9;i0%VqV2+y}6 z2UjEW4R4V3+btq}fi8{zP0A#F(n)o+63 zNi!$?M2V5$;?O?CwOot^8#vMS$(y&C2-eIPOnP_K^W4e*?j^B_^2}iN3*vQ`F#qC`R@gq|e-_a8q$|(E ztTv+Z6f4gU@Qn_~#XWo9|Sf@E??ZgEC~zOh@v6*?{$-U131HZXuRe z)5V6k=S7O&t|RfLzbO8#c;YR!DE`?XV!Hzku{^%X#Ck`mzC$#`_1=;h`u{!G76B)$QwqtxvU^?-e9a#SwzJR#RRIJb9e`XRN?uqrIO;}94W(?Mkw&^h9 zTN1J+VlMI1{Y|mFuFHt+T#+@qHWGVuG8ek0_c7uuPtuj+U0&G;S-WRF>HP*%dQsWL zht0?zv4nW)Wo(Zc!}-Mfe6hTmcPoigi4C4OQ^oW6!Q{0D9CHZ5_h3fe#OmF+Fa z!um2OQl1ZTDaQR_V7iw4{b!+HYenZPdCNK4Usv~$t~@WJ{cwo*^K{x@gH8~SGr{y7 zl1>xb((`o&fBzrjBUPHx@e(|d+Mjt<*`U6ntNg!1-QTDR|2*wKK5bEz zSa!bSNcY~oINc+iDer$>7;MS_hR^tv~MhXb?vuFn)cpoM(l}{$j+cY9hJUT*P9K# zQ^MYNkNugK^|xrB;WI!N?;LjZW;i#&qgYluH*o4AdoSj_X^~D&^*FYq+RBa#{JdCn z!>Qh9&EwdHz_R?MM?Kk>Gf{GzPH}9?;rE53S9mgO!ydj@g5uadc(vO#(v#(d*4Aw| zERH>9@7>1-19v{!jEiGqM1Ihp%rd=s6n|Q~giw9HJH|&B{5b$08 zhQn8-9=QweU%3`e_-pHf{3~aV@oo1KnUw6exNy<^e6ip3Ac&9JoA0P6orbuT-paW0 zA>Vyn+Lz|5{gA|J2)jgk!S{u$MQ~|io0=a&fB0+@@7GG<+~8E?qkG>PVS8x^KJm#O z-`}^dHk_BPs(J8PXd{-VfCpY&B z=lri9?9lfHtTJu+*v8-Ovud$}Bc2@(i3sOX-STe^>vUPC-0bw6uaN$f30W5YdMszb z%)BG-L08Jw)3L=-FV9Y#?JW5EhgpSB{Q9m{mgEAzj5=suwHCYc*W8r$mpkzs&$|;J z-mFRQPQ$jzf5SVW`UWgSysQ1O6E9w5wrgu>LpJ3AoAz)E`2FWzKWb{o9Nty0a&b*3 
zKH}Lo!_#k#Sgd&0bV{e6bvjtRf>KU$MtPi)w?(t`VQaWJ-!{WT=Iy^8}QARiG9&ZR5Em;V#X%Ui&^Uq@TBgK?|bjn}bUzGI!4cA+&J^7?t|A<*0ODsXxDxiwS!m*~e< zSe%IE#k0EE&j@S7x_27ly#v^!PqCkPF30nFp4_`2mRmG{cPD|>{=xP9Sl+Q%Zm_Pe z^3pwj=7r?6VX=chvUb3{`QxJ%YuYfi|9Sta`E_>1^0LR`oiQ6W zW!*%W(=nE33cq;HL#c~WahzPUVQT-14!e2Ri?KZWWbBt3pKRGA@$P)lf9kH9xd|)3 z70Vs|wHwfe#vaJ2OLB0Uc4!I_CXLIb-Of#zlq-xGYA$CT0&?8 z!3Kf@g!T~JAb3FtfDi&<0EFKl42Lip!gvUiA*4b`hp-vKHVC^QM47StH?5LbW!bi@ zt;4IqoCUExcIUrhVKgU$70ltF^T{*SEsU%Evo;=<8=Xj`IS0mt^2z=6rTNDanT^=y zp`m~Q3K-GGQtex|QOYE|Grrs^CU0X(|( zjnt+*k=fR=`q8FSC|CQpi}DtMU)1fOP!2OmQ3@TLtD20^TCtDTa5FFNk-FD}uS&U)h#S25mXo^m;IVpV2#YQQu@{ zxYPTO4){Il&<`tN@E+vg>`V?NEreahuE!8GRq@a4Qi`ICMR z<}Iok^Cj>OdGth&=4yX|=#RiV;o}ac>Atxa#FxGB$Zu1e$in=a$aU@ji#ZcO|50{0rQkvkSNji@ zOv;ZcO=MOPVZH}81@X7y9dUT4ALrc23F3k8ma?@Pl42hzG$OA|~}=&Y8n2Jd@r8^0S$>Sc@jf+@_-W_m9s4xjKHs zQTHlNpHE`ZarXIF<_GYkyra_Bb4l!Sj{ULk%LBOA+|X3}Q*eAcE=@VUDu9kR&|}3r z`^y6O?VH-vr-J)B^Qwyj__J6iDfVm z9p1UmfjKU)eL;D|+*|8m&X=TU|0bIQ_{njfq+U?Il*y}ohROkaz%`h4j6-?n+UmKl z4&YOhL**5)y?@AlUQ^cv@Jr&{3;xyl%Z{i29>7-omIUz{MsfMkFo%VeO=_Mk z58}P1&6Q_Ed)MT^q(oLCj|TBUHy87#kiNB; zQ)5vOe<{uvaQw9q>B&KSBy=1qYQgrv)DPG6g7_6N2M3fN+v(^)j&mAJOJbedHB8yE zH-NXOSe!pOJ&BcEi>&B2GJvc1Z&Pu8vq)xV#rf=wAHVr&bN)T^WG0LL%4__DxY}PG`au)EK>M?I z(#hBO@wRPuNx9`oOgh-@XndF-Zz1Ne_?*OwUU${W3-#k_e>n7SH{6Evo!Flp{kZ?X z*Z4Ky*yz{Zryc!x#RHfd?k>zJ6a#a1xchOnzmC{{k0F1m?iy`%8R#H}#|AA6ORc zr3SFdU&p~F)q8Cc%UL-m{~uRhen6bBH$!_)vtTo(`O+L2JK%U1=c@s}yz#{W%nOb; z3#ZX(=T`ahK}LnFtJq$V{}8Az-K##C?~>R$(O+Am)Hme+<-CRWE-ycRr13ER4C*`H zf6KAV1Srpk0X~;2p!}k~F8T4UK|LkYR>=&{uR8ub_@rCvVwTJfM#im8JLShmi#c|p zlbMC6uiJk7ocMkBN@khj_dds;=ZiUxnkQ5LdWApD31phgZi(}6qX3)(rppVM6RG-n z|J(}!+`ea1>Fs4Wzl!$0K7czW!JIR&KX#2zJ=(QL0N=j)yw4Y4&4>xU0ks48oD-LM zy@PN(ta|PFc7s2i->yJ=6X&ae{yehNJ<0h#oIjv1CYAc}w;K=mEV~5f7nt&B$SgnZ zbtso5-vNI{U%qH2v=7mrD2DSZ^jpqn`0-8Rz=ZRg^c?0S@`3hJTF)o-0eX}6)?w~E$H7)t@c`W!k14MbEd3IVpgIb zzS5VU5%s+cbkQDsd?~;G(SE**_C6ivjHTvx@w9BOu(^a_`F^UA(<##l3-*eq{;LR{B@zKhvbtGC#jtNj7cUK)D9 
z9B(kUPE4>D$2s!6lKBYHK0>@W_Jh2Wd1YT+?T5X#76M-9>o-aIo*W&?zxCfaK6Qi# z*K@vK5NQq9*4IXU({yv@Vtk?UI=@omo!0d(+wR7<)okmtrBgW9YK-&p7P<4^uiNVO z?-tH&c7HzTGNL255Wi>l!hfcn?e>l(Mnv%V!3~Z!kMiI);(P&XyqCRjOW8cvoo6p9 z@-8av#7i8e1az`>=i9{ZtK6B=D|Y3TWz81-B`+6#L(HLczcZgJuJboKbG<7Nac;!V zvS(nwB!VCG?KA1?6=%M=N=C z{piHcKZ40C8@ce7nmGk0Ucq&IlRuAt2zB8_qCdJXobM8IPknIZHZ!(Xv1kw5D}IMw zxc;oW%KCw-TL;(q&&BmriukRFhHG)rXYS_29ol+(H*Xik;mN5FcK43pI8I0KiO)1q z?(H$WOlAWvcWJ+{x^RL^q)t%Mf1{O2HTD5=<$IsH&twV z6~#*@*OcGd=zfJ|758Fj4!~zo(q((H_WS=YRLVLj{WgZFqE(1 zlq(<3H0FuVlT&ksNAuzZ$L@VkYr+#nd1UZ!K3Uq=+MMI>j8`--H@-G+{HNx;ySToN zh~_z>qf~6kuid#+F{)8C*9PsZ*5@Ag>z zXu*@41o@WbMDfhX(eEs8TXMt6_IU$gO||R+lVbgB zp3Xj!CAH;cS&!vjux6Zt`=8F8jeU!QBo4QD#%uc#fvnHX2Q zX3ekF+g)&cPz=`>{m-RVJWGt*xTxzvyw8rFVP z=0W#5ZqTQOKF(dx#XM(q-Pmz)ZFx__=9;hfd|&fFdixteo;M-BI{h7c7Z*hBi)K%? zC4Oh7`LU1hhZaPCie}@!^!6VS>&L7_U$ABjyF6v-lvmGvf0qAYNP~=a+?V~>q5tE6 zUkvMA=d0K5xxUOv^c7uWnDxhE?=G>v?DE2%Km4aevuefAH;wjX^S>4F(UYQ?u9(M2 z-yFWN3-+J_A~GH z_hv!)Z|)zR4D9f_+9gYGX56b$%7|{!Y{SyYY2Ds>F)PvM{v5?N?09p+_oD}U5%^Dv z(VsA<$7qIJ5G&+5pG2Q^7u2hWluCO*|&_SK!)iTeE`ioJqc0w?#lvuolTp0za-^KTAz zV?pBHfgHuQt^A%gs+}7fRSmA@;n@M3f9m8p#JRBxFqQ9R`zWRtb$dosxhtD-CbnR6 z?WmvaEjf4JkF5L9p2XO}V^`)R?yWr3u*XC1xwS2GWh+i3{HR$9{N%3(-Qih>+MpKj2yT5jx~G=yb9`|7#&z3CAHH!44@V|(Y0|M5@8&_2an+VHGc?SlCgxm#NXXmKhb9_hV}`&U&@E_E?JeL*`i?^V2CTA#Engwa53%44h`GZEJ%NW zlRL{3V=K!vtYzkrNk*&OndynEItH_#y^HB|)7;rK(TB^?uvjssJHVaY688w&YM9=( z(^I#$b!XP^7vzP)vzd<*D?ICrac7TTX8OBa=*()j9`qv{(ziMA&*YMSIQrEqH`OT03mde3GaY{}-KC)y-NvHxc1j- z*zc{AkD1)^V%J1{EQjOOq`BvSOM~#|L9ve3+HE#&`?$)v}dW?KmH% zdkwB};aR;-!-L&yGeH;k=W0f=A%8uo=6}_Pbsv4GKm*UJ*ogMC!iVh;`Nu`EeG!L_ zA1U);cB0(SK6ASbn9?VM(Y>!FQLLId%%k=PV|#=jo}Jkt#uD^=SvApaPe!r%)|2ST-OP%BfBQfYi&Z=X0ab1Ik0FkPpuoU7$cV)*(8xplThX$>TQDqn<3_n zh4>b;dI#-p@4%*s^Mp$%lf-6yW!=RX_VY02Q1jmp^+(t+%ZQ^@Jl}+|gr_iH=V++A@o!V^d<&c+(--$la-CUcxMYp*2<#I#y;?_S zHchm9>u^>s?uDIqX8Y}it=Dl2hk4v#{_Ha@EO~kl??rZyf9d-Z`f08#4^G!^pTn4p z$A>HH^5)_9gT2C-+wmEjx>~z3WB2{5o5Or>7UG(Jy(_ym=hk=YnqjPs(a4TxR=Bd& 
zYhmKs6QMubPOjLmA)zeWd2n9X4i|Py)c2VXW+INOhb}+c+AdLV^Ih0wk>2bOc2Mkx z)g4$DF}8moggJ<7!G`S_?w>4(Kl(%Yv5C&i`fAmq=g)^Qb$f&+x)EkZipO>h?0{TZ1todZwg4KurBtj4$JqY>`D(~qT!v9SmhhPtk8My_`(>L5(&OKuE`NJ(q)XSqS7>)c0Onj1A z9&a=USskCO`%`*j75?&=Sml>K%C!CanI(gA@>c1@#i7X3n`u6Q+U!oN>AnIoflAi z(=Ft;Q1<`Wo5(y+Vv0i*tnp+aENI>ht(m^vmOEy;Q!uTXWQvbBlZ+ zAKN!SOzXAE4;$0*bB5M)mghf0W^X58dUC%Pcy3|`R#JZ1rno=kaav-h_OySVXHx!W zFCa_zMv~q$lKc;5klycaWS+Ky*ku>-+C1``(*EO-=2Sm#>DfcxwE}fH)d0)Kk2FDD zj{JhnWQTt_^P}1wnrE<(kMuz8TDZQ>S$i*e|C_R;*IqA7$ z{>`TNG02y9btbk?A#Q4ntgV`k{EPqiQq&EK|3+QzeV6$DD`csAbBh1?8KtN4OBO9X zxdhw4)ZL%Kx3sMWv2lB3dGlOkgDL)`tNnh)9vq@+E@{ZWpdWJazP`k_43G_qi6xc4 zaz7e$+)n-i7u>G~7mp)zxf$`;L&VB{|LMOp&_d*2{P9fygKueQf+>fu;(A|*m3EF`ci0fbJTZUCI0gg`4iHS zZD$nyrXQ2w<8~NXJ1_^CSN+CgWv?g1wuy%-VppC1VIRc&>2Dugsu>J zK%9^d83tJwLf5HDEYS+p&rnLk$+wx z`CkPf%eQh!SNRL~x2E{pW|99W<;UX}A?v-cr1ZWdBCGv+@0y^_>&CR=5|sCHBjVS? zh&$J(^l$hftNnT&5Bo8QCqHsSy|CK@;_U9o8RmD9VQM^zuk!27yMwyaF#`2~&Nq?S zY9sPrRCMbm#43N`{zsJl<3=sH1kFq5Z)BE#qXlx@Gi0ee8TG<#_edY-MjUkmnfD!x zesy{QaaYMdv=omWy%gFW=%1iJK>myLq|>OY{Qe;yXG;m_ZCdA@acVI zmRpxtXRm zl%CqZOUb`d??L%9?n?O$RML-iq4ZRKy(CIs-rbV?S802Hva!NmHBndl3mvU+|420^ zG~=)v^iH=TK0Ok-@S4IKBZ*c1jPDh+{T`i2um25MF1UujReEoKkRJX6S?$;J>WS@3 z?qg$&`6o?9maIMSx2tgb0MbWH#otc7%Uy}%jyFQTDt_UoG%UYd*o5?*lWBYQey8-y zCy-w4i;;-0_GjEpCmrV4MLld9G0bC*?2wKuEtrO^_Ui?XR{W#L|7Ill=L8|^rI0R_ zTM?`Lde0Lne)@Te@0&pJcP10x?@#)k6NX%Z{iX8jt&XGm8FYfuFV`STrnAUDCYHF! 
zfLP_vsJfBT@36a}NWbvZdg8P5ku&TR-EKUw${+AJnEZyrNbehjEUk`1)~l)LtTpm4 z{*v7kzh!9yF2VL??I0e10J$)fbZO}-HGeJgVB#}5wb)QEs5TO z!I2DtVf8LnMDKMM!C6Khb+sjWv`y4i*TKOkYmd5mXQTJte$QOj_e;P)Bz3WR*U-&koS>Nt=B;}%B z_Dlv-ch|6v&(sT&$=>js)O*H&zwsC5@sD3%r}U=Zwefn4EI$1v9RhpD@p#DZ?pd(g z?ysJj_0Ok9eRr_6yq`4Ec|;kr3hPeTtDqp5N=lbRO~HJ!kRS zwP{VZ`(3H8#K_|Bem(L)cO6N^dILXjJl65}$ZvH+E+63*XV1><#Y7(8#&2AK_0>@% z5Ax%0ZHM2HF$3#mo`d=6Zv7A2OMJ-oZ0&nT?Cy;IOy8g0Bf918=%hdUjI1WRCRQS6 zs>t@oR|dyFe9HB&c3<+$JJ=t(PH}%I`iK3g`#=Zk!Joj6Kf03BUh?<&iScRX2Fn+| z4*)lx1&*&?0>^`n9BaYi+?2z;fL>I6ig1+D2Vo4fe`=m@j=7=#|?C;EFb9rgC1IR0@(@^~)( z+EtEUqILXJ@ILjS4>&)IFXHjiY##6VUHh5i`=gio&jZO5Zh>XvDcHM+$G4s_R)71S zf$r(Fj?+_x=gIMhYJtC=H~Tfdto0>h%e2r}e&+PFX+r+VTK~>@k#G0A z*Pft%URCgOCO2sf-D~r^ z61r1ASk4dr<0|>|c;vhKHY5L-_+_u3SYK>@*N#gVKfin>KA5^A!$tC-9bi|a#YKvc zZGLZhE^mI1AHO#9y*(|KSHCmaeJeRe7N35xb>Bd@`Q4kjJo$w;L%zGT#ryMsUE?k8 zSCG2R|8;p3?LRZ?yM|c)xtG9>sTP~xlbE`-{C@Q(ifEqP{!w* z?-i4u{&oFH-kv2r3FbQ{+ZC6y3Qq$O=0ArBNyPv4ds~3`d!UFA5v4?gizqK5-h4OV z!7z_3F4@_gU%X?j*R96QUWa)Mypt?Q_WW|U>eSm|9=wyWOOR~$Cl@_CbX2NvPlC7v z@MDo6Il#PcBU`v9!2AvOhalPS*pj}PL_PJnc!yxOAldGpXTJM0Pq^p2`Hs&lL9+9} z=$&I~gnMK&zg&>q&%FLIceux|LMQ!YcX6MY&F}dmZG*u(ME{4Lcg4mA$#RObat>dZ z=X$fePs^Mo7X zeg~27m+yY^{^ntxep|0Nt}P0Z?S8yVv2?g6*?iYrA2EK@*6i9;C)^W$bjA3Jy68`q zi9eMp9`4Dfi5kL4Opw`EK;Q!Lr@o&nH#< z{a-OYH`i+m&Wm^LneXy+iu`Tj9ebjEC$1k2^RzO@BRp8P`+X}%*DFyV%#*V4TKGvx zOrQA#3yC$E7dg6Me+a6Fyyc5vouWa5IA%%N<=JGc-nD<$T@vdy% z7b)%=4L8T{rI?@MRg_2m4wmhH*}U#OLM*@4J{%plUii&xSR00W`kC*Pd=@Mxo7cgI zg?kds-|#)b@;83-zBH-yf7VC8RJo@G^e*WMFxRIx#pUahu8%JHtfVJp?vZ{Gjm0|; zt@U>pmKX7^!f3I+bS&-B%y;rRMEz7b8YW1g|5@Mta>OOC$rS2Q&36on^{*eU>kRcs zbK8e+?+_$E)wdr$H6hgV^u{q)_KRZu!gZQqo_Xf_v_UKn%`P>3nkDo<;~yYiah=9L z%yYi|#eS^@1k3T}b)n*2uL z{Fu`l7v}Mp?;uSbB6sTZX26myV)@1W#)7koI}HQG_?!2Y9uW1~eMmi`xfoybnuX6| z{LSkm1?zaHV2C`g@qh^p3x<21n(tCv6C!`z-X-j>pEmc(5INjFzI^SIYHk|~RtWO8 z>rqc$F(Qxq`Qpm*NAo?XC+f*RXZEj_n7Jyhu}Lp{=D+9;zjcxK*ORv#tNU|sOqKt5 z*W*6^Dsr#o14m6-_&>G2)?Qz5*1=EW+OLoK#Wg14U2k7&H#4tsG(^4YPseKFIZgKZ 
zqRF))aG(2n^Pa>-4dg&`+t{&!XZ^hTHE&#OAp6FOn>D|b_oNa=hY0hZUBfh;`(L?M zt+(Tp@deX=TT7HMztYBa*$Lvc=BH|xm-9S&R=cXNt{k%c_j%dZR+l$j{J8gifP(SK zZ#F9aJzbM^$9Fr+%Vo{&NuDS?@LvIeQEEFM^JyNw```%G< zziYYoeG%_m?^DHBr}vJ?|Mcy7zLxC@E{v4lmQM9{^_1%3KAaQ9?Jj*CQ&*W6kFv zw274c5)S-4yqFj-^IFU5k+KrA>!%1C; z`=z2}`}`2IXuXJj=5qgaY`>0OzIn?JKk=_2UmSSq!=WmiubG6;{AVBcueo2B0Z+Tc zk3IKQ`%kVf@752^1FOs9dsVMf`Lmdh=5r^+b1Qx~$9;R0+{(O$?uEGa&%8D}cFN@Fkg}>%7KSM^X>}Y;ouUtr6&m{h5o4N9cf9Cr0|9k9HS>!~BkVUve zC?cXos3Mw)XeFYZh@VA_5%G(Ni6W+nm?>hehy@}Ri&!RNl?ZdC-y~w2h@B$-6md|* z5fOihI4k0Uh$|xQh`1->k%(s^UWs@o;**F}V!G0b$R;AEi0?%Bi}+qdF%e(iLswS( ze-#ncMbr`zC8B|d#v*u8=n`%P9*zas732<*Z=4->8zRgXxP;RXCUV8CSvLXn(Nu%#Zp^>UMw1 z;*YF6x1EZy@4Z1=~lsl@(;QkULUAZKiu#~cr(dQ~#s2Q21~&98YIP>&7DYx<=W zgnhjah}rW3ql{_Q&WPL|!}x{EdJ2H0t7e8PlIqVl4Un3i>Y( zAYbYT-LUz!pk-WM{wZ1z?ai{d|K$QECvILqeP15vcE5CE0Oxn{UGPgcdXWE$X1zaq zkzHQ!H~wC=SifOJVYF{;Pvz0h0B{)>bWhC?uw?Tm2K)eBj&4tXa1-**Mc|Y+&Bznh zfbD*z*mU$yb~%ce?P;f{fjxd-zBgG)BFjnXikNKoOV@PzXYqO>>4;VDE&Yk>Exmgh ztn!n zf6CZrY;RGXqOza4em!A(MmRawYjSd3@HhTfKeN4l{?N5$!^zjGft4)O<+|fo&*oPO zj${A!4MG3D?my%IDe7KB)s5@_?M&o_|8RZX`Vr$3pRWP> zXM|^`-e)xWqcr0DjsH@!usJ(~IVR|Uoll~q2+JJJ@FZvZh|DScDZY}RhY3>g^^%rw_u+492Zy^1@ zZH1mvl`P-i1UAZ8y1E1WAAh=?|IJpuILGJoecA_IIrR^x@1OnjH?;JmLtwjKikQIu z1doKCIL=zWW7e?VB`&YxI>|sxug!0~Da8DA`Kfo!Oa3e)-#GZ3{TV)(^(JJ1E=M$g z|C{{wO{tGM3BOc8As=ZBzw)3V>(8pe_H6!?a@-z!YOO+kN)PUjJppy-kLLO!=XeUX z`=w_u+nduV7rg;p)1HDo!F(2{5yImm`Dkb{&j0`REAgCuIkXb>)m%U1XQjZ#xLll` zm)*g3f8rM|f1WQjp(oxdNPn(AV6D*ioc?8eHoe`iBqnma18$-JiJKRb_q?;FcP&`9 zo;~`FzwJQQ-`a`&i{*IAz2m`BL+;yqRD z27lwPn~LLoHzLUFpR$AOnR)>2E0<0qU%r0|{Y|v_06etfM~v5tqm9pV|KH2IQnw*d8T*IZdu!0@D+6^Di#1?g!TN^8CQ_ z^8m2DeoBd(%%8u5{q1#w`MY+3jak&i{or7`UyI`Lx978m{yprk$NK`TdEc;J_cX!g zc-sAmw?eqQh0UdZc?5a>1+em<7UzfSN7j#NLcPg6jEAwi2KoE_VCk7muACLDP0B$I z&yDu&?Mc0gL6;vqV|&TD$#dJmFQqO*c35olhg7XV{b*KB-^~F`P=?_24>2Efk{@7U7iyT6| z&@W(nd(z=goIh<;=!whUlD&M_f-%m)dN+15-{v>oq-OoeE7{%)vRrUA*f?Ra|0=T0 zZv?wo~NN6@)2zJYw4}& 
z&zmX4Tt20fJb(49$VTo)UA~_KZ1)?-71m$QcZDYI{<{4c48QT%;sdkEHhB>Q7|^{mZU z+?@W9R6lU}t$rVVC9xm*U0N(}S`Ds$vR(@um&;oJ3f-ap>FfIN7%XkC&HA%OVf?iN z`N%yxv%lk<o#*kLXW2%ymw3y; zdPn+ld3bt*<8`73`;+e|$Fsso=*rN)$@?mzfA)IDidoE0|Bm_BrjzRxhF|la$$Xa! zZ1*Q_wvLaMCnH}Qeuw_@P3fOy?Ju&F2iyJPe1h!_Y=eAhJCBb&yB30#OA}acp7`xU z%wM}-8(|$EM#fnEvyPveR4zgP0p`EwyF~4N?PV6uzb38dPm`9Mrx!SJU`FztA>eQP zFH&=UJnRSE2>-F>!)XM&|>=~%DZFVt=MN?-at{ePz4^%nV_;e)}(O6&aY zq{jSVWKWqbU~T3Hw&%YJocM|P^6qK$+v+RHJU@}gAB8UU*4SQ^zraRzvYg>O^KJgP zyPQ9syCuwwIv{K);U-2XQHDwh}e zRC@AD>v-!P-wj~*OLcQ{eoy?D^-o*%#@`1^U8&0*9+7Q+<1Y8-o*p-u@8I#b=T!>% zq;)*L?=|=v|BFc+pL$wBt%o9!FL`F(At#AvYA z>NEED0ef4KOVgha+7fKfmwKzHm+(+8Yt{>S&i-_I?*c1Bx&7-@cN|!|!|}U5U||{f zi!4Kaz~Jw|O3hZBo_uRVP2G6Y2P_Y5iSaZZPov)H80XhKYkoXbp-ZbbP_MFq;~#s7 z`TLv0uf04=eOxK}%W?hoWSPx+nQqdb6huDB<9E-@!d&09%v}Cv`-7!uYx{N}7uF}K znYBIHH-zig8EbjY{RpfqIm7nj9IW^6a`G?N(4KT-72ChZcQ1!D>_`0=m*?nF&E)db{yCP1 zgw0L4JPhFS5->dmtek#Ny>Wf=C2M&KszHCcjHoZ(1p_wx3XpFHlgIJ<-+2$32W@{#Ga0$j9yP^{S=F zHorD0hV8pt(3MLq!2!|j!P?8t-~?|Mvdyn`9!39+5zrHl4F_MpIt48KI-0yyXTHs^ z9pLmQXi9l=`a*76)8B-gQiA6b;@!j0?S3uyNYw9Cqc!xz0B#==YQ}+$En}c3?6Tes zW%H-J9|%3*ND}Ms;_nUOU7lo({R_Bv0Q`-=2HOu9pTC0Hf8_w@*Y)BBz~bGwoPP87 zN}+2u|L@{&tI~@9 z)NH4ErkKyl5by74W|G4XkOpZ zMD+w-ny~v$WLbIax`;8!%~emk%_rXe*{bY+rt7PJnE71Sz5g?v=kJK;jK`{;^=5yo zm6dh#`CiBVXI$o{KJ_{MNijWTe&1fXLs{9LFNP$^&#&4!b4>9sKGq{X_3!4pWbx_n zi;p!w71y9<@@?@yHPeg#h=2ckOr3&riO46Spojnwfg(agloAmxqP&R8A|gb{B3vRA z5&zfk$DPFAhl>~~VvLCKA|{ELCStaTr6N{}SR+FG`0ck*#8weIMC=iTDnS47+pks{)uh^Hc6ig+vHqX@rLqR%42MU)p&Sww^g^Wu+45&ze3#NEZ; z`-&JMVuXlMBF2gsFJh92ufHGvD*k_=h$SLch*%?HgNV%{wu{&;!Xv^f!Vqy>gipj7 z5$8o*7I9sKUuyCEFcH~BWgS3qN#`$ zBHD=XODC2?5t4|^BAg;}iO44+P(+A`QX;}dlowH1M1%-ggiC}XB3gthqJ@YyB07kO z5ivx>2oa-1j1@6n#3T{ZM9dU1SHuDlOGGRev0B7B5lJF8i}*u?M}${|A>z1*iz2Ry zxGCbUi2EWQi+C>LwTSm3K8r{r*0J;=GKnZ4qOgdfB7#Mf6cHw(oQO&ys)-Q)9M6C1 zB#Tey$PsnTPxq@~v!&wU$tXBw|VW5=%3B+Ou8Cn{@vH~4RpPW53KI|2zE}r4c2$R zC)@nq-L4ojUyrI4X@0s&H;pkrodqg`y&I!8UCB-okhF7v(R 
z!0P@xWSd`~(+;{GbHVaAV7(33$U|!~|KU@x-R~{DKgMidAMp^n_sa(I=A&TO%fHFX z^IOFWrfb(#FuHz^iF{wv)mA?5S5vdBCl{u!E6HTL ze_R0bx1NXY>R1G<4*eJGoJ`sH+(Pwu!tu=81Fu&Y=w>!;5S zw)h6K8XY+e|)}+2@2z2-52(TXVGdYO$^_D-ep3U!_ z(g=R%>N(Vl|3Gdwjrk50tWKCn-R}Rl1^tJYLU%|c_UcTw`2!zLp?}#A z)a(4hdbQh=&rc)Q>kGE~-B$)eS3|qe->eV${s8izLCn807;N{uLwUcB{(3d1|JuKu z&FN6JWKM5Q-fyD@CX#J__l)cG-<$#6Rq!O(+18JI`X<=7$Xb7PJ)u9o8<*$eWZ$(+ zT>kwZ(jOK{ed{@}E_VdGN9?Db?>V+Fj?_D;r)|LeOozy4Ka!(%ft?vTp+AnP2D$5W zjEA?!7IN>9`j~%{wu9A)N#w~F$Wh{XdSYDM3yzax3pIehKKtivUlHtTbPu|3z0Q0m z$5Y=eu8B7Dv*e30Kb<9;{@>&IsxtHMgrc5%ZAG>h8b+>Bo%&~)Z1a1omZyHTJo}@U zAup%jb*vn@jRM{7PguICoY_CMLv`o@=JWXk>*n+M#Am>4@%%oM&FA-t&xEIU!2JrV zWLv&B=EINhS855}yC+RolYLhI9hbyAR|KoW`%t&}-RXkRUp@C!=Vd>wnR~ z%F7C1&+uQ^pF3s2cE7XgomysnqkawOYViYLsbqby@5FuT2g!E7emi|_GhZnk4Zl7n z2y9#>`~G~({1pwT+x&i8zwhc1`O@>g(Bor^fVDD%z<%osgB9;Uvdypc6VK5V`O3{T z(6yP3$sN~&mFCo?)W1`=`K6M4$GOzZ(#P-}*wWLD@Ehs*PHkyhGT83-4Qzw+6z$PE z=+1VPz|vIuecKv?jcl8s+x<$QivCK|sVfu5vHxi{u)QR{<6HZ*6s(;Y4Bc3?nELrP z%&)`#itD_XKg_CUtM3aafbEF#Asc+oYJp&5a&E9Mpd?rukkjJ-_N%&h?yDGYZFdgn zdL{8(Ta$zG(7z=W>xbqA+x=?AIP8~|Q`DXBXOlM-X8ypPU}=0I=G*+fE3cx_UiWg) zok@?$2g1Sn{P|!hbtvoE{Kh!(oOaP)Wq%py%7h~v*9GOl+PbqG*WP5i-#2JC$E$Qz zw)f{EPH#1`GuJ+_)QN2O%NYj6dPMt1D(Zgc9&`DQdW3f4{qJ&p*eIU=Eqw7MA5zbC z4(yq65p0a&bod?o2doX}{P(23PXCLw@XN*t`dzcYeo@=$U%!a;HXQ*=?^jU2x|dv& z>yMn;OK!9mdi;$ubBZh*FB75jx(v1Gl>16+k z)R(*I|GO#mx@t~pbARRsP}dYEx#|Y!#s+JD;N$ohmAU_r+jD+PVG@U2*K>!3 z;&@McRT;V?k!)<^{BSi4gl_EN_`7l!1uOm?;CJUKMLk;#*xTRYnD$^-?c%KeTN|=o zlKC%MkgJyfOMSQ>cC=-_G@bL)F*2O~Q1-{sz~Xt_9@_hFg!s5T4B_}1t@nYIf*sMG zw3^#%CAllu?suf-^eg`CuXnlCA0PdWd*wJi>nyhW&$@8_qsmhHb9M7H^r@`sw4^V#?^0lE>s5v+8WO0M?@ zxquFqq7Q<#Pt=vVhpBg*0^MFuPrHlj&z2d`eRu2PI9Vyi{;FyI;`%nfH*{yi@?;0w zb5?URf8}E6_Ig^STWH@{_ZxJr(!cDFV-Y#%CRhrb2bLP1C%5|*ti)cVzqI9l&3t3> zQmfv7*LRj*z~g}#3DDJqpU8#i*LU?Kw^{|=?sr#?=JGjo9dxg|v9)}x0lQ~Lk;BCM zLBxEq`^9|&?EgLHD?_~G!Q$b(=6u}p7pG^(t^i%~ zknMi&-#MuVZejnjW&tawcao*NtRF+&?)Q~B&f~ePhv9d&`-6PI0PFn@lD)^kcE29< 
zJN3<{m|tW8Sh{x_tZMw8MtO0L^=y9M-QMsk70=T@-OA75W4)3yp&N_-1p9X9XZ>lc zZ_oF;`U3l7r8Miw(>{SU`4aNwL2t>$$#%bMUyaV@e3Qyvhu^iP0$6Kx5$xDe7;JpD zbesR{@3LBhoAiJEo%YmTvf4V1FaOtkzp^|&kP6&}?&l1R^@x0F z{0XwPKUb{!U-$2d)xJI7J5ELam7}lO{sD#4oAC|UHNOF;Z~a@a-S3^{!0%N`=3-4{ z(GT~_GGL=eR&w8bU}>)dtcBj{X!^8W1;EncHDKu;*rHZ+;Kaquo!I$Q?+#J0Q+*Z5KmZ1-!G{=nbG@dG}^c{llBMeFas z;8Rm;@zdtlwA-9tyF=+uzmoIIpZU^|<6x;sB=c>4B{>WGyS+AasbLDPqf%N%ffb$C zVJZ9Tk!^nEeL=QgCYt%>!ob=Y%YP|7+pp7*`8L0H<}B6&<9IXZ+AD4swMk9sKgsQs zHl_`Ao8LKz-}8z4rJ<`W8(@DVu74-*JHg}Y3DwE^_t2$LWZ#=@JiZR&@}Z}HM_yZk z{({rV>kEQ?TaK`vmYI5e>wLBP2TX?^?$7pn9S5rqPO`m`zmqTgihQa0a{62D!uD3n z!~KU^>>9VnZFWOM2A2l<-C)d)prEZ(wQmO;!r{vcO zy79_dULMgeJ?DNw8fh(0jUK>nG;KitfT84k*7Dze6geB0w{PmdImPWu{?5pE94X4} zOE#`w%8r%Ljrv8PD?65gwPoLd#rA;tIRe=J@kL-IlFOgcVGi|PRhZvxBlB;$z$xX{ zQcq(43|oJc_xZ5CDb3kGBYhdJk6vp$_ByC{pl$eJIXf63*j z+x=2f5c{{Z4}8j{E@17Lc>kSPFElU5-+0XSwRDrHzivr={|4rFcohyd9&`I5bzR8q@A(?gQ%c+iEBoq$ zwR9)I#$Bubh0jxu?E&3Bz3x|h{-E}17yWl`f>hw!=Cc^V|OF!_Xm;pu)aNC zyTkj;j73A>PwBzy+_eR)uT(F==^Z{Cy4|mYE@^I#yHs)nbff+>uySD}x!x?WHfa>t z?pIFdr$5tJ)_<4}Yy>l3Q{G~El6uE8-|kPln#)_kanPl}*NW@ zy7T-lKKo=2bl--dJimQBm)x1ZvuR-qz;?eh zI)dXfYcX_VMHt5~dOrDDUg}#Hf$e^2b!qfhd$a_)wyQ1sUxE24J<5`Aa()`YSK4Cw zT%5jd@(Z_Pd+C2;y{LLt|CX_RUw5!_b18MZfBsbJrKn5gdyxG(ebTl;#v{-q?a-LHinpnvV3 z(51m!$+xzEjj^0QW9$y*+x*IoiCka09;ScK7_d^&V7(jTsTVm4{>Fchy4Lw3{Yf+7 zH-gWCwFB{BE$uPt(>hXrd=6~b@*Q(K#hCM3>cxE5=&NAuA^YQ=G#6|vPe#7GW^L+Y zPLQJ-QXhKUtey0pL_N#{CenUu8-;8fz^cNV5!J^>Nda8BNpSST>1dr z=&y5pSG)mhau@0aE#2ld~ei)x}uc=Ydih0-bizsleKhUyWdspH;k{L@HoX)klRni zpMLkGWz_w6-edRcBO>^HV0H1nVbKr$@?%cVTIM_Jq@|w8(rtc6fmkjtOW3}n`E<_j zZ7CQp$MdD+XfAKw@0NhIF{#m>`^;vrah%iZ?!J!xX_=@mEfr(-ODlAP^?u~@kCh$! 
zz^?n3S#O55ew6XCUWP05N8g}+`99d&XBX&FuQklq3xKs&T)*9~N|7g7yuJ{5-Cg+IXDX3zKVg1sC|JtRsc^<4x3*&aJ;2Co<=^-Ppz~EEM??)YjFp`+V@$J zZ}%JL_AmAWulO%uej&DR_ZtK9VtE$#E3&>`oXhv4Y^-<9$NXhhzRhp+ z4~Jh#%mu$TsRZk7&jB{Zhq(>zmv5YDp-3|0J`0;U7pM7I}phH4MWNOiqU`nXRwk!m~8Va-M&YC;~aG* zna2~x`T*9e@(ITaTH_$F-7ht}i~7=y67=`DNH$8by&A8;#CX-=?odPz($#%c7VJO>==6S4?)Q9bLEYI;e zKZ5$|3Y>nM-$x3u=qBRZ1Wpu1F?Q6E$CO4W+XSNN&ngtT>od1?S8j+7WQw(O1AI5 zJ(c^rMAmck`IT(-$L`lAPDcN56G4{p59aiwsmK1<{K}GHZOrM{cB<50 z_6Hl2n}C(k@n9uaQ?kvkB=lr^<*92~JAw^&1Nt{AV69L?vdypTpO5-dpC6!yRGmX^ z+l>C^zp?$&7Tf&V>s!!`ChWg*@;LcTBjjsq?=!ztbF$5E#H^t|dn@R1t^~05iY%?( zK|Q8Dbh}@zU1efpuw*z{uh4Mn*7=gQpc6UQO{~v}vvY&BFrH5-4|tntm=N zyh;5AzyC_SbsVfb7)k%_Z0uhRZaS^x4_=2s3Pud7G@jtca@>kO6#vtCH5F4W_>e-O_>CEsNG;<xH!7bM2Jd)!>io&ik;nmux?AGoOp7KiqhkPyQKJ%-y zp7&rLejlekgWs{g7`f65u(wGOuu<3#`K~DI_ua~mp}PtdfG&-^%lwi*f|aBE{m60e z7v}Hzi}mV{0&8RV`;H^}SF-XEx_j4F@-F@!@7?eLEImtM{pQth{$m92`UF?*D4q`) z{2j&Bu`17(X7KlZmw#>Q+xYvD_uNQw#%t`)!5;LFv)XSp9;~Fv!s*=^12!IK1$z(j z`$=h3F|hkJzc0}qRt7tYw#D(GVvVn>uY=s2^V2&iGuZIU#QXzAz|veN>wDXSwZHiN zyZdSl=9lOEb}g$(eOW5DccVVoC?kPg6D|Ew8tQu^sE2cUyay|?ULf0Z4J`+je$58I z_iZI|9m~Igx)Gcfx@$uf>Jj|?(6OWt>oxy`@pL^5qW+A~w;#GT!@3@!%>uA=!>ZTeEbBGp{B`Z~&>zL+ z&pqZb{qOiYu6LXR#{=R#l>Lc##r%W3zQkK4J9KIBR`&1HNU)OW5dAxDP*3=SdZkpn zzUC#b_i?v=L%r8v=#C48p-Zc+>&?y=0xL^-eVnUQY36UP!TSD1sAmaee$E2SAN)Pr z%fsvWl!-;b-pbbXciYR7r<7&>bY5TQinOi=T;B@1Ye+oXE7YGn@e=EI?Fe>)rHtL^4{A#PG1m8f*O>ZT z4Z16E71+3I>9f<|`WWN$FxI!Or;)7dRlicH>aQk?nk%Yz;7{zCthETzcP!dP!-IM`jeJUNc{ zm$>#fVE%;c&|THpAMv|;W7$t!Khsfx?)!2W>~gdqr{4&6Y@^>7a-4e3-q6*^narQr z4Xn$f$uIj;pV%1eechS$`Ug_4RG9kFK491A5VBs6{y*np`kWd6zbZ?nh^5yYh zN2Yen@1j%h(;4h+%KNEYnQDO52^r|m*?|5P|KPmK71@-$=N;I4)~cWO0eN;y==!k@ z~Mst$VUw&+n+OY6ezkJ|8=k}e< z=B@1iYo6cvs$2oPh8~9Q9GRN)r%rWFkAFojU)cw9`WN;EyYpDfOBt>ouIXE-zpTRb zBYra2SG+CPpT2R__tz!Ibs$&zm-Q-j2dkOSvb_|}4>i|0>TwSGH`k{=J~RD|#(`b$ z-?IO`hmnW9Cl6i<_O{JLf7Cnj=cmy1vuEgk@Gn@+^cw7*nM6K)mpp*?o2$2;a(q_v zest&gom}4YS@%~T=lyr?T`gI^Cf5hov68G;a0&g{3sUbln0)sh_NTspp5!(+!Fmwy 
zCv;WF!2Ha=L3amlroNlsLwOJV%KXQFGJp3X@*>^`;9cSXm)8%xf7X{S2kU*J-}e*mtMabk_xt*S5c)4M-?=I)c@&R#^{gM* z-gD}{GG4H2M-#N?Tlty1hR4HV|3R+Uh+7j%2G7+rj=|#QnWb*KpU{`6LU#O)u`u`dO-B}>f z${)^p>(;uJ4V+J^Y3`86-XU+wkNF+vKc0M{GnNl&2(JgyyB`1>^(x`|Am3lh$yfLu9ldc3 zd0lDf>W;ePEoI2tvw~Bu8aUr}jxUS$5(EALi~AVC;(B|q{xTI<`7ji$F58d(Dicc6 zAAKL3D9y(CzTPnqe(^mI*tz65{lhll`wU;eEb@{-u)cXM*!c4!zE4q)o})hTGuUUm zCHGxRy(Rk>a&%{FbH4l9grL2U_p`zJLxt_9o<`oh7OYL&h3`XsopWINmG^_dYG6F| zNi)ID#l@KK)MV5Pzp zeBbA5Vol%6_F!kKj_iNhQq+}1`ae0SXW;Wc6#o(U9!=l8jLT2V_vA`nz|!+_WVH&m zf6CB>IPdh;=5?sjhqhq#6z_wO>hgOX=g3pY*KYNrzl(%?aUVGS^SHg&KKYYdyyX09 z#Pf1}SsD0^DZDOMpU>kdSsA zS_9~+V?Fa9^S(lT9iMNYTxf{%4)rAWH)^K2IBrod<%M6(6-l1)6#aGj#^5+XjpBX3 zYUba-`lElT-x~;acKsWyJ~#sQZQ=gLx8hg&<<-=SWkY>kizjcK!u-`ES?^*iuv)Ao zSbsEu_0E+9`xHOcTl*gK+j-<4>gkq|PbM?J;g94wJkHn0^Z|?eS8yEQt5ThML`kso z!z^<09vl}qlj!%w)B@{kCozBdd5n`U#6|!4reJ-i8>~9(fSnubkWcbCAZl`N=3m~% z>3IAn*jJ??r+XytdsZ89I`zW0nSXQ@KfCQ={)ZTT4!X&|N43U!rF$EYQ^$hU8Qk9J z6~+4y&FxE5?r-(A`>|h8FI9ry_hty#H|a5bSGtmaT1S>N`Xgt8_1iVU;(U+#w(0aA z_E?mv7b^26t>&i#x2(>QX=F7P=$QLNKMVV$2Ak$)%!*4wD`&&$ZVjoO0M z$otqI`g$qkgb!e~@qN~P&gXeKlMazb-lVRdBp*G*{PlcJl&>kDzvaAWK=-vhOx>>~ z{V!^R^|O4=l-fNF>lNNYJ?^(ttPkY#+?;#sqrdw5`!39v5;v(^ z&vA4Xtw&z?1iJGypUdSt^&G6P7|MRBol#%i`I4V&>$CmiJTB57&jmaC2C$BPl)lb> zYnnZ*J70_dJIgJn-l_-r_8;s=yQ|~|tLWc!46HWs z(ck7RxltNSmvcfrw5R{X=MSm5to+r@q5B5bhpsmIf#WczJpG@#Ge33}`+H#`^{_%< zU#e+f^>h&R_`_gd?cU4}d(8Z&cfjh7NAzF%Le3P#`I6@h*!Q>**y+o_@%qx2TN_fc^@;z2^-d3DGfxi+#dby=x)tmj zSsHwOy-^eTnJnN=EnHw<@o`}NatHX;y1ehB(~1Mo)!lcY2NXEZ{F>bV1!OR4nDv}< zx?($cePMdmo6P6=Boul?{rkVDr!E;``qj!gv7HS_*jx>qdoVbm|7EZf$ypp|Pwd~p0aJ5VR z!pCLMm9Evu%T9tNUw3k=BJdmO`hWw<<)pv)4)PD_p({fkf;)}k^KgvinQwp6W}XR+a6><|E2t?=x^`|y5i#Z z&(85P!BWZm%vUOtTSSlxyXkMv<3w@4H|iO0DneINrz58?4tDO9F`h=PBFry+9lE%# z7VMjx8~Yh)FrQzewqFI-(sKONs0-x0eEx^pZ39>tHkS3yJOUd{`P>L+vpm?ZNFVq- z3w>fuu##&h{WY(%Ufdb5I7Y++NLk=Sp(zoJXRt9X8xG=}+YS--QkVDDY0ENToTa1~=vrx+obv~; zxG$Gnv>sUM!|jTAKNbCB_Cwbu@%J~Q&2jqkm!`iw$3bew{u{42PR5r^%zsn>pGs_2 
ze2VAXGp`Qkm(+{%Rhh@%2emQj=>N$6N*O$C&*;U!pXBpNqzj$N#Wqv7reC@J2RTPP z_4izFl=gG#kTgr%Fly>GB`*U?azJF1o zZ-ce@yzWafp0aTF#{70TbIwby;w-+lF|*BB1| z`g|VooF%Z-wjcW+GMoN_>%hu;nf#ISP0JQTe{U|onz{#U_@&3ER%k0ZhRdPkdqjT% zmn-9^zo}^q4$?_p>NVd|+w(0_aBTjnu<`p})ln zuog6z`qnAbYfS=cuPlAxVd}@XvtCLfx%g!Ac|PAp`q9n&F1$Zcxjlj0#$Y{_+aD=q zH}l8Mrk;+^k1@8)A#Zy^f4|?r%BUk?rQqMxtENzY^e?gsn_9hY+U`B{B8#8jW)<%?{6G40IYPh*3&Cvz=pJi_4@9l zzuao(?>$ew4X-a&5{{AE?uBmDe@=g|edG;$!Adf(KbG20!uQ8gi;vK?PL;sQMvGt8 z0~@uhMV#$c%|uje#6 zSEOFGGW588hrn8YZkO!WlNx(=V7#>Dyq;2vn$PiE-Idc@ED3Dni)H_9eqTUQp1(Y= zgzclciL7Wmes`YF&*O1-P3U&Nl9+mo}(z8;iug&nX zesU(ZSJYs=zMxB zN1TT)?sEYvnJzNFQcbXwM7H~t2mTx%_a)~4nvM0Eu%0sXg?0Tj^X+~``^fbr`2zBl zzOTvitoDkgV*meOeY;=#bcNSr=fA@IOqBhes*ZFDlD;xdLKjqZF)Dv=owMqBE+L?Epo>|Leo-Z}rDu zZO)AKaqxfgkK#t3aqK`!1}JDJfCUwfx6AFY?{USl{*#oD@NO&IKSV0 zf?umS8muJvald5qe|^rp(&!8PPGcU=FSDhF-?!}#atQP7ezp1ne6OhVrC+bHYkMB8tc^{Bt^FTLdS>xOE503X5=G*J3 znRvda-FeUUOY!-T(qN94Uicx$yCZ)mwELY6t>1YQExpe_7!RqLHNBDiomYF!_U(RW z+W<~~hYT1`eZ)uPD?{nm3wt@fd#Ky}QoC-}`jip*Qu(o9@!SHi(Iy%!J$s1h)6OiV z-j?-4-gEhu_P(SZcAxs=Q(&Xke)24`md9G3)9it+#Q#7&-!}U9Y~=juc98X==76c$}y2?@bPI zBHy|AAo)ZN*0cH5c>WG5-JmY6bHVo|Qg7z#mGbiYlWy6WZ}Y2hyx&a9o&~zzDJR=2 zo|CNbIrd6H@;CmtDf~X=Wp36VHXdvo=klYsyo7qnFpjTo_dlq@{!Y!s{N}mAMvpvf ze|K&6zm!$q=J&n%9rd*V1?cZLhy7o{=j^D3MuMfJ{Lt-wXP;p7SBcBV_Ltmdem44@ zo+n`C66=5C*ZDiQ5zgoKIJ@#a4&r(4?9VV>=cSx1$oh8w^A8-ariJP6#ODhbF`QoK z2i~73HQ@L;iw=ihYF`L`r!C*ND2V+%ZB6gFTO7Yk^!s-6JrBk;w)c%c-B?clilV6R zyE>co)>!^*{QX%w_C0l*Uth4D^E0D0J;SGRer^cn^rhWQy+8>08-H)^kKS8)rEXlG zqKmP;vo5f4Fo^lX|J(0tor-$;;?UKh{GCSY&goI>F68`P%KUHqo4P>P@`ka!p4ND` z;B_D3`IH=w=A~KR<`?TfzJJxamxJ#7E{5CV4(yL!@jCU)Wte|@6~C{|Rf6qR-bPNg zruQI!ht^(m`BM`e)RzRpFSZx>KG+z*@%^U#=LPURy>!p&&k){U`Sm$dPS+xiZ&CKo zY4hvzcVK#@krmnBW7hqvkIS;ZS3h$8HsN`z-S4Y!9dE3qUmcyo@tw*3s2BOX3Zr!u z*0cG2>9b@1riHP6-@ZX%x1t-ySIfioS6uf^uEgz?xNeZ+zbO;P>vkB}n4OmG z-D!mJG!|Lg*NJ)QufX-sXZ>DpoaOqfFW~Wu(t*prUWxl_3nIhwz- zN*Ao@vrUgSiN`J4s|fT*aj+euUNx{d9_M~=eg&}Iuez-D;WP7n2k&!y@6_h>4Qh?~ 
zt1YVqw)^$|(L8>5LBBqt6j(Z3m-$ylQtulHw)>qK3v+!MMqORQ<4NsuRkqh6GjwAM z>)HLj^gLctHo92lY*0=lh&y~?%r4e;sKu&T^h52cJ02|9Kw)r)O z5672=#OXJ}hx2&zXno{M&u&u>t_QaJwO9NdPdd!`E8eTk?=iP?`O`HS>%Z2K>!uWliJdgGC#u-_! z7wh?Kexvm#9&g>Hu4Ed8<3sHj*DuMX@OZHYx0gn}wqWHp*LU$A5%LS`d$UuUc|7+@ zLVu;Q{60b3#q$yAc^c|>GEzU%7%WX=J>$(q&W}xG?NKoMx18G>?crPKQnWwvjkC>I z&)U98Zuo#+5TSExBH!Qidp+d`kkpO za{Y^Dd(I?YXKqw(!T#C(@9uDUdC(cU9#xO+wQobd!~0TNdY`a#!lDdqY?E@pp1@ zeFeFB7U)vOo?yG*cYFoV;N=Y3IKmh1f!{pax1 zp2R`PpZ~s4{Xe^QuzkTLJ_^Y&@ueCj=<7=G?PUF#Lt*Ui$UUk{?YG>U} zrC#riTbpUv{CA?>>UKBU_j$87WW;v|2mJIuZGY`c{G(C1uP^lTr1)y?JNW-7d+(^I zlIMMR2qG%#m{AdjAQ>ETW{MdERKlRBsHm(F-3XdCZtGqp~jMgm3|K95X5^ zy5@)(72&Pw2KV>5d%l0X_ngP^p{m2}9IE>+2?CAPjfttQ27)*~7{xvKC0HkVFMOnW z3TBf4W&Rz3xatNgWAYo#0P5l&f)Vr>B8(9x2q~gD!W4mzI`cbQx6GP!Kn}zwlrdfw zUdFhzek-^`Luetwm`mW@^SYI1GS4n>_oA=p= zex)mBDCxIs?ia?@J!W$5UCp>yU%R@Dr9qa$l-nm&)$qN|xINPMQZ9cjBFOJT$XURn%M`bYwSA{KFwiV*NxkQCaj#t3wZj@UBh_RhxLr>_MKx~HdV>x zJG-!PuB>ml7x|}7UB${=nQR~9wPmpEO|>V)!&!OKt>JbhStjE-|7|rZuVpQBczJC;gmKxH z1je;tzwz?8vB;f;zikDxn<*REn#Z?Z2IC1mmb3TKe)8q=m^0p_ugy5Dq~C;Q19*Da ztmpi&i!c3Mx?mYEZ)uHqemHezJfW~9&#y+@E-7n3Jj-9*XJh7G)>qEB)@1bh?=sxgo{9)F{EgvP~Gs-1G`U)2)QH+ZPG# zts_7)6d#)=LS57`L(ubowT({nz4@Sj{c!U4(tnTN=1rJYJK+aH}wYB?c2b_gD%6_PLU6USS|m_zwS8SzkhgVo zy?m~T5HRw(%41VkNHUK~@O#rla1!Y|HGh#B^$&)Z{Ce1)!A*pewZn_P-0J!(4=+&o zk1%}STr&$5x)fX5e>NIXCL z3O;lChwh10K*^*>zI_E>!E}_>`qxi`z~oyOS%)Y;!PGW#`lc&EP)cR!vY+7dtx3T^ zdj;eK519PJ-CyvzQ77+0r2?#PH9zxO?Jw-dJI|Um0Wf(`boBjAKjCpeqv_wq0zk#q z)cYlV!r;JF(>}Q(O{_M^nG+yn{=9W$cy|Syu<$N!I5<$qIAJ-foly`}OrBZiYnVcS zi6tRJpZdeL^Y(hKGeMAC$8pqbKhR8BuNYkhLK2Oij`)J^;`~&PMgN3Yt4nJqi^MVQ4KO|cu!2%!Ccz|}pf_U%Y!Ghhml_z>%af95?os@m$!2&#+ zcf8ekH^@1;U1Hh*-PbEUx@MRgZ`*8w1)-v1#_J?E*!knMa@-vdWbuoNK9{&c#i8DL zaU(E3RBpPsfqe3I{kPE|=z7{zb_;WZ>ocGFEqJUDYRj2Vi0dSSWdTCT-UIpJQdf|B zDiz)91_&7}JXg>d?+pF>V1S_6uv2N^jQ*X~JGT0HfMB{-ouO&(4$0%q6tQ0dgz(2h z59?LC!|N|H!^a~7gd|t1yg=+nPngD?eqZb(2xDBT4qom!3Wv;VV49 zo%22ZFCQ>0X%zbUny(6o-dVIfy8y 
zcpKL%)^xx!Q=8|tX&&M|Cph7vt3dH?i=aoGU*h}}=eM{#h|7z(Jc-MjxIBu>tGGOi z%X{s+W&CkBKTHqX!)X=Y-u?6d6WzNX-qoIhZ0X6LVO1WmZ0gxl(_`EO^>3{@HIDFx zk}1=sf2nd6?5vv3T(Qm@av}#!aZYg)?5KT<_W|9?Q2`|%T!kbPJzPIIPE-SbTHLJc)m^XWAQ7d-z{AH?-VoIjXlhj5HzNBOMIYB_!O z99JPcq~KucYCFiP({;RFfsK%N?{EE-&2pf8wnWe)&+9$?YX^M4!;cSl_e0Ct8n-^9 zj1J4DJA?7pfN6Kij&zAs6fhsl0!v>_K{_|GO&aC}>-~JN>4yJ^F9# zwfnS-AoWbpe%gWl?Rwt$Vnj#5PMN%atQq=eovdoJN-k6!_!87(xjlq;SUgo*C>ONz zX3Yw!^WjcJ?ONVD_)uTEA>I^a}WlW9bZ|HGYeh$f4@SKlwM1Yu{fx>>b?^ z?$fx+EDSUkpBSlzbc86XN3TNx8n#~4t+&gsxMfXttA6h84BPIEN&o$3C}@*c&a1%m z`Gfk0{-2HClJm5gcrO*Q~k{3UYhu;2mIhe0JqXy)clF zyr(0~T3{V~xMdjBriI(%E`bef!a)4Km;^@Qe<+(i2K`5s_{T+S9r3I+ReEnlXtb-sa`mUH= z4>_i^U5bMsG~XTeaJ2;pY|P*wB!9&dRYxo!t0w*8R<<=6FSP_YwNWqZ1j&!T|Gaj} z3dHY=Nq7|DLuvnASXiz9qBm$p#|)ir@>$U>?vdV*YP2sJ@pYHv!50O!g_P#>{F1Hj zm;c$~i$d)7NXzs@hQeOC8FoV|r_IHL7YDYfX zvbxfa0>#J4bH83mGS0o{>|dbJvw;hTcceYt&fJ4CxX{>YM%cqpIP-96l(oBGT4x%{DgPc>L+!g{ zNA>&W^voLKX)O6iXGo?o_-1P;-JBFye`IF}&uI5>}EzUD0zrmrafFiZ0!5`zg;4)76nIJyb=D|6q>*4;F9+BN~?+OUV6 zfZEGM1Ur!~5i}3=M+`>Llk>t=z@M^tCwVw4e=uGf zZf!iWj^;7BSYJ5Swj;SuNG)V?*gcH#vM;u5eF_|oxsW_-qdwzeeaS^97G6S6ex6ff z6~p7x(uuoo-G;|!swLxMedyjx&ip^upX-Xm-9$_1Z=^5qE*7F3BzNBJkU6M;JH?eg>N!-#JzLWDlF^p^L zZ01~~uQ3a2LGHDRaxS+V$oa8XoHty~d6&&7YA+C+<#(9;DajAcNt^9xW52jU+zWVkgF1H-~k-4w@ zgY$E0Cf9aY#JO1i&mZ-G+&8$-<&$nUAg--*m-Dc6W8!#joAWC#^oeUboaX%7)4Igf z8$NL^@=x=)6|2X(Mn9OmwqEPJpK<*T35}S04}G?-{2%?6quLATKWhi@SMtQ!jPaA$ z77|ZRtt`MFY17@DAO87}xYlqm=a!aVS$dd6a{hAZ58`UiV9sxiO5c*c)NKZfp9G9|FuuW%^W0503dld<5Pu)_taD6$q#x(Y29%LpyUL#P>rb9C z{oIR6z`slUBEA!s9{O7e_$$dU))VkYy1t(h@K^JAWfK8^bz|d|fWMMCJz06wU24v_ zrspWe)z519P#aSL)zr;eae2#8t%>V8j$vHlbF>|CZDJPZIcFJH&%Vw1fx@;-ey$~p zujWd4jhwIJOAcJ&@`UO9eL7p1GyT7BaQ#h9SpKMo-sJMSN4b7Kk|8>H_1s>kXM)bH$EvwShC| z2FDrKI?EV`zDbNroAhQ};@ZNJ`L|caxF%pECGla3E3&Z4eI$gv9S;J zb@Z_HyUG9R*1Bcm|Dyl873-rH`M8#ANqucDjrp&6;+zXR?hg&%!q+zG|3s+3n)X)x zz1TQAm;%Lz!hd|&yWqD?N_b6UHNgYo;ZC(=gAz(zM*BFO#kI5d_L5T@l`xLRo3}kc ztY6WO<00^S{E)>!I7~_xZb~0XkCdH#LrX|ui37I3&)ZbV;Xxx^Q_{O 
zNegg|c!i|0yO$^I&+Dn2ct{C@X}$ff+pqYJ8$M=cZj~#H8{Iv?<)9Msq`MSF#;%Y< z`qPv!W0jrqWm8uOp>atH@;37D?;&!y{w`+PaUUfJhSOT=bq{S^Jxxe9Sl z#C^anX;VlnxcaR;J^+$v>S)&l z)Y~zfasB|bmMCowNyM#jER*meCAnoQ$g+5xSN%J#Y0qkRYD=^!WWU&FuzMKtwgdB? zE^h-_l&>DXAaTO?``HdMOf3TLfG@}|T>0ng(Doo(Rxl;Z)dzse_Zpms(YW2+2iRKK z01F=83~wket{1U=v;|1D5woLq;hJt$`Ns#xtUz{key3~qyZ}~Me(O@w0c4bK6TN`h zE(F;_ZGAdJ`JahvC*;1X7bMeK^&tm1oS5@-T?;R;OVB^`cA+DvX$@i_@_ksJzzM>8 zY)p$A=mp`dd^kfU<;OeZtUkIxCau9BFQK*S_8md@W9*M7)!vYKE34yioZDyE8`Rw| z@qy;cB99bJbBCmJTV{D7_n3+W71z4ejQhgt%birGa9uND!uQ~Q5?`3{?SuYntdFKE zF3ld+>EHU9P30BqYi)ZVuD{qMV1H#~Ms11EZtNK&KN}e1t5c2M(-mYM(<=lA9~kY` zPOnrI1Iu3+|3w>kM0;sc)Ti}@KDIEB;$w~ei}aIB+WBV(xWnYG zk}7+=gRCsdDC}%UchEOVm$Y_6eM;Y0ci2hsbBP48K0K^eR z*%b!T8dFLnWYV6_U#`%f`kC30Al8Sx#giW6o|6l;C(|QwFA3{)UpKfzdp9#MJ|o{u z-8tC}7E<~yLiZy53Yy2xaf6;28dWyhENf`~I>-&Gj^E82yCo9r7N`F3Tk8f#Y2Q36 z62$s&`$gUHXI-Hx(k?{Uj`64Rw#F6i(3-^oOg~yLIOPi4sJ{M<;fwSo+p&MhbA`#& zer=5eDYZ{mTw(4YW646a*{`5{*?{RodxhwK_?w3L@|ms>W$1H!9{MkKFE_#dHN+Km zubEfy2E$7{Y(Dc<6HHH9Z@{~7;fpS)C(7qiyNEVlVtwNlSWcI^fer0l9zpj-%_jd` z>;^`wgHsZUv3$^)Vu2eBbX}fWgm=)1^%E#O)UP_aI$(%x6i6Hv8vH#J^%v@g6uKaf z#_}=14d&ARvmy$_`UzBz(p@1n;qSOqw*J*6}M}ytMz$q@aZV=TiK`)|TG>G->u=&{c z$OD!{1*dwIMMF7mi_g930rzSB=utFCF5!6P&A(i%FRh3BMVCF{xZcqG>DDojn}&NH zJH4S-QgU!er5dunukh_R#v5GS{!BZ6PYub`zn$=gvv$GLHr`S5_gUf%dp0%{jPR}n zp!C=Cf!j1L#Jdp^sr>BlhM)JbeOjvi)yCGL`Zi7tQUhG0-s%NCXzv*B?#cSp+&3iF z3!+2gC2hv5A&!-QFIZlFydZcW`cLKIgeN$bM+F6rLHFs{aJYFvyH!VoCn%RwdMG`? 
zp4NDKs6j`2mK!{w2l>|-<&8U~1=aI}zgat~hHxqu%RQl0c2%&onHscPom6J0J-}s5 zixcoJ8nXX!HCR3x+lQ4no;wu{a_Vn9F@I=0Hz=C-e|ueFmZGxqYbwge_6&(Maf7{N zzY!4)GAiG(u27%c2VwqEzq%dE2l?9$^M}?)d%J=cwJQ}-ykDQ@3Vu|cuSD_w*$V3; z*|3yG0hv9_wQ&Up8h_y3)VVZ%XpZrt_j(u!*|djUj`5-OznxfrsC_6#_qpAIJMF{z z%Jyts;gQKB1OF3PKUhES3f;)o;d&&nJ?fRN(1`kzYuG;g@U8Q0t1B$IP}k7!MkG9k zkG>Nz{+nnVei!Q_*#~7}d_OMA@B9qwJN0X0vHzmI+MCq=jM6umhwXvKA!XdzNRUx| zosRY4NQ+HesXuWWc92}?| zZ_rCh&c*ZAz>d6jQ=t=GAa%}I&MgrsalKOExLsAkt z-cRc)hlBf;7RIPCJ?Zb|a(FcDW@SYn#*f}-xEyv;`47PGX%Fz49B$J4wZ`;GTv}|q z4*Op!uT^L>MeETg{XJLuJHcq9_i63$3`G{r^SV1iZ|a}$TOS(QQ>}D@n+)VM4|?YUM>?D6J1#-@Z2!vzX40P9dNue=FYT1sM-F#)dwBsXu6e z{oCAsgk8NopclDwivca|*_e7j>jei2=-Vmk2zBwHWgfrg0OJ0UhVY0inm>f}v=6*? zIT&onHqH*9vU+UM4ChXePHoB}C0hfbHGX<(W#{Ha2!G@HfHl>#ts&57+VP@vXC>sU z57M8e34tyn5)E?kF3BzP9f}VRQGwrb6G@L;fKs;3qWl#W+wZv+3_B^mdjZtOrRm*! z2IDpW#J9}PzZtCr2kIx=1Vc_{7sW$ACG`38NTE6gIobE&9e%s%@7;poI<+tN@LkAG z@mw&JDz(#_;+k*N`S9c0@i)(>QC^=ow1uN`X^!zVvoM$xtAzDVI58Z7>m6-6rzfpb zfdjo)4<#7Z^Hl8|81gGE#P5$uQiym&=_uJC+jH+;E4WYZ_zL%BClnv-^xhgASf3Kc z)^u8v9zHD)3C&WjeL8Cn1E~!>iF-UV_so>xT13@LGlRY7!$5brVdQNKi(l@O4!zoH zZ)pW98Z4jYjQT>%(N7(lSV2?j^EZcK8(qHM;G-q1AUlTH$fpeLtZ}n~xx3EgJ7$Fa zicdanDP`auxL`m2tj}9(-X?d#JvS=Dt*zls#;U32$OS6%N^5v((0+Q0b{IdZ!&*x? 
zN^3iKkFSpET2sqk?<-?GZw_rCmBxW@EjF{!Twj7Tz?R)F^li}f)}iV!0xFDo9R zt=Q)%A(`d}Ev&($^ke04Rg@sv|IE#Ej}6Pq`lv^b5#oaFkz$((Q9>@+wr<+i%3E8KZBN&uSoF0=+avMatGB}gB1yVPoA2N3(0eQ@Bk+cCKI zxCQ;gb~FOZ|Y2f_ytfB-}YA{-HcP$OayafoGz zUT+3VE)T8_K)EC`aY@`>GZ5?dr154+9F!-teLwiL8Mx87?KXzjd|!W)7VW^MdEek` zcj7=j2@O)-nSoe;;i-+Otxlu-RnNK_Wx|EGFR zALf~ZSpQsod&LfPpAbBFu}yz-&@UZQ@D|-CbqJc6{+Bu2+-sY55#_q}hXw@gFbA>z zB|7u7CJwT;UhnYF;r2XzR>y(l=3ncV=C{ZBc;vJ%DAygd+ZTMQJ&5%kjC;YxZ75gG zzm>PTJutfiyi-CwFg(g8NCs~mS{B%E$M;!2Fzg>w8Hn}&*tWlF#VX3b>L-0pw1-sX z--{9m$q4H*1W>}r}3|W z6^uA>A!rencdh@AbJI<%!0X@;q4%3OD9c=*d!m^Yi1lyY)>pn-hwf?oXk!I^X#9iR z$314o^$u2WllBRaC(}6B5xGdesiHbCxeD`Z75aY-(~rvI4@_^`KQXleqf)2-PIY?o z^(|-Yt3>)Xt=1Mk!1!kbPd&fZ%L*2LtWrcFPb3>GZ!1{+?0b+Ia?>r_&sZp}K&-#w z@sa$-SY8vKyfNNB+6umW*&RF}GY;amKM!}AXayg_U8*fmo=xi(bFuFf>EHi3$ifff zm%Ywp;gSJX(39ERV0veCGVvdUJmS8@y&BW!)$jF$l~%YO@xS_^bzfDRZR-ssWRD@U zg+>YI1iekjnN1gthg;d^-`U(7k|u_`Lb@%8^^JVrRsXZ5Hyr-9MxF887TP%MOx?N! z-S55D_~i#%c&f~)GS>En=NIsaumiDv(3*vkF^hUb9J3#>gHy|HK4^-Z+3+o~gH*3w zLA{XMwf38!US$Vj{m@ybQ(HRq=Jm709#%AJ{9C0%Z^)$i29BRyXzb#M@uPO)nLUX0 zcW=DjKg0vW$KoEUM0>k$2NQOek!YhP(x1Cl{+kD;f2HQq zmPj1WrjB2zcN_V9xqVO&@}t=7*mUR(&HL)646^fp4^g?*ezM-M<->#-*Jimx5KhZ} z#-MwjYct*~a)+npP4mC{p`7MOYJHTE_Z$r#as!z8ZU>mLX;vHM2?eiuK zA7lqg>R--cd-*)9*#4L`FYlMJ{HGT;uuRAHht{jU!~rOJ%rshU30IH3)ce>5^Vj)c z=T4i=U?GJ+5ILQB7}gfLJYB3XorV47fH!uR-kX9=$>FMg<9h>yn;JQNYY9qv|H1ga zR3CS?f>6i&jpgB3KWfGz>ER+8+z*@-7)5K8Zd?fGG959iBZ0`)I{ zV|$<&Q;@P3-2Q8IFIdTM6-h%#yP^xn}vn< z=VO1s+DjMSUL1=9c8+MEoabLo9OO*>bK2){T)(IpYi*(SeUTeD)M+U!8yg4Lsl4HO zEo)z<$H5kwH(qjwMp0-8U4Z^kxjOC6`;&)pzs6?b`hr8K6KKbmK;t?k)#Va3jCk}o zb#_lBjCjzZu=;#ASaNmiw9d0a*gVc2_2{`u{l?L5Fv4q5al(2fn4ev8v|W5*N!EBRYDQgvA8_Fqv&f(dlmpj2Z7Q?6Y?1f55r+VfS!{fP234b{E(Ht~D|9eOc`b|{A zxk)~0FYu1osHB$$XICnr<HIsJ&mK zf?TE zm|uC-bc26k)!rCTw%nvDiw}V)%c26~%ox~Pe~q4t9628CzMUTfr>^0R2kk?6zqla= z8qFB2*9>id^6Ng&UyFA-x2#m2}&uf~s<- z;845+d9}u-Xx3#FY@`0GaX094@>Jf!OckuS@16f;csH2vcb((jM^)fL>yG2P!Q4Ih 
zX=fvGzD(oXgWX^+jdw7Ai(CyQr;l`lbgYgyul`#WXQeICZ?Inp;_`<@@+2aK%JJfF z8hEy7h#=tB>d!(;VHBCa4jm!@t=Y7-5DZYq1VK;j{G?Wg*}9{fMqLdQ4#WQS!~BAc zAV;3~Ao9@b+<*P7)%V4A>HFj%LL4ri1dgy2jFfvdbEXUtBy^s0xTUbHd7Pv>@;E#q z)S#!OAl6s6PgA5|oD*qZ&dEZkeTUjOYf>K+OhS1UwX2pEf>D;8zwqcY zhK-NkXixeDtK*Cd9olyk8si^mKbD@__owvuzxH#f&O0Mq5uOOz&&5aRaPNP@5abUS zhwc4Xm>LHzhkj9v|3`o4W3Ip4naee=I3LuSamnDi}W>z&oH^>pupW_)ZyVvMsDKn zdvUID&g5LAFWG;c$9KVXH%foW>KmMwU*!Bemun`RVI0b@@%VaQ=5n$7uZ3KmxS#7= z6f>>~-OG93BJO{`-Q2xMU)U&c_iGMu`5^8ebmjW6j?3Zn9wz^f{sZp6WO{aI)7_O#RGJ!35(5k`KYM%OLCDWB4rBMq-SVi z*%l<%`ZXQa>6s#q_EWdWK;ohQ{%C56B0D+xa@tG_w5w<`dsNm#h3UIpe{8&JiS|rn zh<;vSO#4kZzm<$oE6yD~!AI zT*2&%1k?|IIJx=m&kEz$BR(wqX$9io(eELJEo!?q@-v6 zkOTyQmeyu1ZDAJKo8WlK2z4A0^oaWeTl|}h&bw50fU0VI9sp#KeaX=daG#=n1)!YP z2mb5;F0>z14j|SSXdlDE2K;bY=_jtcYngq64R{3~)~_Fpc6DS=@wx-tnG{?3pgY>R ziS#9ha3Y6gZ#SKB>KP1WWDhjK2D;IH_HeYrqcd`=ZQv=5pN0m5SYJ-|2(Ak+}E#%T&!Q#qSEU`t`(Tyc!6s$O30!!acS1j zpX}sTB7ZQ_cwwHLok+@x{eER;trf7kkjH5f5_9wn&BJ|Nfz-qL%l z4~E=Hl6jT-)^L#KS1r))kDbx8hAJC8n|>SZ`snd=mzv;MyzZ%Wv5}4hdH7sk~|JeRL3P>V*q0!b5LTAB`1%Z0{ zrb!Z9->EGhZ2#WO68z|Fs0o_&(Rq-)GU!kC5Fr4y<;jKilip)^Y(6UkebU2y6mI_^ z0~hjNr38tp=b_0#7GO!|QIave$<8j$itoqoLM}?*GFmskVF50*?_H<_AiK(Q7T`vE z1I3tsw4eWv#jo-!*2f}y4#z_ktUVhrHQ>_@-(Pz;PF7h(O%g+4=N3UfQ@<0CPG1B) z|8Lo+_2#lZ^}~NO^Mk##FZoynS-2&$>YFc2Sg^t%@TCfLXf8V(ee#~C+FLU3Nourvv~u7p7BZ(Gyn8sYp% zf1h%3NeHC;?40t}O9{KlZfQdZ%%FQnTvYI73i>lH1R`jD(+uYew2wR<<$pAn)jxsr zgJr||H!2R{^U~p+z=q~2XdCV{^?CaDqn%(|{QR`>3qxRVG!AsjaBq<2tNk%P^xoGy z!6P~c))4*gYn49MDHOaY-3&rt6rDvL6bkvwPC^B>dH42Bzv=lp73|-L3qz5i;IOu3 z-t&J{aN+IQVx?y&>~7~>Ip&}WDrr5>Fcj|7xv@PeFl%RCR1p{oeOyineoiXLvFl*C z%M8O?-s!hiI1imcdzR6mux-vnjqN=peEOqg>gmCuaEaD8mZF^MlUPe$e z*hKr5*-9{Z`%3T~(uwbz@4@}Rn!V84{m%dF2a5I!#ruik{YLSAWXe)g!=8_W;Z(k3 zdbO<*%xFEdtrG5#9Y}X2?7!0P#3g+tT%vYp0@{C5e0BxHWU|jgd+AEb*Vtg#M)^Gq z?X#&oQ~}JQbtNm5%g|1C7{EcYGcZs>W_8ZD_YD-V4V~{9rG#wSd)p8MSNjI!Jwxt8 z>ne#skWP1Uv_X69ShKqQ{|bbr_0Ybmu@Z7|yCwwB7|o(`n1t;jUWrhKXP%gySt#n0 
z9gKMZ?4@>os1h{iZ+^G@%^#|a(S0v$S3BLB**-P^F43OiOeN^h9CkDK!&7?SDR`#{ z*-2;m!|qF^s-ksDD5v(*)gSIt{BkfnVf@MYA^vdbef^N=@8}-Qo!5q9{iAh53l(Ts zx%7i=WET;u0_~}+yG#(iEq>*GWA zx>A2|JN|pho8MJnXFzscevnS}Z6mf9WGDI25BktL#3>Aq>TMH0usQT*%5_{PFeUv1 zesG7%t7`~oseV@mKosqVJBNUv)DK;GJ^=1eeujoXc7r|Nt;z$T8`(kPJfEePMIcn^ zqd!u(DmmsSCM-BO3`8?E!z4}>otM;CURjhy;}bbq)>^?NGDr z1U-fb8oT{p^WZ}7$PMk#ZLMDh7QKZTLx+OhY`55xuVugdR=Zw$Fhk!$5bKA%ixrM{ z9SUVSizI8AOt7VX4SC$^W$9zu%Y??`+iR{P*M#ACw2@2@>mP9`o^~{1C?wMP=(;lD z2H7bgFQ@TXb$g)|+5I7x(7DmC?FF&^zB`o#BYF-6-Si{(&cA6VjFtS%ukL~I*^YbY zFWU*5(b~K%%CqcyC9k~OP7v$gvo=a=(P1bE8)?jHCbSq8U%lHJ(i***md;E^c zrA@V*&kc=ce_!&I`JXY}R83qenap@?{H22>oR8SV|2&T+DQu-*X=*T`Pg_KpVe8MFR|tNHR-7yx|zja zee@!iw_nEJ#~u4b3=i+u_HMn~p@#uiC3ZNxwRKeowfzr8Rp$ zZO-AY^mlEY2AsR}VqALWu#)~Ref^Q;hweG{ImmVIesb?PV)S!+{xWvSjMXwaXTsm$p&+j=8xckB=&L{DElXZ>zaQ$xW+575pk~n|YkEchA1zcZU zkIV1pasGsVuUCDyF)MFUr%62g`ISsA9i7Oz^fu$_f0B6k4RGH9)9+V(xpklW_bQFa zrKg+l@@vzQhiC7|{hx|=vXQ$zTeyCE3CmAi{hi#uBQrTyZ)98=o65L0nt$h7H_U*Q z2Wcj7|9=Ma`t#L>akWVc#tI)qMR-Fbv zHx(-HqTRfz-;*cKdSciPwpW@IyWO!+HcOUYi;6vwA*z28>wpn2*mT&F1QFt+D)bWi&< zc;}^Yzpck!S=;jX*kju?Sauoh420%1zgvvkoOG|^PfH=ibZJ%pWW3Yv$Tk>y*;2@$ zaa?@`7#lA(@pZ8fO#3&f{QYDQ$kP9D8#7Rb`wn;>Xnr8x`2*|wxwIB)%U$g~gQj$z z1Mi&EVY0kv)iJ|My$ypJVBV#nS zrU4D1)~zPk*{E^%Dn$;B`91+itjRd=s6t%M#PJfxTYOh>ZCq>bGR~xRiO+#J&MJ)g zFyf^m)1c4M^hgCH7%uL#xMe0UANreKj$dL=PbqNFiYvq&UE;7PE&l2 zQ=A@{M3=GESXeWz5Nv`h1pJ$}_)|FGujJ@Yc1~W`a#R$^ky}yX;zMnXF~S7#|L*%( zA-6@)T3|fcS$k%%t2gk7tj2Y zt{BX@w8=Y?YvV?6F4EUTt!XNdJaHM5Yf|?xuFkIEvv)AA`<%(;B7Ho=!{u|=vwl!A z?-b`tw{gBAh5KiLH3mAk(qFs=>S$a$Crf1hDB za*@8I^@!%=UTw|CeP}De-bcEe-!(3gZDDfV+nYRmvHpq_7M^-qIS(%_igBs?HLgFX zUK{dX+8yt9#b1d?UxH_<1^lx%v6Q>-b&ZE7`^fcG9eDf#UU4qc$MV77_m}}2=V%rZ>w&QZK{_pE~`6%W3n$mopeggkql*Vi}=UI;2y;win ziHBDu=kiCr8JFr@IS;JQxOxMg#lh%kMEVll4xZjA{f79B?Ui(NGM6U>vi45>ax&+kMOm-{Eu*Ys+}^&9Qu^4j)%_%Y5mUE%#fCI5cOfAm+j<@wckC)f9? 
zX^+P3Wn41z6@R~p1zaxH@3oZY??L{38ck?A<64)R_rJQGm+!#U+&_O4u3wVQ{o7;B z>Wj1uUnkIr{FCTC;Nib4;_ladVO;8WlJkNRp1!NjaxT)>SQWDO*L}Ig<;&MGuJ$Zs zTr`kF{HHjY%Eea8I{bYNT; z{FL$9ai&iFg3Cqvl6gkl|10G@ysgoU>o(W$@!xp*E&t5rV*MspczQ2)j%4vyJ>liO zZv_ueKefic=bU?VWBzHQ`1frzv+cS3UOVnyibrqgfMLkJVIg@J_$8o-;oXZ==a~`vbk3Z+}_DSQuis|S6y^P5< zEiN*-_WS|PuZ6Ps>z3~3{E9ITf9wg)LqeEb?Nn3#gNF0;eg2xuJ+AWfIbDyJr<;aM zu09*hc~1{x0i&&c*p2gmU93DvD{L6oY+B6oyNs7-iBHEyq%VEX{g-sj=lNgTejHyd zVgAjosqae$u=~`sHQ#^2`|Auzu4~NuOUZcO`oy(G&Mdx?**lnA8_mZ@lKvN2d6drN z<3-5~6IQ;ZdUd&eqBXnkOdEHO=l{YYK3~fJjdPj%z5?`L8q2@$Bq{UBS5SD0GrT@n z>pdg6T6>%GT$@WI*U9+zASKW5-5{wQ+u& zzg)`i&ztj%&7UOJgP$`0Tb6SEX#~FyZv92hlc$w1`K}|J2RZWl`!?<2+{67HlXsrT zxm@ouacy#c&Y#EV3MhW3hjVV2@`AYTMm*;;cD!Ny54$C8s2g^9Wd4iP>^@Lo`A5?PxC*9-x+e|)xEOB7zt+|rH z{Wsz3$eQ*ISbHJ;Ihf1E?Pbcx*9v?e-Sxk?e_L9-A+GL!fOG4a&lnHEdt~wN60v?k zTy6UB`m?hudmrh+M9#anVsh!!5uA(lAKk7epttHyd|yQ39M^!j_Sm0X|3xz<*QO2S z{J;7}q%S=^iOb^`F@5!hahz|x%+gP~egWssasR8I@+YrPO;&OF>9M^2o!iEFm-akA z{P8Xu3`Zi?KWxt4PkSEko*}t;9sa(mb)5gPmgnyXynBJ8N?a=O~&!ONqb;BUu1?^9o2g6l5 z4`LDqYsXm@#-jX@YFmo2X&8w02M0D#PHKVS(|PiyVPL9Rb8_Bt1%&#YD;y;YgXHV0 zri5!0kWThHE@2?nKS5`%4**=6h8IFO<9pKj`78MK-lepk?->kLWbc7(W?o3sJzDRB=3VW@j**H|F>@+-qZJ)jM{E z`J>Kj_8rA@`s$z4!||J^rNiKOYXcR8p4=In`7s>g>71Oc3TDxH$9kRdJXD>+f0~3q zP&>cCclZtT+I+c5=U6s`L%@aoKOJwYz>W6J)52jt-QQ6U&rPs<hIQiJ(@8TGKCE{z=%?d-@_46uh|lM=$6=V>bl$is6yoy?0*1R|`ZZ}& zcnZrGyGH=OnRbcndRm2n8O7r!fDPSiXolfGI$0z|JIdYkjgg8_ypIsiRrv+O-uZ_m z9*;ud!mA9Gmlp3u_-p#~<2ypZjO?S(?x!1Ghp}j9D6A#>D2yMoBe@g`DP*7WTQGQ$ zeI|yVL-nFZFr=ed;`at&uduXwXHp^!vpv%CWMeK1EtL34A=FZfZ&LV)7#|A1V<6sru%`IH zVZ586)qx*9tOH@zs6drZ72btF-v+?DvhUEonQu5`w+}U5gmyW5(XQmBEF5I4J@ElI zviHQh5;D+?x5*PP{HDOeelFk9|C)C9;OF7TZ0Q97dl z7qY`@*agJxC3XQ<@bnAoL-7AFZ}3n7e`SO5jzIhkSv7Ge{Te^OGi(+3pFu|W4?eYh zNF)4zQ$$NdTSR+=6~YeTgpeaV5I%?i1Rz2X;fM%?8WD?#L&PHz5Q7nkh*5|n#5hDU zA_bvA%s^xy<|DL-Wr$3~T0|CN3nCklgUChfM;u1vAq2z;#2Lf|#8t#i#2v(a#3RI0 z#7o3m#0SI|#CJp$LSGN>wLsKIG(t2*v_#bI;aKATZ4eF!7la$a3*mAs!;0AYLHeAl@TBBfcSiBJ>O-k~)Zbh=zzJh!%*}h;|4ILBC-%$ 
z5ZQ}B58(bg=mXt zkFY}6A{-GN5$*_Ygg>G)qB|lE5syef3`Qg(MkB@|79y4)Rv@+`b|G>R`w@kRlZdm3 zi-;?T8;IM8dx(dKCx{n_H;DI$&xmh`YJ>p}{OTeaAQ~f@AzC5YBHAOY5Vi z!W-d_P#{!@FhmTZ2ckD(Ku!IAdaj#*#Xalyx2zq|#9U%L;lfkKC2}s8m4D`3q%S#n zn(52j|KaZ6pJBW#^$X)L{v`Kr{bTN4q%Sxp3}WHC;$58>4?#19@!EDtvxD=pXCWRW z*EAc#!H#ob2anHj{vA}YzVKZ-Sip2FyRn7qdp6;`WFzB}7|yd? zxm>I-^gqPkXW;M}{bK$;E4y<43eNEG*DvGpHU(VXXdmPMalhMug^%Tj%iq*xJS$ws z-G9~N@mpZvMei#S=|kBY9-mDQnLb#2Vf;V(mO(r`)B4`zzYxcH zLQ50QjX1~eQZNo;{m>Q6eZpcblS{VL@T{d=ziSQu!MPyP*UTOpD`2=~{~u%T9ago{ zy?+NpMMT{yVpkN9q9P(F2*OZBMcmj0%eI$e!QK#pqGI2cV=vgdV(&PJ1^c#RIf}g> zD^`SevhLt{zVY|R%XQ`Pe4hJWYbKLPGMQv2!)ACZ|~7h6nz)@KSh_-|sZUp_eq`jT0s^Ej0g5zk}#lWw>(ALVQP)4L-6)f&>h z?Yk3a$LxfEHT0B;;(LT*{kX=@Hugrm)gsba@1DddZxPRv4dAA$p=f9>ghI!CJqxzYgO6`G@@cNImfOTi_h%cJU91 zwf=8CT;TuujCAp=J<6;06!CAJhz;ldCD!`+?tJW@Yb76L`^v?n-dy~fk_Q&g?10Bv zkgoN!4-P1=)JN2Bybkr*))D@T<-s=`h_!yXPf614cJ@K}x!~-tV5y8X>4wC3=o&xo zZbf>EVJqoUc5&oONJV+(s($1MvDVL>M$`J)HON5zzToUrC*XGn8@8MSYy9l&UnoBy zi*zXs`(O6>$M7c(AiaRKejb*F_HC$Zth7Hn4z_D=1NH?Qc7e5iZmS+I8Vo1D>@^GZ zIWhzBw-bpCuZI$A{k-06_%AFcUD`2+*iN~&67`W<8;P@T?MA%D&&y4Qem{0CVW~yU$P$MdA259{Em&-uFxQ2>HcQoY?r?ful0{o%d-^UqZIk&mTjs0 zlz-!iyNa?e;a$mhlEf(E8|I= zyIo z_4qY>9I@7faiG=63^8v5W?@OzCVPTA8M zycv4R^>)O6_^X`hz(u>-RjP>mbN7OGwkMW+b`z)Ei$p$+-}_Z_Dlfa@Si~2J1b-a` ze{tw`Hu1z-zdWW7`R!&;Lj1hGVAC{WcBVJ67(XFj!(PzMXG8zl6MEmt&^7sF`%vg( zRoxnF*gS*$yn7JxWz9pp#?L!`MEuY=l;>$q?O_75Q21_QWZT5HHqGpr8l^D zW$0rXB3|Q{JJ`ejRn!R!6uPptv2=mXaHrTXaa*>(*`m+q_me{>`G9|TjpU9AA< zTEDIc^g~UcOE%y|zQp_~bVKnl=vx2%)!1Ik`@p~FD%iap{5Mm<+xvnwe!2M4E?jIU z@f-$IwwL^RJ+a<=3b7$&5xDOl=o-Iw=UZ)*cwH|UdKIv~58|asH%K=wO(tFImj>NK z`A3$aypSivruyrUFa18)ZY{CaFSVJE_+=a6KfIh+|9UfV)-$Y+sd*aWHGUR9f$C?v z@P>45orwBAc}L7L2sCnLU>C9yF;M*I|W=vSvAUgMW{Nzmh` zK|hyA^Ktrvvx#}R55&4Fi=k`$Qrbh(jkQ+7fBhw~>HczJ>0&l`{9~}jFRg!${Pr)Q zxA>P>pZpEH;uZ7_U%(na|MeaIKC^?U{imutbuO`V>nrJ|^7Dzcerd!<#E)4CeMt_n z(P;(ZgP@zdH-I&MY2h34>t^pEoyEUJ{O4WppGN)luMUGXeyRCz>fgGuMT40n{>G00 z@3#bh8%AtQR$1fcTW(T$rhosX{A^(sc-=cN$NZ~ltN87|NU!zta(|=#-}gcHUxNDQ 
zq@w&m%fYsLiDjF`#P9FGFNZB8)*n8KcuhX((j4fa8KjH(CUE6!z8e2 zLyr#(QR*+XnL(_pE#3_xP_{?{-xlu%QCRD55lOmH51rp@4Zl5DDqWx0RH_H^Yy2!z z2i+3C6K32Ad|$jPL@Dpwmnbg&)!#`bW)Z!Szoq!CzoOer1XtEW*W{CSBv8EZ*JjfB zGO(^#3gUBTBY#W$&hrm{zZB?)#JdATd3?=2u+0v{%NvPxoA5hTuJJRgYor_Z945c? z?k2J66j*L@3Hp(1q-*`We*@^XiiA=5V||Fl?_j}i+>y`O8m#fNjV`3?R+b~3)pmow zMO9)s-4#5^0lLP|lWh>+rJ9<*GO;nonOM511Aq1bpSD82#&zJ=#7o6;vA?-BYO1V{ zbo~Fa{3?%heF;C(*x1Yl5Wh{ zhWMMy!5vS4do3o`P1%ijjbCn=g8b{8@Z8+hMa25l9P-NvsnFk5g|6`%LiWPn)r)lY z@(6fwV{qkz;D+JETEFO#$X`+1Qy0z2`=1A2Y(Z?d`TZLVjIvOXwOu@1H?_-KID-|7pa(O@QCy#Q)(x{P`#8dQA6yJK|4oAl4u549@q1 zezujW-=XoVF1RrAFT(tTzE3q``5MOe`o6!Y{n_mDU2IZud0MVNzELaixpVM8D@-gc+=hJ9ZsB<0dK}}8oPO|MJx{D#bcC39-+}SPTodBk zG=N^|3iSTRaXf7vkNwm4AokbkD~Z{|NMc?4qqIKk;YU3e>tW1E#`-&cA~qenjd<^i z;6aNJZ=MG3K7g2iIZv!V+!lK1W9UKch~>IZpbrUyzVZXuTxF|5s9!%EKjcqEi1n`; zBmXS*c(leB@n`K(|CL@~E3j!Bj!(S56Y@23Li|$?VNY(!)mUcWQ z)>jel5LVjrS~_^X8S1z7Df}hzUJU8-31Z!MFZkns5u3)(4^rYKZ*{y~^e3_0?Hl}4 zzZ3H{>Ug$f1C&3x6zzYeui~9&B3x>We8$3Ukl(62_SZw=*+C(*#)#LyJ_}ytM7l9K z1@UDnAm4Q4lUKV!pV$QYt+K?r_l3dhTS3qDK)#p3h<`8~`uMis`gMrq(Q18fehO0R zBd@k4UGH|8n9mIdSG6TS?^p)$YcolgdZB*Cl52_O-MC)UnH_-cTpIog(~<99B(d)O zdgz_=(Vjm>fbXlo+kTL3}Y>ujx|CLN9{$ z(*G<3y?$MkZ>m8or@ImB18_aWlM!$FnuqJ5AY9*x-`Nt&tG}Q=7jmIj#PyUh;01VQ zF|4oMR$|W3{<`c8V)1@M=%>~|KkSb3hs1*)hY;(2&Ib?mgI;X{;^WJKrw)ZaEQZ+h zwJUhucLL}nRH$4kEHXQy~M_`{}9XbE+BrdIv-L-<+}%< zcfEr6vpd0cE+XFjZ(?>1^E1X7Q;GS|Ea?7;@aN)wgXvTdvGlwuo$qvm#CN|${bZvT zxF&S=5PoC-%J5(6M!LblE`a>)_27?~zmz)LgZoUO`xVl{D#UgX|IqyyKD`!r@-p1- zD5Bo~?S2mTH)7mL&mKMi@iooCJtJI0uu*sTOzGDPCCJ;-L%xJu3 zsEhek-tRr~6>Wz22eAzy4(Ox4lKc4c$3GLN? 
z2Fja*`xR2-S;#-FJn7P<%gDE}3HW{vc-D2qA1O=YVY}MXiP@Qv;E=uWR~$v0{XQ4^ zL$IM7wg->hMS6-;7x+i4MEn8VpWzoX5N~-9`ptvH1_uN3ojC+f(1XoS6Q@j6*T2wN z=m&8>j;$=wk&Aw8_xt{1KK%7IpnXcakZuS#NG!+WerNW&AU2K{~NQ0QYvqCeM&fj&;W!%@_WU+hV2D31Ln+pHP%zF)Eb z?mbOx*I9gfUihWf*~BTo<|6)er%qh_-`UsI`y=z`6Q}HcOzk5ch$J=?FG4KOsTocF zi(85L8Xw}6)$^$T$s3mt+m+#Te3PO}#3=rhWbtgV_*ZTRwhL|t{oM)D3s~+E+*$G4 zE&fJ2yS|55_f)(RM*RONg?A9geu*LecD-pA#c!G{-pMQe%|6wYxL|%;@>xS{Oe{j@ z4|%s!SH-U@-3%NFHuacGEKMFwx=DGLsPIYZ{F;7lEHU$2MY^%$31Xgs>uY233*yDy zWaMvulvu8_gjm1$2eDo6If$Q!`8@V88ob7g#wUgsy};K>f~yQ6)*r(7$gX}OacoW* z(z6Rsg?<iOD0@T$8MC6|Hv_lPu(hh>mi9x*$>`BtP90^ z-S{5$dbqGJ>5}DJ=$%IsvyC5!jTi5NgUfZ}B9Ez5O^lc4m=hb_4iFoRHNf`<)Ac~g z9ZO=pyc5?4Zwls9%H#TU%sG^oTLCPdJXP}R2D%XQqsTAb8%Qh{GDH58;x`AvCw0!J zc>OE!n+Ab~5ac(N0!yy%QUCoN5#OW;#hU^LgL6NUZh9X_%(_^i{83|&&s3LKzt|&O z$!}K%^Ch}fHpFuKO^ElwdeX$2EEb> zu-#+m4c3C|uOl|3Y$P^b-v|FJ)W;aU7;J^@t6z-yAlB<7{Qc)6pKrl@je7ljbw26( z4ycbI1pQs#&XLxSm)nl=z5R*B!`I-M5zsfO<*V~4(tR9XjRmZkKS?=){%UG`f$GB> zVtz_DTl}U%v@dJCp7QIy-yqidr83yQ`due+J-qV=`Ni*Uas7MoC$WC?8Dg!UCyt`= zp2@o;9S?Z&2x9#ZTO1!wt9tpe&^3O!YtKGh)Kee2jDELS@H+{8_cg@)To2N9jn@-v z{ro~#(#3lx5Z?!Ey0jbq%blUe;&-E3|H;$c`Y=h%e=Ipm{*ZD*h;>#w!Cl%yUvq_6 z>zDrNNcl~J9*{15i6++Fc>zAv0o+^tU9rY*F-v&|tJwdyJ|w?|eN&Xz@fCPZbLdgX zr}azyCCaZ4gP$$5AU4@Org&bsD0l#LtzSB6h4^mxowFQVg4h`Q7WMzf5_;8t!CHUu z#?Y%Hp7rr3Hm$(#q~#xB$hYhr#mgGMys|Xux{sfrbCj=Fe>W^QsE&NW7ocnXL-Kla zN%WsaYW^SQ|@dz+Y9Iuh$Di{AtZ zpVluo_=@tYd`13tDsTP?zf}(Dx(Pp^Yy8rVpQIZ*sQwmMA7hJO@NfQ3y6GTT>rZlv zLjUOgk^CXKk;MAqdBhgI+YlS|---Y5tG`3iEkgTS%trq=4$VcrvLlh-rBHKa{R@7# z^xOZ^3*|MjBK>4c1C)0@hvHA}cLE2zM}0JYslHhsF7g_OVg2RGMTkw?i&8#5R@KXi z-&hHs)-O#=Mf;{gXA4$>KcPPSYC5=gDa331A?bO&mAv{-_+5IEcP_DMdp+_eEz2j? 
z1&iOn3ZK?551L8&jpZDXzv2R7ldc9aFEShK>Orja%eRk14?;XU{tA5AjQs5RDPnzB zFT`v7Azp`2e+Nc-$g~5fzYpqbVZIG)*8sZ4&!+Z8d3$}3|89Sj_cj3hsSmNSQy{U{ z&kb$hPiaBA^e!6yf5X6Q+Y#&TbR^dLLsCys{fw=mNe}6E0$ehN*djCoe6K5XjbGXp zjQk(rmrsO(+pF&pI1xy!|J|fn`!;@Z}6T9ivugLzKyz(Kg53yv92xNTcP!Hw?@dH zHU{}^)cP7Gfj46Pj3ttY|L~tLO7S{}6w=x0n#lL7KRCEJvFZL%@bK@wxY!R&gN_r+ z+W3Q~N!N$nAzi#D2>yvr;eU97*f{Jp@@f2X>@~!DzEb^azH49LFP}xa&i)YeazDZH zM)0BAh%b2!{?pmuRzIL?@=GN*k#0PO_YcccH-j%&;JqXRcM|KSLf87Gk}Kg?-#aKh zQ+bW*ue6bLQ&Mr{)A;3<`;kA(igflq1NolfJt(|L8sZz{JwSi@cag5Is`{_)1COaj z`M44FH{Ewo^Z#GJd?f<;o7IDVYb#>oP*3>Jwt&93KC#v>_l+W5cf}ok*G|Oxa(Mr< zyf_;9SODTRe);NrgVLXja}A_R`HPA5Lq-wvi>kh77~(a49{aK<^~bM+p{Kth*6l<* zTlEawR?VmN`whbSl|p{$)?eUpy-=TZy;QviF`L~TJfj`T8-wj_+S?5J?OCMjtBi)O z$C*F-#6~yx`LL$My4mAN*ZSF@0cfApLC}W{0XwSY8T%6JdMA+nhyMwF zzh&w*qEVdf_IhzhOQxo8k#hoDE&$XN&$u z{d|*DeHpRo<|JbAo>Qz(zXiyr@yjWLNH-Q<2L0t<$d{(p|HN?cK$SIq>G)FUrlBbB z1nOf7!uFJcHX#4l6{P>+U-h2aOW!aV@jpJJ{jAU)QjHJ9rtPbsYyCg#WBa~D{pH0C z!JbRuFRSV?DPXNX_Xp)S<`_xm*Ndv{Kb%M;7 zXYre{{;jqkzs4_bvO;}1)=zqd-#40yZbp2i;-u@{b`t;L53)skgFU26r%R&#UlxJw zDk0u+4p`%t9rvKV@34L1UhM@pQ_E|quFs^sC|~1e4Mt=A{O6L+%8n=2pV$Zg5%eE@ z%XQGT{+&1;7(4EU?%N0JW4@nQ>H%GU^#F8@Up`bA>$^vd|A6)s?=d1j|ApOLyZ*zE~;5#n`!KPA@s8Aty$_J0h2 z@)pEL;XS$hDfVCS9SG93ezx&0_MfNENS8L;!TvhnBY6K!m0uxV`sxMIEwf-7MQ2sc)M~`nkMyx;l8TD6n z)BdmjhmESAt?Uke7py;@+Y9Wq1pd>5!K={!C086@^>eOb{r1F?Zc^Wi%v+5k7T>=h zU4AqYyanqcjqitixj#tPmM^bDKBGITOSAsG@Fe8>SR8)O>BRaB^iTF-A+gbbd~(-W@V}@-er-Ijh4Z!P9o`ekZm&lC zIQZrEXbXZ(R)DTPe2;0q+O%FTTY3Yhj4j`k$Bn zOKbvP0MT{46t!bp2iRJ)BKjfZHLS9SkBiUPpedpUqUSU!Osj%hkj6=|$9Ea`wUX>HJ74 zU+b5S;d;WKnVlH;^}^`r6LXa4vows$WB*7qC-XSF7l z791x2oxeJcPsR-7XCrYuF)df?TNcN2U1LY&b3l2f1$DuWVBH8SK6pzsLDT zKRTcGUv8`FW6+;?xr?Y@?E%QASszwgJ%2VzAe}G4@xj+&$f&pv?oR$yrX+E4!|5xS;6a%E@qm$RrJv#N{! 
zqMrtTyc4*i`rhTBD#SX!`J`*(r6|k~8K2`lu2Mit>OZ;`3*ozey$PuG=A|iB>3NNCq3@fb>zEm1OL<<#9Kqx`dR1O(1&b5 zdF~I0jZSzEEh}~hdRLV-ezvm~wT~(PImI*A`rvh6QGR4?V*On(Uyb%RmH!04CSLAP zoczYm9}r(zhxoyW7x%M?bw6^UYy480J>r*qNB%4e#CN?&%n#cke$`Q8sdQOlQ_N2I z2RecqZzGl$S3`V43iP(9zxe$L;w5+Jb(4vC|3-*kHx=qI?6QFkrAl*21^bi+KgDYX&~a);CLyXzl6S8JwCR(K`d4Dgnz;sVpiJ^ z<+r*D|6&hfeVLkaH*)oefhR^1*qi<8d#VShB<%R@ecy8liq3SH~}Rie8x zpQCU82KA9k5$ir8pR~Y=*rYc@yvEN@ex?4e%fx%R<=A)Nmt|4k4cK2zKEJR&r~ZMy z2;c9LW<4V|F8zk`+h-Gt=Uc&ja-mOG*GGPd_>KRN&f<~J_&6K>TB_ygtj*hRWjAMK$Vq`tTOlZ5u!h4x{g#fgo!RnQ*V^{?`?2es!R ze19x%S1z&cP6han{vbAXuSqOx{CxT)tY2kk_*-RS{c{j6Ke~bX)>Gd@?QoS?pHdrs zO}tbA#{>O2H`1jV`6&N7-n%aM#qnIX5%tyj*&?;PnKsDZ{}k#oROKQ#-{`B}#P;wz zi~2mmd$rkA_4=vS4$}F&%g{q7!Jl#s`7SJn{uc4Z!dr=@)>uE?&NOiS^YD*7h4>FR zp6eD~fW899dsBT+s_!55zgPkM zX^;7c>^AWKDhED}_Z%~gpBJ&B@^m414>6mH@^ld+DW16&M*WtKK|Za&=1purwh`su z$s#sYzx}7`7FFmV+@@@AbUgMX(o>lwLKGM1E zC6w1;53$@8x|mNvK8>FpEJJ?Xf#=Zebl?deiP_E~;Qdl06UU`L{LYi1kNQS_{?Alm z<61D|3B)F+!W1uQ{f}kD*ZGd}LgEm=)RO$X);MDQ6_qu9dG{pvr{z<;{3#y!7L}!V zxdig*B1(WYeqJsS`2#AD&a#rgT~S~5avHJeVjXEMX^SAc!w?_N%!Y;)6CiosH%d81*=SA^aKigo$_0e9ezr1P+ zxU@6*ji(+!HBPo_R`lGO3eH9K>qT> z!5Y7O2<>NThVrE#wLQA@fxo|c{JAiKSnHQPo1=d%O=+bZKjq3`T>_YoZAQ9&%u485 zzlPbO7U;ju>&VaFBcFcnn*V1!e;0}JPj7+$O$6ex%Cn zou#V&lkiJxp&M&K|HE&E=Oc{Ec2c~&sx`{10_Lk?P@lQ`kYD3xHn=~lPd!Dtl+>R3 zi@wfT@No71;gTze*ZQ9gM*TNlB%Lood+Y9G5zB*YNjJK$N4_X^z9nr8{7d!Fd!|Ak zpw`#!2;$R`-x$6LdgJjZuTnZO3m#3Z@B0+Gran@I8?-)|%ctm-{vml>B2Jz+gP09G zMQmAnHgt_&-nR1Z@DOw4zp##& zIoN==uZ14_mG;l@&0zKl$KTAJ%i&+Np7iAYtKc6~jo9-01n`d+bbXLKZz{M#3b;yt z@ZD9!CiVOx*X@DpiE=o9%f~^*AB(65u8#2|cf$3SY5g;tFHflV zhfXyHcN>TCa=!=~AIswzvCg2*4=)OZJ~tcl*SiD2B{09uKer>+PwkHRt}p}iWEu0< zrz63iaX*ru$Ndh|e)WFh`ZlDSp5gu|+Zcm<@rrWd)oUj+VJ-*Y! zEmDY$uhXGl-9>ClQs-wLt%ZK#E%X!V#Ck9FyFx1tLhl$v+e3PH60ENXZfzpgMc_MH za+fT`zj4R*JNydzM~tT#$Mvsq&veq+N?gyHTHw1P{OBX(SFi6_#4ln~(fZ`)^~|aN zi07}6f7LhS_fx;CQok_i`j$!XPb>woY!E-YGW;Hq@Fy=O=0`_?M;`}Y90LELU&v?QA8daX`rOXMMy6g*NU_j;EolE? zFJ?mTbesHgiEYHDr~e{8QRQQ$vA@myrRKxt6Mfh? 
zl$X*B@o7e4DGlQ{;~I=N#P6#~H=dkNI!nd)OFw4-G2g0Qj~K@y-f}d`TeKGbUJDUl zt~d177?0^3^oW1*H_GqfLoEMP$1g6v(9h38{5D+gNHs8i(a)(wI;%Y$<*mf^2m1)W zX#}p%BhOU;!oh~DO7<9Sa6;(e^@yzNUV!h)kVs60vi|*xG z@W}^=ACK!fxdi5m^a;4$kq^Cu-U-(eyxd#ltEPUJ=kx_)V@49Shx9B3{z>@m4li*6 zy1MF{s=hk;_4>GDII-Sy9@b|r z=6`jo1`^AWn4i_9Hi14C=WAW2j##`;mEuj^x6tvE&7O;V)ztamX%~rgSG-B*7p_CU zT?5D4yBUaoRu%EJP2l-?IA8c(1DCEqx)hy*_(x}0$r{eZP7?_nSX->sg3TjwIH-NCKB=3vP?&f5Z>o;15&3ll;6l z_>ua&%s}<|xBSr*&!*yeIpf>D#5}AY`AvW0JA`uk0mR06IRDB^e28`NAK))|-evV8 zVyRad#5Y&lOMRY5H%q-gpMdt(4eXBg&rsX{+7z_EMH<@oE9U3oc5OrZo>S-RR_rG> zcEo(TGAc2j@TO-D0p|KKf6Q)5N-!Es5n5m!ZE+fPND1 zV>jKz`HZ_=AzeRYA^ZpLfmeGFOC#_+kZIjia9k+b%e4(LtBmcd3&wZC7!N>u4M%^M z=i+%X<2;c5jh%$lqxU3c*WhlSdabosrA^c@b;8~at&`F$F z`fCmRtq*{m)%tV>vj*z($w#*!-U{t++_IEdezS$*jnx+r^KRQvUS=}xJvJ=4$=(!ioLb%^_Vsx*IXK#r%c-(*SV2>4<+U6Ek~kZ=m* zpzGW4#RG`Beg^WF+5lY|K^)$49rPkopwFC1%*@c=Gbbex%a!erzw|8ltKs?DaN}HJ zDIfc5W+1L-SqzTf$;%EPUuJWZ*X|hN8#O?_P0PUsJYQ(}_Y(L=SdyL`yBT^zw6A4t zk; z(v$10B$m^0z74N4mzde(dGh4XtHD1D!Ee||%ndkSTbA8}_zpO}CofU2?=|xY{D}HI zKoeISzs*1YpX(pyr{1qL$Ney^|El<6kY!hTT{9O*PR#N zuzF$7;U) zd24C@!oS_*B1@eazcDmd->Ed0%1kk9;qS`Wob0f3`%7^iH`eTnS#8D}R^l;oasF-h z6`z21LO+q?+_{t$x0}{AY*(+^jGYPHmuy*-J5Q;4dR;5wPY$*Ay-^r{w<_Smj6H@pqVOrBuwoae}- zG3EF7shx4$*O2(-=UsAzfi8hiMxFxe#f?5G_$9JSp-iVP^bGKn*hE^4r(&tTnk?3xb^AeEPwtv zq<6;B_s#jH)a-cvHQ;yI8@^3B(YFiZp1Ij^uTC`QuFAjj7*84Jk^f?6bH3Q-S(8im z8Mk=4f5MzHEx6pprpfzufqdI>M=9S~l&7vsok;^Gm7Np8EnKSvy7~rkDM+mA>j>`d z6%^1WAdnZVdsekCL1te?w;5R_psia=Ztk#@m7dGEykmmdu`(@r{KL)}*0UIQ9^q(y zqi{=p=fw?a{bm%x;X&=#K*Tw#~Noi+?;*0rq@r+eu?5jd& z7X`7BNxjU~KfY$!uf$<(LYb%X4aUHh?2_`$!&;&2q|*L{TC$bvxcOoh z%B&YjxozSj*u=IO@vT0FusO2N&JLeYU5a2u5PfGmx=SOYi{d+A@@UBe4l3W z(h+)AQ~73NY$z)^ezp8|s-E?j9FVarA(TyA_&oieL_IUFAofQol$l?&^G{vY z-?$X#oAy6P`1M&B!fKUYp8nvco^4RRNxCS61x>TeuUR#U?LWTg#H5H2c2g-oIEocf zzO5+QFVZ=L*GP+EC7t@`Ew&3~c^5?fD^aXWqJ3hKilMAW+EMA}j41Z1&q1?Q86m8X zN$hBMqS&L?>(c7X31L@b3;74UiDCm<`R00^31NfNXQWNEZp~bkZ#v2$zqh?d`(^nZ 
z`iHP*U2^2HZ=;yiYw!3?WkmV6BGPV)@=q(@{;V#z=85B$i=)_Xt7T?x?jfwqD&O?H z-BB#ac1>`0=MeVed4F!VEQ-y$x;~-V6rqnXNDl``vGzK@v|~{rtaaH5{^#09G1tRx zx!FBL`H>;V_qT~+SCwxoRutbf%}wXSYelg;qsqozvkhUdl>W@3Sgri|$Hx~J>vPhV z&#x)`-NlNZ31-oAub4Hvr)Nt?-zfC#STGBkawgt3OV1*eZ#*sw#`Y>2g>Sva2Q%9~ zck=$O5XHtR-|DOw%+8JVPJH=6_|<)3%HdLMfEX{tE8qCs5yY0hjSKJ+fFA zi?vwv6=%hXG5GJef;^IyI7hjQV{*Xb6N&pHdF+U08~8OxqaAW#t$?#)WD9o90YSJF+a2OWvz>X89t&(xxeq+)g(- z$MuH`>*zRlf^kVCH(iaq_|(>wY5fK3xj0swOM+e5XS+_^Vx3s;L*jfsK=@~d`S1Ln zcF_7o6WkOn@cMf%a&aE|9|#$W@W^GEEbFMSI~wf#DA9- zi>+8Hi>10)?8V|FmfB*eCzb|cX(Se3u_#G{#j-#w;!hqWi*=VIUbJEjCjMm=*ATN^ z7R0>cbnqx|Vy&N7or3tVAkwAfi{PK&f%v|ELqFdVtntfTC&PcWDe3I;JaE}o#B#c5 zJEdOPF~rjH(crT~pljmgNBv3XlOo`s5fAPg3V+I9;EWMqjbFSM5&2Vwlg`hp_0R1A zjzxWWzjfeRgW-R%mslGwSx!TFIhpWh%mfF3*`Q>^dweEcUNsQ<+j=#bSZ{5-+irI}ZM^n&26z56|sEto3v8-EA(3_8i=ebiP|{k0ZXsQk2?$tS@wpUwlWZ6_-SO z)y)Y`>-7z;kt|&#BgE|NQrAe;E-34OSOApYm)SAIQWB0gtnphqzDUQG7L{f#yhLa6%l#axQTzu7 z@Z!c*6y{EI!IiTrDJ(zQ1)g)FJn3;bKFJ5J*pTko9D2{QWr=NYJeRG@l_3`MS6mW# zn%2-MEItZMEDaxTtuTAJ6&#aVN@3aFjQW?9Zc8jZ^B|TT>`Ig0*Bbhubc&ZBnuA^L z6;gD`X9x0)j3(x-c7uyIG*fiht2_Ma`J1Kn28W32P$mBBeDDH`7sMA2fbY!sMBMrU z_?A88mkZ^9mn|tm`4?XWmvbwwFfaU-Shnz^_Tc6I0jHj-K>F@hw13JS+7QbrQ^A4H zD=WJAqw$8^p)745$-Nqwf33=dC||K5=5O3-`%2I8T>w_erKX}6j8_=%0G?f){LI{+ zjt5eo5@1ON&&b2^Vq9A~ez45z;EcWC@4txI!Yai4%@gS72jO_&Vu|{Mr(l1X{T%w( za#TK_mH|Gu1NFJ}0Qn}r#_^-daOj2aqCR$bFEroV9mli4InetD6U&DZ5wDEDB+(w4 z;~yJ={fT|7P4$s??WO+8R*nSU-b~EPw?KV0{#znvTw(D)3q4|43s*fq+2-l^upS;P zewg?WeQ7=K#qQ)w$30jf;ZerR%AbW;EXAUYF_qetenK42_kR19^d{DWZBxd!_Ilnq zb=H{O!#$YRf3*9WKn{$%&Qv~b~%y_&nWg=m*_&I`V9$DvPMo(knk6^QKfljVgMA#B8PH%$G}Aq9Bh8#zgK7Oc zl(~Qho!BPjyY?wHS;;kVfyM84VzG9+!^_U8$@<)$&!7F%iRtp*e|oU2Ce!-O_f5|; z8KarEGDla^p6!@cIBm`3Xjb9i+})vKY!+XpbYAGZXcnu?oz<~tT7TNddHHA8M6*p* z7Pw#AV9!n{yfm5}{8UjtM2zLq_PbOozbcx&7gGuEFACQ92gF^eR!^+|Vx@0Qa$sF2 zR!W;J*4Nat^U$Gj4($GCxBGL}M`LbH=vx28sL%Y&rfBxYlhI86Y~vGhQqN znz@fybh6VG2d4FJ_?AJ;`R%Duv^~#>W(bl*gO5? 
z0nuMtocrgwyCXY2aazE1p>MjrWPCVtWLt(lWZ@$I#eU~Kfnkm;($+mO=7`w-rwnuzhf60N({u02W#rD)CUi2CBMYR9>sjKpLMl)XH$obo3U`7JJb5jO=6Pg-)MHPmG_Z-cAhNn>fKc{-bS;K(v{Lr*YIS!=X{shyJ%*k z^j`;0ruDz7+aaJ~Ry4DAur9Rlf+wrgrF?wsjcC?u&!ul^S)R;)JsiyzD07xKJXw5ne75Teu|6-FX1Lt-WLp24 z;(sU972ETI(m(HbvXaW2-44MW&Yt=Az9$P>Hd`7j`U9Qc9(XdXe?zC+{!hgAy`KIf zVCq>^E!{{r6tSpT2hmuWUMVrP9B*eRuty^+B|U#_w^pVP43CXtwv`sHAH*Jy~mI4(}gP zzL?$$6YX^(&%uB6<7gH!#_CY(ubxcne{|2PyO-EMXYQ=w?I!dcN;`=5^ZR1*Z~s|r z&l``^ZixM@K-c;=%;DAJMR^Oh?g;i0^`rS|!MU|sPFf@QT-f=l=*E@o^3#WlQH*2fBCyd)5qUD+3BJ$(v}M@sPDuE1JarcE~t;TypNHo`HjW) zTW&jPYJZX6UzuCn7R{c-_#J&N>hGcS=dGguD*fY!C)4@|Un|WoivF4FeJ$y&Xzw)J zy6mptwv9XHKezB=X-a)Qiu^|$eGET6nbyCs)_t>;Vttk??e|>dkNS|%^tM}PMS)cpu#|`@E$@(mP+0`jqwAZpq{rn4gVa`-+pE}C%vzQmt<~v;dJ(tA( zm^A${-zxO5%-8AbK8gC)HT$+PSM+yf|IZQSPZ*w%W9h}T{)G?6_!oLA_K%|eMQV%n z&&ya?t>qI@A3A<}F^`hB+2QBW>}c~MvTHR_zQ#Y`W@~mw)OUG>8h&@|yjc5o{p=PB z&Yhn7qee|Hw(^O%>Gw$FyLIPhTSqUZ^)IwvkvLcEzbA7JSY#LXV$+wqclQ+S@p;_W zGvh!jYe%y@w?q^y%Woems1Byr^iu&w&dI zm+j`owEoD5@;!2TiS~2d*u=3^Bj#aO>tWFLXy#D4*~Hj?8nV1EI|93j{?_a0+_6uJ zHe_qsTP1cD<+aTTpLBP+XpgnSM9((iAH{;7-?3wyCO8#8M>IxCJJ zUtW&6AdU}rl>KqDSl?9{abtykQ@M8{jwd0@o;1zfE&5-fRI}0(MEP|uPdf6t0rU7Y zC#_juvH#Qd^$_P%u{}k7Y?OO2ed@D2n?E+O_ljXA@59YU-*WpsUX?y%UG8z#h231R zB3|DthJ9-!=9M2fvsX(0jS}Yzr9Wjlv8AV1$Cqy*&iC&g{W{phnFa0d%{v9gFk_oS z7oYE~#cGOMVRyy)I$I3?{H?b*|JKM1EOteVZ=62=n9m*9s8Yoq{5&qkAGY^OjnO-@ zrKRszOFAaTdtaY)U;VEGtNF@bQjT}2USrP73Sx46~{y7BlbseK3_RyNcVrl z@rfzNk6>{=72Rmn&S;i$s_ly^ zpEC)Gc7iLo&MvgGh^Sw~s%H0;@!qM}kf? zvZlB@EXD(ub0&BT*7rZv$Y!<}ADwHF=d?_WAH`-Fv`UN@w>8Y)Day}M+RNF2r9GRS zw`iUipDburVvy{>9tBlPpS4lsKRPC0@G5(jcfW$UJXMS*lzZ9Z9N49u&!luQzKs>b zvP-V^;@(iP)sseu{4`#0U{_;y#1EV)#%px`uF1lv{l)sJ*B|l9c&A=8%WTdfPI%Q| zqdJUA&khsgnV!Loem=Hmv-8aZA=XdaLV7K(g}*G>6Pq&3 z#?DF{zqqo0?Q>?cKjj5PJ`m%fVXHsxd|HPs%{RvX6vu~y`{RFH9#EI1Ws7_1Z^U>= zInPXYV?lGK=lh-)dY=^^Cd{q(`}~z#yUxMtHtwvZaxYFCzn}bjHX~@HJKLaKkBIA+ zZB_|W8s)oVe{LcAf4^8{&e1H+x~lJ{WKlHJlT)K*7+~bb^1M? 
zDWjb4(mJv0%DDWPINmAkwN?0)K-|ycS35Npp{JMVKPoM4e*~boMX?MkS`OFX34mRrc|Gu_&)6YoHa2GM# z8l=@b{Xa2D-TA!Z_1Vpvt@1nh{FhVfANbTUf4-PcuqYHLTZ`@2OJNT^w^8oLwe?^p z+O1?+ZvWLm>o0TQige#g&*PNwdMgiBunr^k{bTW=TWEnF*V!#8xniLk%Y7oAv1skfxzdN; z)?wR@K1ujk*_+Fy2c5ZISUel^XymVnvl?tbp%ws8k%A8K|4nb>+>oO9kXx{8wH<`D1RM-+yh2E7$r(74C_J>V7rt(D5IOd-K>Q zD}pOp)!}Q!^%fu0n}6$_bfD<>KkHtg7x+IWbxkNUu{U3=Tz?<0%~!NJE9uAe=2^F7 z%b_c3|ISz7U+l8|xZ||{@|%72OSmI`Oq zn~o0wzr7uAj}CqRHGn<2Am&4Ie17M$esJROhnvE$>~FaNzsvIeUix?smjJdcu-mww z4Sab)UG7a>e^|cUm}|=u)x9sC%`qwUj+?(B@Je_ndvSI~1aBms9Z`;5O%Q~Hat4h3_7<-~ti6iZdH)DX+PGjS|sYxCc^;*@c=csDY2q5x-hPZ{s7 zZ_eDK<^4B81Z({Fel4GvJH0v6-`q5{pU{)KOvxXS*qpuJ^W4KL%9#z!`RLnnW^?A; zCB5OJP9k38zn5?`_gmHGtmBbxg}?Q2X4{ngM!Y*bR+*n^<;<2p8#ZaRRdXh7b=X~M zn26W->nP9c91LZKfkh751-mfo7SA*G><(p_Pp)5bZ|A~hDs3fnOJ!R}xiGCisfCp1 z@>rDD!q;n&ugmYYbbllIQ&sVtL|k0ZsO;vf!X-SUoNiPvE z9yxq{r8b)x85rNNqwp*3Q=|^dEj&@G*h6f4W$w6U9j5hf+bAxGBSM*T>Ii?o8E&kg zudvQ@PUK{|F$ZN{pmiw5Kd;=F)^DTC4~p%vsg`H6I-TmVh-5i<)68(jN2KOmE>@R~ z_}1WPr%vI_B=-ApHg(WHZiF$WJj=Jkm34l)D80heFy`$$u_-@7Sx04i z)^lMC%4Hw9ctq5%=-v^9V_ewDmwO8BSQW~`mHFYnT-iM_{B-LOiut+$Zmf#aS6>^^ zUPtRJ{|nIzuE77sg6skv|)C0 zwryd@inG?avjsceWdzo0&OR&SnNWA8Z7;D)JrMh8KV?5PDr1m*FXq`alda6D&$lUa zHMbeFoA}sca1T2=zPJjX@~1stYtQGYc9$}WNex*_{)Ouh`nDFgqw$Upv+G#pQf`l0 z{F~CAoqd==c{VlNi5r#q6tl+6*!1Bzn@di7@sEW*UG6t#hTQd`6Fxfe%o7uTnlGq>f7CG}5%5~pRKbB1A1s5K!JUeI6gn5_gm%919EBC%T z_2|=uO;~K<{-@Ur72I!rhSS9+Y>ZO=3s+uY{jv1&mjo|;o*dW88Rf?}Vfua}hfR6q z%q_)f+tRWLdtp|}+U}$?H!9jBG|lWX#W%6!!`Kh{xsRx-zqSLj+gcgQLK-`gfx>9(3#)+5o7*-vN+dV>+#;*%Y}O??a`jG3jT%pn1;1^((_1=r(Q%J@=zlP&h<^oD0X)Z^Jrb{`sW z!jBo0=f-Ba^DNJI-l5`~V!3_Zf1Q!z&UNB)`B*XE-+i-StXVKV)b>ZwMYEMf>DL9` zdtz1V4rs`GDaV9q>~~$vghx5YD1SD}d}=&nV>-<9yA@pjxAzS_L+!z>)2lRz@5ES@ zOHobFeDL6V50yJHU?O8F_BTeye`tVvZj3Sct>L8q5>MV)xsTv4_MtinX>~h_ap&Os z-YyXVte`GgBF&8_O;{ZzU;T!BlXCshtqE&8P)rO6mXvX4sU~cBt6@`HUhv|2>X&}Z zP8kCjz4(g_0YB&N_hr}DmzcUf(UVtD#>LHiS)DBhitLZ~#JJkqmqk=BHDPplPfq>B zwK20u3ZGJCYJ)$<+1hs2Zihd%lV&?=xAQk~Ioo)kKg(=z>|{rCcfLvK$6*1?Q@IYZ 
zaOYK&XBQ_h*11Oegrt4$T-*L)lRT6pS6gK}iofJZyaN;e%6C^ea`A7HPfB;dl>vnK``>Yd@_gO@-)HDTgg0C)5n|DcrL9^vVlc#T zh|v(^A>tsWLd<}e4KW{LF~m}cl@Mzn)7KbV<9F&Ooo^SF%x1g#6pNaAeKY? z39%O9FNn<$+aY#B?1MN2aSS3A;x)uOh>s9oA@U#!!wy#*q9jBahzbyH5S1Z3A!sb)Wa)LH#*%zmI{p_|G5lgR=5+f;J4O!h5c0vQhE8oQ{pmn&@ablBAZtFQhC9z z6gfuybE``J;1pz*_S~Jf*BE^kS^E7Uvgl8meV4|2zys9HW1mp{iaX}ZTzv?|4^;<&Rsr}hAQIDTof%s|+ zvctIt2C1 z&_AjC;A1qO3alW1bWLRC<;?%+)-pcG(siA7JZgr_Zff)C=0s$*c^u{UU4(4%^C2!d znEY8Su{~mc^k-;)m*OKxZ?+fbCo5JD%Uiz>q4|5sifsP)p6W}Z{+NeUqWQdQDwas_X#ZPXA9b<3IqxI2-{On9&Phl6!{biG7bRk?y`n$FjA5zkli?8xzkWn}UlVi+;nX0IIJ=Mbs_7`%-(n{VS@l z-x+cPi4S!r7W|nt{-*r(ce-)EJYXm?o2$co)__UWp5ND@F8XzwdQ$$Kk5Si6B9@$! zkmD=%A$`we(gnZuxAgzym*?FeZcrTka{4{uKU|STKP*E`$CHr_D1T5ZWM;5n`*rQ> zkzV}{vf+b)Sj#V#w`}i5?X}Bs<>L`QR_kAf(wINKAn8&!A7s(5Yu}s7Z!Utm!C9LR zzPmA>WjoCWc%K8Z=r>*+O8fiTU8tK+(Ecx-9*k_QHWGEorYExKx4sD@{rO_l<*!YN z3k4wS&$J-kZlrvIKmL#r+bVtbI4wVH00vApQFl*lqi z{qNMf3cP*4_&*Gz@$ue|x~|9w8o%*ZkquRc60fO8`31kB-!;-x%29nqu2Q~3cPL-` z>!gqBO!)-AwJSW60PT|QA3**R8oldecX!$NS%;d=nQ@ zz7e;H`{*pUS1w`(JyzTKNMdOfkvN6IJobqANy z{3?AO{qeOEh&$K)*rT zpG>Y#X*{-`#`>iZ6>$3*R{TR;^cuBKD4(=`Sr-P|O`7x$^C^R;6OV6C+>>(Qht2 zSsTwMr}sGDr04BQaa~zP_XAAT@=9`+=WA^AKwb3f z)X~^KQ}I%$$Co9R=3214!D9^RJ0By9ezX4$YX8v{s2h)OB#!!<`rDAM7flUkY5iMD z{mCwb^@-&zPaD#Be``YZO$ws=?E;Z4awF3JwkH<+hCI!`s3Mi0M=U)njco7?rhLf- zi3PvBTDu?A@C@onyN^+O&kn@#PdcmJFJ13L{Y&0Vy0$+WuO1{`pXG?-FXXcf8%gzF ztxEOJ98UfDb07LGQ%ILyyeAg?@y1--L7GLD}Vq1tm^mpL(i+;=J{-n1V zgSzEtEV5KJj{MsBxhXoFbir@ldJ_95RVakIeDgH*cl2lCh@+(M=!-1+Gkrs8{~S>P zbtN?f+0?Hfu}c%uQ$LpA?HB#=SGDu=jo#?D?A5Lx?%qXzzHZ91K|P-(&98eukow!8 z73mcR5TD+G<#huzedFK6g5Tgm=Nr<4aa3O~?f7)|HTm1q`HbW>7FqOLS6!gbzav4Y zD>Jm~@z@Q;d4H{+HT}S9V!>~DQ;yoZU~Mt(*S#-Goa2FPXjPtgmPwh_ zy7F=%@%SZF-_#|<^W2a{zwSU`%D=}3>$jfFq45lSjjT(NNterGdC_lZTAK99tx&f( zXyaEb8`+TRNc!_q#G-#HeO^nf8s&e#nEW<9i}3M_U-SoYWEW(M;Fsf8QU8)&p{@*G zLEO0%a;EQU;>(+mMZX+J_jjdoHK_lJQxt=GrSlQk9(hLvVz+`?`TzRmxJ(+avabrO zP)vVDyZ@2W8Ce}b!s+oPcLL;^bg_< 
zL+p8ZMexg+@HbZQ?~+4Z^uzry;u?q1pY&rlagGPF=(qf6K=W^DH`EO~+Y&oEVt;k> zyAlukhW#@fpwC0;^bYKw&e?_Z8Jp1`f37(3mMu7+3=0Ypm!6C5Gv|G!{%G^>r(P@F zPK9oW?NK`2ruL^#q5inT-@t%RD)hxpgIqwSwR)5i7%t^pMJ=? zciR1gm0xsR7yWVs{EZR(yYzQ6)D>Gty1wdF#7>2OR(9&N>#YYk{#NN3`Cla?i{;IE zHq_phN2t8_kIoG8NNLY)RrqDyQ;hWeDcC;IuPaRF4^qc^)ZQ;UsDHEiVR^%5O@Da_ zS@g?;-co((1*ksz*Te;m72x$N-kRRyJhJGw=r2=!x7T3#c(ZoB|IUl*tEcI!I$?Ry zFYh=*$IDG~ZK%K6=aGz^a@$@n+txX z{*L~P?X!-1Ozd)s>g%KNksvHD`jfVOq56-mp!`KX5l0O|mb-r@UgwA``s2^j{S;|e z6RgkD{|t@i*uChF@2uSq`D{-t_~k0^D1WnFsGH67{irmfI0V!dq`>wR{c_?| z+W+BwBYeCvL&p+l?!opLkB=h`m`CLWzcRWk>Fvr={j3!6g0<*3-_gD=@)?ON`jut( zsQrd2sOukU+e~Z zVyS(}&xzfeAY1Qh95@imi+=sX*W@o3PX24!^>jim`ptLVkRBR?Ec!D8HUFSQ(hI-* zuRotJSEKp`zf$xg`B#_5@wb+HPkg30`jb>`KK5@){S*BkwfRz^5c;j-G>**4!}(G{ zn?G6Sk*$JXK0(iyNQwuQAAf`DH`*eb7inzSP5A`BQmi}87njUjUZ1&h0&#Fns_$-; z#)YZ$Y_SY+ke(PQBc=W0b zwO{bZuhXt44!B|abX7F&I|Kce@uz5i9&zg%@1N+G2Qup4A`|M$&i2F(b+J9hk50rL zOCgJX^9nEeyjiyp%Ud_pBzCAv{Im{n-=@f--y(-m`}$AC`D1zJN8G~;>$CWW6T3ak z;q4dw=Hgdqyv&s6mp^TfF&&ZP4{P_kE-P4G^y^>sqx_bsZ?U~MhZ3v)=+~d= zO1#<+$IBYulQ^Rb`o;3f=)-h;+-7{k^I4l;B)%{U_XlO!M&fk~v3;UnH|;L9e_Va6 zUzbDQFG)rZ%xC%MKI!|Svv_{dZ#f@D`r}WpxgLL~2eJ2loG+H4?TK|0u>GQ6*G?oa+Ib7^{hLkdWHge>@#oJ1O5eb8gBE3MWLZ}3LHvPRQ&fy9De z7t@Z~XN`P-?M-M!d?4>GXN!9WVpFTzs0;r5`_ocZ|9f0Fdd1TIv~4ElGY=a~Y&vs` z>!RPfsyH3L)ZM6?*B2#@>~NF&%~v$t!4X;X|GYn6t_HP#9({i(Y1`}P{kI15==$N! 
z1?o@l7j`P-*L9=sZ>3=_I^@b3q}Mo%*MpX$+V;J77dbw*2yNfb_LQ%jCvovB^!ZoE zgE(LZeg4D-5x<{}$3KG;J=ZU_zK!+$e9r%$kjhy9j5zXF`puoQWm`P5DSKu`&W2J^ z$m-*zsK*!ojr2VsZs@Ppf!Ht2g>$@HKjOX{kqzJI`8d<=bN=&$K+#?uT~VL-#-X?jil{8@zv@n|_FR z(oX#R)AjkAc;y$mA8~6H@gGr`-%x7>anb;~Kantr_`1Ok+vmQNc-u_k4V#GTwx{dK z$9suu94?6dPe+OKx7P;x{!6-C4Y#MR;RMoWh2iI!<(hUpc`*z>FXC%y&#(Bj$Kyf# zS~K~564Lp673X98RqgtD)H3S-dyN+iCZ3^j!USZ|FRK-4e|S}b_V2+ZiO1BZ{Ufgu zvGw&0Uccx!A2QMY$EM@{CGQwToEA&|^eMyv5!Y~k6a3b5+Wo*WJFjwG-%-1tb?5F& ztZ&d2I)3T?%-}5gtpnhl6Y%fwzyp?7Qp*#Y!!Bd_V6A^iXD?xS(cfVjwNDDg_A3iC z&iJ0j{rVg;>5-#w{1w5UxsRT2gX?!}pM2mSYHxDED_CFt{dV_0m{0U8&f4+Q*c6X1 z)_Y6n_^Ppa=1St^nHPCJ(Jv=Hqx&yD@34Jxg}cOw-Z=jH$`3XD>v=4{CX0B;H*B9+ zUTM0N`fIICf%~=pN`dmzhKP7Bs?qqTd+% zh}y@hP<>@Gh}Dl)tlwYjPu7xCoJGG94euy{f0v?X97q57AmT5*asFAiI}=CV#QeYb zoA;ykIgB{N%P0N#jX1HGg|m66QDajI<`?|3^BCG*rVoeEe`o}8MwgxFA3vIy4NT@N z`mGfLsC~?ey1tS2Jb|g|4$Sv7pY4}pKEbbZ>Q2WCQ{rN-|2&>_zJ&85-mW9*(H-}2 z-Ox%qKMAOb^WShqyWZ0JFXl50%KyABisPwkr|~jB-2W^-8h5^gEc)fb>#6@f`|0?w zV-YdD*cDcyMQFPFDct^|KQmN2pVsP= ztK6jhRT?&dmsi4W5j%`XNHVU(oX$XVGu%uI0DxzKs62 zrOAKb@D1eV+WsN$yNWFMl_c$YtaFD;T-WDZrSbYQ`x@uWdz!8fO6M&4t$QuBKf8oK zME{^v;t3xfb2bNRdhiZp(QmfXKA)Vj9$|j(64d@zC7xpWd_6eg$$vQC|KT3hr#pt_ zlg?_#%lKMYU*Z1RqHs35) zv3~PHZMLJ(@rEH-Ui2$=|4{#2YCPllld`n!HHY5ctQ^twJ~OC(!Jm2Y zHI?`3PWc@)?%tC6-|GeGOV+;N`9;67Mw@Rs3H@@4#)eHVu{}Su>-{5nRKMW2ZhS!b zS1m_fzH*D$BLmCpJKZOCazqyW#@F9SH%!O+&Ev9(b=_WJ{m;G-FEL?x(Qgffcd$ag zByGIq+&9E`dSUsW`GN;x|3$y~({1t>-G}p4X_HPo$E=mt=I?-qxP3&wyjGk4eNJQh z%x`Z}`3#=?MGj*xGp)puqIjHH4A}F8WmlGe6-zAg!7p(EI73j~b@s#xM?#QCw*h}lL=2!Zi zruLpYhvS)4ZVU0fyIB4gf2W#s{;_T$>egY^i2wF{$NQi3NKc&VkF1FPYvo9P(}vnN zs4VgKsl@v?IVx6 zN}RI+^BM16Bkuiz>KFWGPk5&q^i%3Sh{`8_CI2E5`ja-;lU_3GJ`g>BTWIM`VqaEp&`+vdy2!3nI8ftHX>u1!*Ykav{Hs_=baiqJQ z`ilJ#{7P@_{#jUo94y~WyWTi-2U(u3ov&5ML>B$BV{>X>fn(HuwFI#&(dWOZ1kJB8 zPS<(<%s?Hn$7;M^ByZIIKES6vUjJGno09Hw3R$_VosTa_eVzj4jnU&tPrUMmv)ODR z4u1ZCbLJ-P`&I4!jIsVT(tE$i;<|O*9%5bKYs{D1pLoFW?^wQMJL1Tjxt!(s+Ve8I 
zn&9{2=5gBf%esa5Jcl__yPx~)4}3l@Y4>z0f3yNVf2SnV^`yx*3ZF-mW3=mugj~`) zQU6R~p?E(+F$PfiE#a6y^PcwiC-q9&sn8y!Umeo>KGh*#DoWh;0N(F_=MRXh#^Lj% z=G41U49ZDWLXhP?8N{kTwZGtD;xqG*^$E$ulj_*1P|UjOI&pGA{QgL;@PIhF$g334 z&3Ugieu4K7$PCwhvPmr=K1z$-!ERBg~uPWXCs<_39jF`{>%I{Y5U{xG7q&nUV8h!lhj|jm%s@6z(eg(PA{PASX4>;Ko+-J!z4|lnX}q-ay?8sj zXr5P^UKICdLnH0^wMNhJ_+x2PhV(^i@px-Er1?j<;OD>PvkU3T>rUhNJAa}4`iv`_ zbsM$k`xdOl`J=3*zxS0KGS8)eZf>TXAA4rt=YdYOp?t6BQ-6f^%Q@|-|35+`2Ib|o zVZ`%OX?zB^B`$xKSoA;8`u}YM=>xUr!^+nvfcZ*l_oJSaB^LbVg&U%{Px?HA^glKe zUy8+i*8A&;|8PZJ^qUuTqw%RS8Fgi5C*oq+#J9BXpPdWYG6}{<@LQ@Cr1AL?f_}?I zTjJ+?kmLO|{csSL7yZ^Y2KxS``2*CGlt#oMZFRgp`GNL**wBW^qTlki1l8BCFOI)% zkUjCsVOYPeWJ%)JB?|IMbKBFob?z0OEGUS9NDx6h}~`<;)` zpSfuc@!&Y}&(rks;mD$2-t0{C&VB2RHndniZh( zf?w%7i1dqFsl7V~5Eq<)ez`roW0#lja1HZ|etG&L(p{Ss;y!sbyrUO>ODn%%dBs*^ z|1{DCzjigbV~8E= zBFlnbvF$?o-ASmM-L&!8d=~Rr@3thpWfbb7KWW<-(nB^;{j-JVy1? zf8iL~U#>mD{u{^iC$5@OjJIEIuRR|H-;Z!M3x4I&2+F_t75bBgYxUbm#kt>FN7H*1 zK^FaHyM0vO%x6@7_b%f5cag2lcW8VB%Zq+%$SA6Re>By1auD(MOytbA!-@aALAu~q zW^AYYzxPI6ez6hRRIY{t_O~_teYjLUyaeZ8{2l`+fBCbd_lQL{{WTxU%hC|ykT7Dw zZ$3fW&$QH;>ib0V%@n;2**KZ{V>-JXIrGX`$``PU{6cwSVi7uCl=jB)%V$&h)96=< zR3Lq?4YoI5mo&e%x-I4RbaTZ1KBoC=dP4QfRcOANoCaY&>vwJYR$Gnr$yp&-Ub?N~ ze6tSdPJC)8&7ZMth#OmJf2}<|UoXJxGoRPy>vg;Dje+%ZRnl{N)Adn-1mgBNbiUSz z#zWGs_mqZHNI&-)&nJ|Z+VLZL`XQdrdc7a%6Fy|704pv*#Pv#Mb2cxkO`O{s&-XLC z`VcD*K5|`YSe$sw=v>a`EuI=*w^4zupTE=JI~Cbi6uH56;yQcr{9J!#HL*SqzfV&> z^d~NL5n1*y5Ks7E&&wNMeWC5uZYkzdzP%+jy)DRfv;QMv?Rjk3cRumcYgGQ~SmIj@ zpV!jwno7K?6ZR)*_aNfjsc*19L)sAQyW{z{@l`ajdy{8e*Y9dZT(4vX`dzf=jXbh& zysaO#=R@+ZSJSogf$zOPG=}n4MLS>kJ{8Xg^;2RgU#A58{Ffi}CoXvruQ!#M>xrG( z;r%B0T{iKCY@83)5H}hxcpi%9*H_lYbJQ}dPtk24y<`@ik1D3!#0T^6{7Uh@NL<>R z&R5>i`!l5djp_4r-+j_wx1rCo+b@w#C6j6X3;PSaFDQyZJERC_IzJtwvF^_oe7xiV z+Wdc20`*_~Ym(`Bl=l$#PdSE;f2I#LO7gtQ+Cy}Fs#T8mPr+{<)|2$8H|V$i)q}WX z7V&sZANCDd^eYGR@blF4Vi(m{ESLDq1!R3o3CG`byi6%ZsTsMc5 z!Sf?i#m7#Z<@$90!<3MVtVAv({pCVBo-O`P=TqaC;P}hV5?v3SPQ&eE?yWs9J}?o_ 
z*OWHFqK&fXI&U5EkPas~D`Tz`U#*7sXY`%k5+BIBgnno3ey)@rnF9XI&`YEzE==QWjMeTB zr&-Q&R-SA3t2(#5f!y~g`Sn3}IGfAfAYL}+DQDRL@A!w`l6f;7pW15Q-)PSRCn?(Z zfA75Tcw}_azR%g{ZKpyp#oM3CTO$i_mVNsXhg7pgedbi+=1fNY0i7S3+AYPuf3^Cw zLEUuaAAH_Ien-#ynfz;`EVg+Z?=pHdh=%98#?S16}SIKs@bt{5(?f9?*QO zewgO_$qU401?Q(YUzOxy)c*x*aeU2-wENZC^Iggrh^GDL^FPGC!;uxY)5JSBAd7x^^C{A!{-*Nz@9lh* zg=}@SkUr=Kl^6W-%WI^klq<`})BN^4F@Mh|uysH>v1?ys(f<%UxPEzh+WGHnaBEdx za=K0UvD$3Hy0H>pJGy<(w9o=?>M$J`9U%A*|9=^xC`9`eOQsB31M7`zl-g?5MBTZGD*N_u%h0jc=to!Kx@5TElysPi~UR%?9S{(1iQ^ zY{%6XPe!lk?88KV`jz5wEw{B&J@~pv9eh~&v1WN6)1iL8Zp@R~>?+qEw^DIkBCZSm zw9>w*w_y$3_`DkjV;|LF`Qwn^Z;#q)qrjNfY9GG-`tv$W^urK5f|yVc{sTYim=} zPQkC9ZHRvLY7FW>+prYYWrKXuh1!&FjT7mjzul&GnCK6F+@fI-cu)8q zzK%ib0CsNnt-KnuS}>Q=rzUr52l=%3-gi&)oYA^#02BR1Z+bL-SSg$h;Onb2hx`e! zeo`5Dr+>sf$rsA6;_r=j4QH$Pd)T`KFwwtf^t(xSTD4#sh7O)|0@~kw_`&0m&0$d< z{@(bGkS`tbw`su)_#W&4Ci>^Ntq=aNs|Bmaw+YneM3 zxU#PW)A4mbqG0?4{~i85_~3B%cn7?9{`Ua34gO9T7QCVNq>qI0ICE6p<`d5DG;Q+F zdJ~L?;P+}YdP-hMIE&}+9iJS)BAUbcT8+Y4_YP|(cb)?6-30Hohy1us;7n+r;LnBE zZ4U^5`uRE`kiVYO^GU6IAs=7QaVm`0snrda20?kgZqrg2578fTpg`e^p!4_cFKoaJ ze0yGi@qSxhcMAC2VOT#2^xQu#%;@(=16EJ)XK#J{+vu_ptb4;Z-L`LPz~U#v`@<_o zu-p^!gG=ig{G8+QExMmNkOTea>naqBVA=e=>qnry{JsA=@GqKc-z^Kqm$&;-IP+Kx zya1T5Q(zmx_T1f3U_vRF-+cQXhw0DbjmPW(OE%fo-}Q@D;o3)p_M+L!B=xLwg)^_1woRNB z*#3OGwuABDYw@JOcv_;4&Uypm;XN?G>tF*|UvJ;!pu+#|!{T_rF8Txl*Rb7~`l%&< zziIU2OQreyOzrqSmDZDG@Hvs~t`-1eVF>=i_kTOT(`WhnP7~YQ3Fq%S?R#ucL;gP0 z{POwv0vxIy=kH4u{bK%M`_n%2_p$ms-#d-Jk9AhlDrfooTBDa%8_3_+`hMW>ApSnp zFE*oZ@b{_WnvA31e4DQ^^U3{}_Cgb%Lg2Ao4)!q(X%ek^9L+mubZP+Wy}L?7>-1Dd z@byyL2dMwsvAz4Bzrnh3|CY(m;ltPRacH2z%5ZT`py#vbe>3UMq_0qiH0hsdw%Y^L z(zkyPd0V%qn$hvuEKiiO4x5C#Wz@q=$j;B&cp=dR0Jgg~X3{ck&eZZ`Rqkpy; z*E9k>W7^ZexzGpE4}E_I$95e*wy$krXWBA7nmLq#bslP#W`%qQ{%8Tu?q(IM)8oSn zH)anWe(dHy|JSjduMxv${F@^`=Ynh7S1)cKz+O!TiY#K$d!zkhV+=PL7Em~H6}DQ)5JB^?e_ zSyg763oHM3aZ>>3(cTs6x}9`kqQ7=bhN%oMzjSCj+q*98D_7j zXT(CjcCJkH&psNdeuDRNEqLaecyEF$YnuA``*g^cIHPsoLe=$`Z7TX<(`Le3xjp!N 
zSo{grw{6jfx%2h!`j%lcmhX#uG9`u`m~%j?SGN``3!d%}{OA9g58`LN7kGUj>OllR z1Vb#Kw|nfc zunZIZt;_#t^r%1#iyRJX%1tZ78uB$DfQKBr;ovf(3>(hZ+5nc`4s*P}ybKflvrk-B zON@c?ZgA{vTaG#JY%Mh%9mCQFHw&oMxg2{E1t&hhLo&T*?(SWViT?aCi_Y0I>w88y zR;XM;S$!6)^Vy zp(_*pzumdYKDa^tr#4>i!`&EPnl}KL6*oRFJj#u=Lv0&1E(#pY*PwQ+Nc-3S@HcyYIc3A=K5Q9Z<9K)_ z7G852^9E+K799F>WF>Zf+h3+Vz=`}EY*HmA`fi(zJ5L$4BhJGYW_plb}XX7M#J zE3tNA-K7ekFT*v1D>2dkEPQaB_s%{{;%gZ9smx~oU8M1Wm`z>6HP>M?1gk z)2s@scYT6nVBL6*j^pua=z)G$wO z<;iC3ncMgh=ox$s^^TrQ^f!ImUOgHV!yGncD&C_(?=;*L4Xm#>^@}>nlU3(ys{$_@ zQ1N}jcuyw!ed~udz5>U=7wbsaw^iEG9rzRqFo z2=)8X+ee1>X~y!~lVATYept2wnt}ID)T1-P=bi9U>zbjakepNV{s2v%KV(SvKr+1@$}@(XI0ee{N4;_!S6ma z;KSBfy?XWLg1El(Le-46=R(pp>($F8b{*@StG-@~*Q+^MhbA4L9;%A|{B~yW zwI;vTR@FZm-!J1;R*i!}1a$fS*se^K~z;>RAE4UEVikKE?b- zx)rR>?D%@a-3{Q2d#%`A9q z-RU;!hK#H!<}v@~8*wo5>y0V(ndr~2e*(WgUR$5(`TZtnPXyjWsn510XG~I^+p2x^ zm-ANbtItG#YTTj81@5&`xAFDXV4YV}Y`I32AGT2k@O3bcKt8@k+1)nkD&C$0^_l2j z<~g8o`4?@}$9x^J%k^0Vzh4KO&DSe}b!0=obv{1$6XfUR)9W+QzhRNJk=3Cst>>iH zXKFY1katdPRj+5WkCwe%pS|L1y_SIX^ZP7k>od_m$M7j6;Y%CUg>QezKb^O~KwH&? 
ze`dq@54&0Ed#{{-w}Zlv|8lxxbF}9@Slnxy#sfG&!Q;%*5 z!a2mxOZiXo$Orb%#@ALiq#il7ykmWKfUnmGYq8=u1BF7m{mNsh3-+xc2* zFkcrjIe+aPIWG22oS=E|q_WXz$tqupWM9-WFOS;tf+4uah*u&Rh zgt3U`V+CX5b+y;DZY_dX#>kCnV}UI;M+%md0-4XkOE%3Vy-L?~fh;4f(hv_|!>6=H z9Y^>x2VQ^u8tR-GaG&9^ACue$?7aP>x|$8cF$=~|9A6mnHxQY8pG&HxYoNbWm-Wqg zWqO%qU^1?W?9V3g&!6JWa6451AOA`8E`LfXS&#J?-t5QOW=)ytbBBYGUFxwZd_Bsc zupXqnE^u+6FWdFX;YafErf~lj8ZgC|b!E4HNH>6Id<@$j?91H7*FI*n31d?B{i!!r z`?AV^uR7{$7sfKePyTbHf-h^U3phHvRv63V>%sT*WgGeZxS?SznXi9d*q2S=>jMq~ zf0KHLrVj9BJ@~casxT%`fOT|p>as;}Z8a)6jOF)j5MK{GJB%fcu^)3ewBFCYCG-0{ zQ(=90C%AvOIE*R1505wXh4tmfI}~{T8?-MG)@QHk$8Pb@=7vx|-(KVV*p%h^&=+09 z*qE$R_Gu^lSXkwHQ|2}Uz1Q@AN{#emEg~XM><G}Qp z9)7HDdRXwy=dfO;HrChqeY<5ap8U8o)Q@fC=W4M{nc24Mp(#;*Y!V;e>P=a>E_F}6 zPpJ2ETZ!8PHql!MJcgU@Hhl2#k)CyYHUD+54+f?-+0^~wb3N;O(lIw6+wh-bc)qts z$i1%1!5baY)VcAqf4%u-^K*%Qs51-hAbIln1V3y{!^$fBHb2?dhQaU1odZJoZ|i!W z_KeTr!w~#OJcjdm<__Tjp@;bYU4xDXPaMQlh#3&GA?8DDg4hbN17bJCehBzsSxeki z_-*cd4BI8PQ(}87a$VkZnsmXRxY*m5@$vzeq>&ub)Hto0lNJ{bXYM=hU8WgDGFd4R9=h`sz}?j5bDO>zl6RXQz^*Jkj$00|1KSN#aK7&|jkNZFq`mJ>~P`(&@%&&O;MZCHxwdei< zo?-MJnamE)0F*h4IDy*rTn zQ)&@cpNgE(pe6aAtRSw_p4#(n6>{QB3n!B zL%-VZB=zU^8Y=&MIJI}-GSXvhYhe4ezrmM0ZNUahh*y$fDnU z(_Gr$_LV?AW3d}?KuP52da1;1OCgJX_pcrB^H@$OhkD{GcVug|@|3TCUD6$0sl4FN zc=N^=+dJP4_2`y&wBv;%aiitP@E%{{><09CzU4b^pNxu2==0Zw+Lt)AJF+#T0{Rn+ zKO%p0MJnI3JDpFAcP2f%Je_YG*7Cbvr}hsogSz{HMzlR&e#QC{bElAh*jeJ;>#+US zzn)TiezeB=WE+}aiDOUV_$e>x_#HjE8gauMD(`=hc=HeB=uuWY9w@bFJ|uc~!u-k% z+MXG+{~%7L`QiQmt|ef5TW6Q1{*1Y<>9oBwR*gcIE716hJ@+I|V{G#;W~*nWwP)2Y6?k)BBGF8E=8ru=r^Ffw>c;oF)K*hlsz!b<#uTVVw%&g1 zP<%^O^yjy|RuRwd`g$!@pO!5`_CsId{2ENS+!*TX2W!KRgnYbxe*gA0-=AUpVY@9B z^7Ci;o<*pMEh>L=pBkuc+P>!d?uQZT0lsGY&_LBb?d>=J`w^<>Hx`>Y-fJ_o-*P&y z`0NNZGq%o{WrqUQvA-oWEMx|r$e&*bRApX%MuaN*Gd_I#e%?7qozB;i?-rpZ*S zxqgtE$nPz7j!-+B;Yp23L8>==-_ouV)Gzqc#y)J&Z9$MaLAAA&t4FBme7u(fA5ER| ztxAL{zYLg^G$%;iauS{`1pdYU=bPYn@uBLdH36ncA>rya-tSqV>JEM{7x>P@cG4WU zpMHx!I~W|UrVD<_&potzO6bqIBg6N&_dP<@v5)_3QoeGy>dv>z;}G>qV(p`Oy~5S* 
z9crgsDITguR$Vx8!O(E^%L{1Vo>2A2{w_y4gMXR*xh5M|hpIJlGbX$!8LsBf)jIq; z4ZCnvtPf`3Cz#jxz5LTQ@8X)xjbdH}%cx6U*JmZ6hzGp0YtM8)$vGg9*@M{~g8zu~ zI)6@kS0BCRQJ)_4%#sB6hKx^oMEC5$?$q=;P|?YcCDb-nvVp&AzRHhLrTm!a_pZOk zCjCVeJ^M7rpLxEHGwpp4#kMK8zm%Hk&kFNrMW047-HMP2(JTF#=%2)&VVW1kve}0Q zqx%K0{5~x?v9Ia!*Z{VXKkqO?QY+L)tjFYc4oCd z>(RmJp1^vrbM2mt4=d1siS>E&?;XEIu{{;THcv-1p6a@!C7UiE>O?5Gsr?H|bS*ed-yCq)1M E155Z03;+NC literal 0 HcmV?d00001 diff --git a/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shx b/TM_WORLD_BORDERS_SIMPL-0.3/TM_WORLD_BORDERS_SIMPL-0.3.shx new file mode 100644 index 0000000000000000000000000000000000000000..18af94b53abb2fca831e3449cf2f5331e20dd5a7 GIT binary patch literal 2068 zcmai#e`u9e7{|ZQdGGwOY{{H++B6$;E}JD|>CFZ$3mqadMb{i@*I4VaAVh4*8n|T0 zh!PX2Arc~UF0(b%$RAV2n9i2f6p^JNYm8cJ+7PK_gZiAk>%XAO#^-*Y^PK0`Iq!R= z`HA|^Pb6_5kP`k~t0`|A`&v`%8BtAn0W93nzs~djKDjuQ^1ny28p)-=xD@c++ahTF zPPQcM`=kkt=r(C$1#5?-DO06s!_wRc_C7cWqtc^Uuo~hYc|w}k!`^%;CyhAtwa~8< zf8iN)lk{Y{G^v@btm`0b#s8}G>|SXqd5VZBVyyA?kbBKeX={hnv6B5U=>mIgt<*^! 
zRv=v}q?dQ4uIu=3k-o<7L39r>?2>e&A++BlUph7Z(T=9xk!7q`NPo7#MbiEL(Ee8$ zeY_z3eGI#k={Sta><;#$vIz(9>yu5SE@o(x_oIuTUr&}LMbL3bEzHM~h@&CdL+o>1 zvSe~P;-;`peU_Tf`T_U@daW!i7dA4f0!cP2$r=WY`Fl>^g`#QGEX46+fsVt)s z;y1r5jQ?0SByJ9QBE__bJ=iPDi?OdBjuLm9Z1J`*-jca!{-T-aYS_S;CCXl<|Aah7 zHHe*lUgta$n`Lh@U(U}etJy@}x5#I+WZRj8O_fzScwY8)2YFY?cKGvbft-nr$lk&J zz)IQs$*ixI?I~mZknDp*@}p?gV&|S&<9V1S zJ8}aO_mgbm9F;Y5FM?Xx@lE*OlAS1E9=l~clr=yu4h)mUfEec|1)qH zlBbnE{k&U8AagjE$NojM7>C`mw*1iEwjF&?)}9{f_7YfvJ{kIT)S*XY7wF5+`Cw@nse-|N5{AtqdX_0vMX?6tL*z$>c;<8KId8`>-EnQ_wPsgaMxx1{+>}+{{VLQ zlb?@gY@7!m{ry}B8{u92sQZ_TFiSS*`^5j(oN?0Em~McD*vHq0{Ca_GXbXH#b{i-L z%OLgNZikm-cXmVK-QnJO`)}F28*+JM{5eD+&y8!6o3T99Gj>u(pBz1a!Y#f?~#i#UmrK>>uiyGn!0_wrRaoVxq|-C{+yrh zfZX%7#Mv*mJOdqQgkUiMIQQVYQvr?ck~8!@_)J*QtvVR zx%cife?RLX=f!<=%`K4p%|ooS^>g|QD&v82!n<YF2Xt4w-oj)G~s@SCI{WiNg$3Z^q(eiwpN-`~dyX6&P` z`wG(idbNVt$z#7kL1s;;=K-FdVBRP!Rge|VGst3|%rBUqLcDGT*#j`HV8J@fg!-@nQ0DD?NW wPr=J4A$enUa6rLkACL9T%wa;7g4g`{qT{LK^mTa6y&Gc}`KY&o^S8`@03`-*7ytkO literal 0 HcmV?d00001 diff --git a/data/conversionRates.csv b/data/conversionRates.csv new file mode 100644 index 0000000..0f00cd8 --- /dev/null +++ b/data/conversionRates.csv @@ -0,0 +1,87 @@ +"","originCountry","exchangeRate" +"1","USD","1" +"2","EUR","1.195826" +"3","INR","0.01562" +"4","GBP","1.324188" +"5","BRL","0.32135" +"6","RUB","0.017402" +"7","CAD","0.823688" +"8","AUD","0.80231" +"9","JPY","0.009108" +"10","CNY","0.153" +"11","PLN","0.281104" +"12","SGD","0.742589" +"13","ZAR","0.077002" +"14","CHF","1.043382" +"15","MXN","0.056414" +"16","TWD","0.033304" +"17","COP","0.000342" +"18","PKR","0.009476" +"19","TRY","0.29178" +"20","DKK","0.16073" +"21","IDR","7.6e-05" +"22","KRW","0.000886" +"23","PHP","0.019637" +"24","IRR","3e-05" +"25","SEK","0.125204" +"26","HUF","0.003896" +"27","NZD","0.727801" +"28","CZK","0.04582" +"29","ILS","0.282974" +"30","ARS","0.058444" +"31","HKD","0.127998" +"32","NGN","0.002788" +"33","NOK","0.127502" +"34","CLP","0.001606" +"35","MYR","0.237786" 
+"36","AED","0.272256" +"37","KES","0.009726" +"38","LKR","0.006536" +"39","EGP","0.056656" +"40","THB","0.030192" +"41","UAH","0.038368" +"42","ALL","0.008956" +"43","RON","0.259964" +"44","MAD","0.107076" +"45","RSD","0.01002" +"46","AMD","0.00209" +"47","BDT","0.012162" +"48","PEN","0.308962" +"49","UGX","0.000278" +"50","VND","4.4e-05" +"51","XOF","0.001823" +"52","HRK","0.160606" +"53","NPR","0.009754" +"54","BGN","0.611395" +"55","ETB","0.042395" +"56","KZT","0.002948" +"57","TND","0.410899" +"58","AFN","0.01457" +"59","BHD","2.652053" +"60","BIF","0.000572" +"61","BSD","0.998112" +"62","DOP","0.02095" +"63","GTQ","0.136926" +"64","IQD","0.00085" +"65","SDG","0.149504" +"66","TTD","0.147376" +"67","VEF","0.09987" +"68","AZN","0.588352" +"69","BAM","0.611383" +"70","BYN","0.518071" +"71","CRC","0.001726" +"72","CUP","0.04" +"73","DZD","0.009006" +"74","GHS","0.22571" +"75","ISK","0.009393" +"76","JOD","1.41062" +"77","MGA","0.000338" +"78","MUR","0.030142" +"79","OMR","2.597362" +"80","PAB","1" +"81","SAR","0.266654" +"82","SVC","0.114124" +"83","SZL","0.077002" +"84","UYU","0.034642" +"85","XAF","0.001823" +"86","YER","0.003996" diff --git a/data/multipleChoiceResponses.csv b/data/multipleChoiceResponses.csv new file mode 100644 index 0000000..81a4be6 --- /dev/null +++ b/data/multipleChoiceResponses.csv @@ -0,0 +1,16717 @@ 
+GenderSelect,Country,Age,EmploymentStatus,StudentStatus,LearningDataScience,CodeWriter,CareerSwitcher,CurrentJobTitleSelect,TitleFit,CurrentEmployerType,MLToolNextYearSelect,MLMethodNextYearSelect,LanguageRecommendationSelect,PublicDatasetsSelect,LearningPlatformSelect,LearningPlatformUsefulnessArxiv,LearningPlatformUsefulnessBlogs,LearningPlatformUsefulnessCollege,LearningPlatformUsefulnessCompany,LearningPlatformUsefulnessConferences,LearningPlatformUsefulnessFriends,LearningPlatformUsefulnessKaggle,LearningPlatformUsefulnessNewsletters,LearningPlatformUsefulnessCommunities,LearningPlatformUsefulnessDocumentation,LearningPlatformUsefulnessCourses,LearningPlatformUsefulnessProjects,LearningPlatformUsefulnessPodcasts,LearningPlatformUsefulnessSO,LearningPlatformUsefulnessTextbook,LearningPlatformUsefulnessTradeBook,LearningPlatformUsefulnessTutoring,LearningPlatformUsefulnessYouTube,BlogsPodcastsNewslettersSelect,LearningDataScienceTime,JobSkillImportanceBigData,JobSkillImportanceDegree,JobSkillImportanceStats,JobSkillImportanceEnterpriseTools,JobSkillImportancePython,JobSkillImportanceR,JobSkillImportanceSQL,JobSkillImportanceKaggleRanking,JobSkillImportanceMOOC,JobSkillImportanceVisualizations,JobSkillImportanceOtherSelect1,JobSkillImportanceOtherSelect2,JobSkillImportanceOtherSelect3,CoursePlatformSelect,HardwarePersonalProjectsSelect,TimeSpentStudying,ProveKnowledgeSelect,DataScienceIdentitySelect,FormalEducation,MajorSelect,Tenure,PastJobTitlesSelect,FirstTrainingSelect,LearningCategorySelftTaught,LearningCategoryOnlineCourses,LearningCategoryWork,LearningCategoryUniversity,LearningCategoryKaggle,LearningCategoryOther,MLSkillsSelect,MLTechniquesSelect,ParentsEducation,EmployerIndustry,EmployerSize,EmployerSizeChange,EmployerMLTime,EmployerSearchMethod,UniversityImportance,JobFunctionSelect,WorkHardwareSelect,WorkDataTypeSelect,WorkProductionFrequency,WorkDatasetSize,WorkAlgorithmsSelect,WorkToolsSelect,WorkToolsFrequencyAmazonML,WorkToolsFrequencyAWS,WorkTools
FrequencyAngoss,WorkToolsFrequencyC,WorkToolsFrequencyCloudera,WorkToolsFrequencyDataRobot,WorkToolsFrequencyFlume,WorkToolsFrequencyGCP,WorkToolsFrequencyHadoop,WorkToolsFrequencyIBMCognos,WorkToolsFrequencyIBMSPSSModeler,WorkToolsFrequencyIBMSPSSStatistics,WorkToolsFrequencyIBMWatson,WorkToolsFrequencyImpala,WorkToolsFrequencyJava,WorkToolsFrequencyJulia,WorkToolsFrequencyJupyter,WorkToolsFrequencyKNIMECommercial,WorkToolsFrequencyKNIMEFree,WorkToolsFrequencyMathematica,WorkToolsFrequencyMATLAB,WorkToolsFrequencyAzure,WorkToolsFrequencyExcel,WorkToolsFrequencyMicrosoftRServer,WorkToolsFrequencyMicrosoftSQL,WorkToolsFrequencyMinitab,WorkToolsFrequencyNoSQL,WorkToolsFrequencyOracle,WorkToolsFrequencyOrange,WorkToolsFrequencyPerl,WorkToolsFrequencyPython,WorkToolsFrequencyQlik,WorkToolsFrequencyR,WorkToolsFrequencyRapidMinerCommercial,WorkToolsFrequencyRapidMinerFree,WorkToolsFrequencySalfrod,WorkToolsFrequencySAPBusinessObjects,WorkToolsFrequencySASBase,WorkToolsFrequencySASEnterprise,WorkToolsFrequencySASJMP,WorkToolsFrequencySpark,WorkToolsFrequencySQL,WorkToolsFrequencyStan,WorkToolsFrequencyStatistica,WorkToolsFrequencyTableau,WorkToolsFrequencyTensorFlow,WorkToolsFrequencyTIBCO,WorkToolsFrequencyUnix,WorkToolsFrequencySelect1,WorkToolsFrequencySelect2,WorkFrequencySelect3,WorkMethodsSelect,WorkMethodsFrequencyA/B,WorkMethodsFrequencyAssociationRules,WorkMethodsFrequencyBayesian,WorkMethodsFrequencyCNNs,WorkMethodsFrequencyCollaborativeFiltering,WorkMethodsFrequencyCross-Validation,WorkMethodsFrequencyDataVisualization,WorkMethodsFrequencyDecisionTrees,WorkMethodsFrequencyEnsembleMethods,WorkMethodsFrequencyEvolutionaryApproaches,WorkMethodsFrequencyGANs,WorkMethodsFrequencyGBM,WorkMethodsFrequencyHMMs,WorkMethodsFrequencyKNN,WorkMethodsFrequencyLiftAnalysis,WorkMethodsFrequencyLogisticRegression,WorkMethodsFrequencyMLN,WorkMethodsFrequencyNaiveBayes,WorkMethodsFrequencyNLP,WorkMethodsFrequencyNeuralNetworks,WorkMethodsFrequencyPCA,WorkMethodsFrequencyPrescripti
veModeling,WorkMethodsFrequencyRandomForests,WorkMethodsFrequencyRecommenderSystems,WorkMethodsFrequencyRNNs,WorkMethodsFrequencySegmentation,WorkMethodsFrequencySimulation,WorkMethodsFrequencySVMs,WorkMethodsFrequencyTextAnalysis,WorkMethodsFrequencyTimeSeriesAnalysis,WorkMethodsFrequencySelect1,WorkMethodsFrequencySelect2,WorkMethodsFrequencySelect3,TimeGatheringData,TimeModelBuilding,TimeProduction,TimeVisualizing,TimeFindingInsights,TimeOtherSelect,AlgorithmUnderstandingLevel,WorkChallengesSelect,WorkChallengeFrequencyPolitics,WorkChallengeFrequencyUnusedResults,WorkChallengeFrequencyUnusefulInstrumenting,WorkChallengeFrequencyDeployment,WorkChallengeFrequencyDirtyData,WorkChallengeFrequencyExplaining,WorkChallengeFrequencyPass,WorkChallengeFrequencyIntegration,WorkChallengeFrequencyTalent,WorkChallengeFrequencyDataFunds,WorkChallengeFrequencyDomainExpertise,WorkChallengeFrequencyML,WorkChallengeFrequencyTools,WorkChallengeFrequencyExpectations,WorkChallengeFrequencyITCoordination,WorkChallengeFrequencyHiringFunds,WorkChallengeFrequencyPrivacy,WorkChallengeFrequencyScaling,WorkChallengeFrequencyEnvironments,WorkChallengeFrequencyClarity,WorkChallengeFrequencyDataAccess,WorkChallengeFrequencyOtherSelect,WorkDataVisualizations,WorkInternalVsExternalTools,WorkMLTeamSeatSelect,WorkDatasets,WorkDatasetsChallenge,WorkDataStorage,WorkDataSharing,WorkDataSourcing,WorkCodeSharing,RemoteWork,CompensationAmount,CompensationCurrency,SalaryChange,JobSatisfaction,JobSearchResource,JobHuntTime,JobFactorLearning,JobFactorSalary,JobFactorOffice,JobFactorLanguages,JobFactorCommute,JobFactorManagement,JobFactorExperienceLevel,JobFactorDepartment,JobFactorTitle,JobFactorCompanyFunding,JobFactorImpact,JobFactorRemote,JobFactorIndustry,JobFactorLeaderReputation,JobFactorDiversity,JobFactorPublishingOpportunity +"Non-binary, genderqueer, or gender non-conforming",,NA,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by a company that doesn't perform advanced 
analytics,Employed by non-profit or NGO",SAS Base,Random Forests,F#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","College/University,Conferences,Podcasts,Trade book",,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Predictive Modeler,Programmer,Researcher",University courses,0,0,100,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service","Text data,Relational data",Rarely,10GB,"Neural Networks,Random Forests,RNNs","Amazon Web services,Oracle Data Mining/ Oracle R Enterprise,Perl",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Collaborative Filtering,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Rarely,,,Often,,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,0,100,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues,Scaling data science solution up to full database",Rarely,,,,,,,,,,,,,,,,Often,Most of the time,,,,,26-50% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Company Developed Platform,I don't typically share data,Share Drive/SharePoint",,"Mercurial,Subversion,Other",Always,,,I am not currently employed,5,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Python,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",1-2 years,,Nice to have,Unnecessary,,Unnecessary,,Necessary,,,,,,,,,2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,10,30,0,30,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,, +Male,Canada,28,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Online courses,YouTube Videos",Very useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",1-2 years,Necessary,,,,,Necessary,,,,,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Machine Learning Engineer",University courses,20,50,0,30,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Poorly,Self-employed,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects",,Very useful,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Operations Research Practitioner,Predictive Modeler,Programmer,Other",University courses,30,0,40,30,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SAS JMP,SQL,Tableau",Rarely,Often,,,Rarely,,,,Rarely,,,,,Rarely,Rarely,,,,,Rarely,Rarely,,Sometimes,,Rarely,,Rarely,,,,Rarely,,Rarely,,,,,Sometimes,,Rarely,,Often,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Random Forests,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,,Sometimes,Often,Sometimes,,,,,,,Sometimes,Often,Sometimes,,Sometimes,,,Sometimes,,,,Often,,,Often,,,,50,20,0,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support 
for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,Often,Often,Often,,Often,Often,Often,Most of the time,Often,Often,Often,,Often,Often,Often,Often,Often,Often,,100% of projects,Entirely internal,Standalone Team,Electricity data sets from government and states,"Everything is custom, there is never a tool that can do it all as they claim. Nothing is as easy as it might be, but that is the nature of the beast","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,"250,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,GitHub,"Arxiv,Conferences,Kaggle,Textbook",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,,,"Data Machina Newsletter,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,60,5,5,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"5,000 to 9,999 employees",Stayed the same,Don't know,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,Most of the 
time,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,Sometimes,Often,Most of the time,Sometimes,,Most of the time,Sometimes,Often,Sometimes,,,,Most of the time,,Sometimes,,Sometimes,,Most of the time,Sometimes,,,,Sometimes,Often,,Most of the time,,Sometimes,,,,30,20,15,15,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,,,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Most of the time,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,Brazil,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Analyst,Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",45,25,20,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Other",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,TensorFlow,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,0,0,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",,,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",,,,,"C/C++,Cloudera,Hadoop/Hive/Pig,Java,NoSQL,R,Unix shell / awk",,,,Sometimes,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Most of the time,,,,,,,Sometimes,,Sometimes,Most of the time,Often,,,,30,10,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,I prefer not to say,Lack of data science talent in the 
organization",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Git",,,,,8,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,40,0,50,10,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Female,Australia,43,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,Microsoft Excel Data Mining,Link Analysis,Python,University/Non-profit research 
group websites,"Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,YouTube Videos",,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Business Analyst,Work,70,0,30,0,0,0,Supervised Machine Learning (Tabular Data),,A doctoral degree,Non-profit,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Rarely,10TB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,5,15,10,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,AUD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Software Developer/Software Engineer,Self-taught,10,70,15,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation","Image data,Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,Often,,Sometimes,,Most of the time,Often,Most of the time,,,,Often,,,,Often,,Sometimes,,,Often,,Often,Sometimes,,,,Sometimes,,Sometimes,,,,40,30,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to 
others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",,Sometimes,,,Often,Often,,,Often,,Sometimes,,Rarely,,,Sometimes,Rarely,Rarely,,,,,Less than 10% of projects,Do not know,IT Department,Address database; geo data,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,1200000,RUB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Russia,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection)","Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,Very useful,,,Very useful,,,,,,Somewhat useful,"Data Elixir Newsletter,DataTau News Aggregator,Emergent/Future Newsletter (Algorithmia)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,10,0,30,40,0,20,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs",,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,Spark / MLlib,Tableau,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Rarely,Most of the time,,,Often,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,Often,,Most of the time,,Most of the time,,,,,,Often,Most of the time,,,Often,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,,,,Often,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,,,,,Often,,,Sometimes,,,,,,,,Sometimes,,,,,,26-50% of projects,Entirely internal,Standalone Team,data.gov; Open Data,Data Clean Up,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Rarely,"95,000",INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,No,Yes,Engineer,Fine,Employed by college or university,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Master's degree,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Engineer,Predictive Modeler,Researcher",University courses,30,30,10,30,0,NA,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,54,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional 
services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Online courses,Trade book",,Very useful,,,Very useful,,Very useful,,,,Very useful,,,,,Very useful,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines","Jupyter notebooks,MATLAB/Octave,Python,SAS Base,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Random Forests",,,,,,Often,Often,Often,Often,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,35,20,25,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,Often,,,Often,Often,,Often,,,,Often,,,Often,,,,26-50% of projects,Do not know,IT 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,1100000,TWD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Python,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",,,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,IBM Cognos,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Perl,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,Rarely,,,,,,,,,,,Sometimes,,Often,Often,Often,,,,,Rarely,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian 
Techniques,Decision Trees,Naive Bayes,Random Forests",Often,,Often,,,,,Often,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,40,20,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,26-50% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Most of the time,"120,000",,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Italy,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Regression,Python,GitHub,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"Data Stories Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working 
yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,39,"Not employed, but looking for work",,,,,,,,Python,,Python,University/Non-profit research group websites,"College/University,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",,PhD,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,50,40,0,10,0,0,Unsupervised Learning,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United States,49,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Self-employed,Amazon Machine Learning,Proprietary Algorithms,Java,Google Search,"Online courses,Podcasts",,,,,,,,,,,Somewhat useful,,Very useful,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Other 
(Separate different answers with semicolon)",15+ years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Other,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Biology,More than 10 years,Computer Scientist,Self-taught,20,0,0,80,0,0,Reinforcement learning,Bayesian Techniques,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,,Very useful,,,,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,,University courses,30,20,0,50,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased significantly,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Never,1GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,0,80,0,20,0,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Other",,,,,,Often,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More internal than external,Standalone Team,Mostly work with generated idealized data sets for testing algorithms,Generating good data sets that aren't tailored to a particular problem,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,R,Deep learning,Matlab,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Conferences,Friends network,Tutoring/mentoring",Somewhat useful,Not Useful,,,Not Useful,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,Less than a year,"Engineer,Researcher",Kaggle competitions,10,30,0,10,50,0,"Computer Vision,Machine Translation","Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,Fewer than 10 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Never,100GB,"CNNs,Decision Trees,RNNs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Random Forests",,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,100000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,21,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,R,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook",,Somewhat useful,Not Useful,,,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",55,30,10,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Most of the time,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,R",,,,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,Most of the time,,,,Often,,,,,,,,,,,,,,20,60,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Often,,,,Often,,,,,,Sometimes,,Most of the time,,10-25% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Mercurial,Always,20000,CZK,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Computer Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,25,0,0,25,"Recommendation Engines,Time Series","Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10TB,SVMs,"Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,SAS Enterprise Miner,Spark / MLlib",,Often,,,,,,,,,,,,,Often,,Rarely,,,,Rarely,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,25,50,5,15,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Company internal community,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,,Very useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,University courses,10,60,0,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,Rarely,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",Rarely,,,,,Often,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,,50,20,10,5,15,0,"Enough to code it 
again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,,,,,,Sometimes,,Often,Sometimes,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,624000,RUB,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,51,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,I collect my own data (e.g. web-scraping),"Blogs,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,15,35,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines",Logistic Regression,High school,Pharmaceutical,"5,000 to 9,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Always,100MB,Evolutionary Approaches,"IBM Cognos,NoSQL,Unix shell / awk",,,,,,,,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,"Data Visualization,Evolutionary Approaches,Logistic Regression,Time Series Analysis",,,,,,,Often,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,Often,,,,25,30,10,25,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,Often,,,,,,Sometimes,,,Sometimes,,,26-50% of projects,More external than internal,Standalone Team,None,"Clean, to the point data",Other,Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,65000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Colombia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Google Search,Government website,University/Non-profit research group websites","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs",A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,KNIME (free version),NoSQL,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,Most of the time,,Most of the time,,Often,,,,,Most of the time,Most of the time,,Most of the time,,Sometimes,,,,,,,,Often,,,,Most of the time,,Often,,,,,,Rarely,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,,,,Sometimes,Often,Often,Most of the time,Sometimes,,,Sometimes,,Often,,Most of the time,,,Often,Often,Often,,Most of the time,Sometimes,Sometimes,Most of the time,Often,,Often,Often,,,,60,10,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,Sometimes,,,,,,Most of the 
time,,,,Most of the time,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,georeference data;financial scoring;customer behavioral data,"huge amount of data, with few metaqdata and local understanding of the data structures, there is a darth of information regarding the source data.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,156000000,COP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Germany,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,I don't plan on learning a new tool/technology,Factor Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,40,0,40,0,20,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines 
(SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Internet-based,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Always,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,Other","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Unix shell / awk,Other",,,,Sometimes,Often,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,Most of the time,Most of the time,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Most of the time,,,,Sometimes,Most of the time,Most of the time,Often,Often,,,Most of the time,,Often,,,,,Often,Often,Often,,Often,Sometimes,,,,,Often,Most of the time,,,,50,10,20,10,10,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Sometimes,,,,Most of the time,,,,Most of the time,,Often,,,Most of the time,Often,,Most of the time,,100% of projects,Entirely internal,Business Department,,"Incompatibility between the hadoop and pydata stack. Hive is incredibly slow, impala just slow but incredibly unstable. Most reliable data exchange format: CSV. No joke (ok, pyarrow is starting to make things easier). This whole hadoop ecosystem needs to die in a fire.","Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",I don't typically share data,,"Bitbucket,Git",Sometimes,150000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Canada,32,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog",3-5 years,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,"Computer Scientist,Data Miner,Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Denmark,53,Employed full-time,,,Yes,,Business 
Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Personal Projects,Stack Overflow Q&A",,Very useful,,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist",Work,40,10,50,0,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",,Sometimes,Often,,,,Most of the time,Often,,,,,,,,,,Sometimes,,Sometimes,,Often,,,,Often,Often,,,Most of the time,,,,20,30,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Limitations of tools,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,Often,,,Often,,,,,,,100% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Pharmaceutical,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Sometimes,100MB,HMMs,"NoSQL,Perl,Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,,,,,,,,,,,,Often,,,HMMs,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,25,20,25,20,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,,Rarely,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,126000,PLN,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Proprietary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Blogs,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,,,,,,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Machine Learning Engineer,Researcher",University courses,20,10,20,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Often,Most of the time,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,Often,,,,60,5,15,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",Often,,,Often,,,,,,Sometimes,,,,,,,Often,,,,Sometimes,,76-99% of projects,Entirely internal,Standalone 
Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,Other,Rarely,"130,000",GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,"Arxiv,Conferences,Personal Projects",Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Work,30,0,60,10,0,0,Computer Vision,Support Vector Machines (SVMs),High school,Military/Security,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Video data,Most of the time,1TB,"Decision Trees,RNNs,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,,Often,,Often,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,Often,,Sometimes,,,,Sometimes,,,Often,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,15,0,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SAS JMP,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,Rarely,,,,,,,Most of the time,,Often,,,,,Sometimes,,,,,"A/B Testing,Data Visualization",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,5,15,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Sometimes,Sometimes,,,,,Often,,,Sometimes,,,,,,,Sometimes,,51-75% of projects,Entirely 
internal,Business Department,None,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,133000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,South Africa,23,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Social Network Analysis,SQL,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,1 to 2 years,I haven't started working yet,University courses,40,0,0,50,10,0,Natural Language Processing,Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Sweden,48,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Other",,,,,,,,,,,,Very useful,,,,,,,,3-5 years,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Physics,,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Survival Analysis,Time Series","Ensemble Methods,Evolutionary Approaches,Other (please specify; separate by semi-colon)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important +Male,Sweden,37,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,University courses,40,0,58,2,0,0,"Adversarial Learning,Computer Vision,Outlier detection 
(e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Most of the time,10TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Often,Often,,Rarely,Often,,Sometimes,,Rarely,,Rarely,,Often,Often,,Sometimes,,Rarely,Sometimes,Sometimes,Sometimes,,Rarely,,,,15,35,20,10,20,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,Often,Often,Sometimes,Sometimes,,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Company Developed Platform,,"Git,Subversion",Sometimes,80000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Canada,30,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,I don't plan on learning a new ML/DS method,Python,Google Search,"Blogs,Friends network,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,,Less than a year,Business Analyst,Self-taught,25,25,20,15,15,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Retired,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Very useful,,,,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,15,10,75,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by company that makes advanced analytic software,Mathematica,,,,"Company internal community,Textbook",,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",30,30,0,40,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"5,000 to 9,999 employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Segmentation",,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,10,30,10,50,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Need to coordinate with IT",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,51-75% of projects,Entirely external,Business Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Monte Carlo Methods,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Statistician",University courses,0,40,20,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Julia,Jupyter notebooks,R,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,Rarely,,,,,,,Sometimes,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,Often,Sometimes,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,,Often,Most of the time,,,,,Sometimes,,Most of the time,,,Often,,,,,,,,45,15,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Often,,,,,,Often,,,,,,Sometimes,Often,,,,Often,Most of the time,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Portugal,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Programmer",Kaggle competitions,40,5,25,0,30,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods",A master's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,RNNs","Amazon Web services,Python,R,SAP BusinessObjects Predictive Analytics,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the 
time,,,,Rarely,,,,,,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Natural Language Processing,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,,Often,Often,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,Rarely,,,,,,,,Often,,,,Often,Often,,,26-50% of projects,More internal than external,Business Department,,poor documentation,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Most of the time,30000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,SQL,Deep learning,R,"GitHub,Google Search,University/Non-profit research group websites","College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Other,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,I collect my own data (e.g. web-scraping),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Very useful,,Very useful,,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",35,45,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic 
Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Sometimes,Often,,,,,Often,Sometimes,Often,,,,,Often,Often,Most of the time,,,Often,,,Sometimes,Sometimes,,,,50,20,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,100% of projects,Entirely internal,Standalone Team,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,28000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Mexico,26,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Neural Nets,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,"Programmer,Researcher",Self-taught,50,20,25,5,0,0,Time Series,Logistic Regression,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,"Julia,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Sometimes,,,,40,40,10,5,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,,,,,,Often,,,,Most of the time,Most of the time,,26-50% of projects,More external than internal,Standalone Team,"INEGI,google APIs",Learning curves,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,200000,MXN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,New Zealand,34,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects",,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important +Female,,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,GitHub,"College/University,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,,,,,,Very useful,,"Data Machina Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Software Developer/Software Engineer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,Reinforcement 
learning,"Bayesian Techniques,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher,Statistician",University courses,80,0,5,15,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Mix of fields,100 to 499 employees,Decreased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,,,Often,Most of the time,Sometimes,Often,,,,,Often,,Often,,,,,Most of the time,,Often,,,Sometimes,Sometimes,Sometimes,,,,,,20,10,15,15,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Rarely,,,,Often,,,,Rarely,Most of the time,,,,,,Sometimes,,,,,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,,,"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,C/C++/C#,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,1-2 years,,,,,,,,,,,,,,,,,PhD,No,Doctoral degree,Electrical Engineering,,,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Markov Logic Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Master's degree,Yes,Master's degree,A health science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,20,0,30,30,0,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,,Very Important +Male,Japan,30,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,,,,Arxiv,Not Useful,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,University 
courses,20,0,40,40,0,0,Adversarial Learning,Bayesian Techniques,,Technology,Fewer than 10 employees,Decreased significantly,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Rarely,1MB,CNNs,Amazon Machine Learning,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Bitbucket,,,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,New Zealand,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,DataRobot,Social Network Analysis,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Friends network,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,1 to 2 years,"Business Analyst,Other",Other,50,0,0,20,0,30,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Text data,,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,R,Other",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Text 
Analytics",Often,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,Often,,,Often,,,,,40,30,0,10,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important +Female,Japan,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Other,University courses,40,20,10,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL",,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Often,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,Rarely,Most of the time,Most of the time,Rarely,Most of the time,,,Often,,Sometimes,,Often,,Sometimes,,,Most of the time,,Often,,,,,,,Sometimes,,,,20,40,10,12,18,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,,,,,Often,,,Most of the time,,,,,,,,Most of the time,,,,,,26-50% of projects,Entirely internal,Other,None.,Network security,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Mercurial",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Taiwan,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,R,"University/Non-profit research group websites,Other","College/University,Conferences,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,10,0,25,65,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Canada,40,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,,,A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,"Data Miner,Machine Learning Engineer,Researcher",Self-taught,20,NA,60,10,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,32,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,TensorFlow,Other,Stata,,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Predictive Modeler,Researcher",University courses,15,10,20,50,5,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Government,500 to 999 employees,Increased significantly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,25,20,5,20,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,Python,,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,University courses,0,35,0,65,0,0,"Time Series,Unsupervised Learning",Bayesian Techniques,A bachelor's degree,Internet-based,,,,,Very important,,Basic laptop (Macbook),Text data,,,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,31,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Online courses,Textbook",Somewhat useful,,,,,,,,,,Very useful,,,,Somewhat useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Data Scientist,Researcher",University courses,35,20,15,30,0,0,"Computer Vision,Survival Analysis",Bayesian Techniques,A doctoral degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,1GB,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,30,30,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Scaling data science solution up to full database",,Sometimes,Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,15000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Singapore,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Work,0,40,60,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,,100MB,Random Forests,"Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Often,,,,,,,"Data Visualization,Random Forests,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,Often,,,Often,,,,,,,,50,30,10,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,72000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,54,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Personal Projects,Textbook",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer,Researcher",Self-taught,70,15,0,0,15,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,,,,,,,,,,,,,,,, +Male,Russia,59,Retired,,,No,Yes,Other,Perfectly,Self-employed,Jupyter notebooks,Neural Nets,Python,Other,"Company internal community,Kaggle,Official documentation,Online courses,Tutoring/mentoring",,,,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,Very useful,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),40+,Other,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,,Very useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Other",University courses,15,20,25,20,10,10,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,Often,Often,,Most of the time,Most of the time,Often,,,,Often,,Often,Often,Most of the time,,Most of the time,,Most of the 
time,Most of the time,,Most of the time,Sometimes,,Often,,Most of the time,Often,Often,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,Maintaining it's quality and keeping it clean,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,120000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Employed by professional services/consulting firm,R,Survival Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Non-Kaggle online communities,Online courses",,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Programmer,Statistician",Work,30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Don't know,10MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,Most of the time,Often,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,40,0,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Other,Never,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Orange,MARS,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,Self-taught,30,0,10,30,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,10GB,"RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Sometimes,Often,Sometimes,Sometimes,,,,,,,,,,Often,Often,Sometimes,,Often,,,,,Often,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,Often,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,26-50% of projects,,,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),I don't typically share data,,Git,Sometimes,"100,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,France,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Time Series Analysis,Python,Google Search,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,1 to 2 years,Other,Self-taught,20,20,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Impala,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,Other",,,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Sometimes,,,,Often,Often,,,,,,Often,,Often,,,Often,,,,Often,,,,,Sometimes,Often,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,,,,,,,,Often,,,,,,Often,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Germany,49,Employed part-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by company that makes advanced analytic software,Employed by college or university,Self-employed",Microsoft Azure Machine Learning,Text Mining,Python,"GitHub,I collect my own data (e.g. web-scraping)",Other,,,,,,,,,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Data Miner,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,70,10,20,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",,Technology,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Relational data",Sometimes,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,TensorFlow",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Logistic Regression,Neural Networks,Prescriptive Modeling,Time Series Analysis",Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Rarely,,Rarely,,,,,,,,Rarely,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Privacy issues,Unavailability of/difficult access to 
data",Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,Business Department,,Build up a common semantic,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Git,,150000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Bayesian Methods,Python,,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity","GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,,Data Scientist,Work,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Russia,25,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,60,5,35,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Mix of fields,10 to 19 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,Most of the time,,,,Often,,,,,Often,,,Most of the 
time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Sometimes,,,,Rarely,Rarely,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Rarely,Rarely,Rarely,,Rarely,Most of the time,Most of the time,Most of the time,Often,Rarely,,Sometimes,,Most of the time,,Sometimes,,,,Rarely,Often,,Often,Rarely,,Often,,Rarely,Rarely,Most of the time,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Rarely,,,Sometimes,,,,,,,,Most of the time,,,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,,"Privacy issues, lack of data governance","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,36000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Germany,59,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",15+ years,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,,Other,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important +Male,United Kingdom,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",University courses,40,20,25,10,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Military/Security,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Never,100MB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,Sometimes,,,,Often,,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,90,10,0,0,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Technology,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Image data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Machine Learning,C/C++,NoSQL,R",Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Bitbucket,Rarely,215000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,"Not employed, but looking for work",,,,,,,,KNIME (free version),Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Gaming Laptop (Laptop + CUDA capable GPU),,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,20,15,25,15,15,Survival Analysis,"Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Israel,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,80,0,0,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods",A master's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Decision Trees,Ensemble Methods,Random Forests","Java,Python,Tableau",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests",Sometimes,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources",,,,,Sometimes,Often,,,,Most of the time,,,,,,,,,,,,,76-99% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Online courses",Very useful,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Government,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,Bayesian Techniques,"Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,Often,,,Often,Often,Often,Often,,,Often,,Often,,Often,,,,,Often,Often,,,,,,,,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Git,Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,,,,,Master's degree,,,,,90,0,5,0,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Textbook,Trade book",,,,,,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Computer Scientist,Engineer",Self-taught,70,10,10,5,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Sometimes,,Often,,,,,Rarely,,,,,,Often,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs",,,Sometimes,,,Often,,Often,,,,,,Most of the time,,Sometimes,,,Often,Often,,,Sometimes,,,,,Often,,,,,,70,15,5,5,5,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,IT Department,511 traffic data,Keeping the data organized and consistent. I am given very limited resources and space.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",,61000,HKD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,44,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,Python,Other,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",35,50,15,0,0,0,,,High school,Internet-based,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,1GB,,"NoSQL,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,60,0,20,20,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",Often,,,,,,,,Often,,Often,,Often,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Bitbucket,Rarely,65000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Colombia,40,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,Often,,,,Most of the time,,Sometimes,,,Sometimes,,Sometimes,,,,Sometimes,,,Sometimes,,,,70,20,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,Most of the 
time,,,,,Often,,,76-99% of projects,More internal than external,IT Department,,IT people availabilty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Subversion",Never,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series 
Analysis",Often,,,,,Often,,Often,Most of the time,,,Often,,,,Sometimes,,,,,Sometimes,,Often,,,,,Sometimes,,Most of the time,,,,50,30,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,47,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Newsletters,Online courses",,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,"Ensemble Methods (e.g. 
boosting, bagging)",R,"GitHub,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Online courses,Stack Overflow Q&A",,Very useful,,Somewhat useful,Very useful,,,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Other,Self-taught,20,30,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,500 to 999 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",Sometimes,Sometimes,Sometimes,Sometimes,,Often,Most of the time,Often,Often,,,,Sometimes,Often,,,,,,Sometimes,Often,,Often,,,,,Often,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty 
data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Often,,Sometimes,,,,,,,,,Often,Often,,,,Often,Often,Often,,100% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,83500,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,,Other,Other,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Engineer,Other",Other,20,50,0,25,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Other,Other",,,,Rarely,,,,,,,,,,,Rarely,,Often,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,Sometimes,Most of the time,,,Most of the time,Most of the time,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Sometimes,,,Sometimes,,,Often,,Often,,Often,Most of the time,Most of the time,Often,,Often,Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,,,,50,10,5,25,10,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of 
management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,100% of projects,More external than internal,Other,,labeling the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,7,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Newsletters,Stack Overflow Q&A,Textbook",,Very useful,,,,,,Somewhat useful,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,,University courses,15,0,30,55,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text 
Analytics",,,,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,Often,,,,,Rarely,,Rarely,,,,,,Rarely,,,,,80,10,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT",,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,10-25% of projects,Entirely internal,Business Department,none currently,dirty,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,115000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,"Not employed, but looking for work",,,,,,,,R,Random Forests,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,DataTau News Aggregator,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,,Mathematics or statistics,Less than a year,"Data Analyst,Statistician",University courses,0,60,0,30,10,0,Survival Analysis,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Germany,24,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,Canada,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Very useful,,,,,Not Useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,3 to 5 years,Researcher,Self-taught,95,5,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Unix shell / awk",,,,,,,,,,,,Rarely,,,,,Most of the time,,,Sometimes,Rarely,,Most of the time,,,,,,,,Most of the time,,Often,,Sometimes,,,,,,,,,,,,,Often,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,Most of the time,Often,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,100% of projects,Approximately half internal and half external,Standalone Team,"adni, ppmi, other",metadata,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Tableau,,,,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,,,Not Useful,,,,Not Useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,"DataCamp,Udacity,Other",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Not important,Not important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,Egypt,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,40,40,0,20,0,0,Recommendation Engines,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Colombia,47,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by college or university,R,,R,University/Non-profit research group websites,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,,Master's degree,Mathematics or statistics,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Hungary,40,Employed 
full-time,,,Yes,,Researcher,Fine,Employed by college or university,Jupyter notebooks,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Not Useful,Not Useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",University courses,40,20,40,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Never,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,Python,R,Tableau",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Text Analytics",Rarely,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,Rarely,Often,,,,,Often,,,Often,,,,,35,10,0,30,25,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,Sometimes,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,4000000,HUF,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Textbook",,,Very useful,,Somewhat useful,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",5,75,0,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS JMP,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,Sometimes,,,,,,,Sometimes,,,,,Often,,,,,,,"A/B Testing,Data Visualization,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Often,,Sometimes,,,,,,Often,,,Sometimes,Often,Often,,76-99% of projects,Do not know,Business Department,"online data, digital media data, social data",hard to connect all data sources,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,80000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional 
services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,edX,Basic laptop (Macbook),11 - 39 hours,PhD,No,Bachelor's degree,Electrical Engineering,,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,19,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,C/C++/C#,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,,,,Basic laptop (Macbook),2 - 10 hours,,No,I did not complete any formal education past high school,,,I haven't started working yet,,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Nice to have,Nice to have,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,50,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional 
services/consulting firm,Python,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Analyst,University courses,0,0,60,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Insurance,500 to 999 employees,Increased significantly,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Simulation,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Sometimes,Most of the time,Rarely,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,20,25,10,25,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,135000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,36,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,R,Google Search,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,France,36,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Other,Other",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,Unnecessary,Nice to have,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,9,0,1,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,United States,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Often,Often,Sometimes,Sometimes,Sometimes,,,,,,Most of the time,Most of the time,Most of the time,,Sometimes,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with 
IT,Privacy issues,Unavailability of/difficult access to data",Often,,,Often,Most of the time,Sometimes,,,Most of the time,,Often,Sometimes,,Sometimes,Most of the time,,Most of the time,,,,Often,,100% of projects,Entirely internal,Standalone Team,None,"Reliable data, i.e. Clean, standardized, formatted and correct data",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Most of the time,75000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Brazil,27,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Monte Carlo Methods,Python,I collect my own data (e.g. web-scraping),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,United States,27,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Stata,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,I haven't started working yet,University courses,20,80,0,0,0,0,"Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,Internet-based,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,Markov Logic Networks,"MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Text Analytics,Time Series Analysis",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,0,0,0,0,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,,5,,,,,,,,,,,,,,,,,, +Male,Canada,34,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,Other,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Very useful,,3-5 years,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,Argentina,25,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,30,0,0,70,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,68,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Online courses,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Manufacturing,100 to 499 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data",,<1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Jupyter notebooks,Perl,Python",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,,Often,Often,,,,,,Often,,Often,,,,,,,,,,,Most of the time,,,Often,,,,15,40,20,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,,Often,,,,,,,,,,Often,,,,Often,,,26-50% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,190000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Germany,25,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,"Data Miner,DBA/Database Engineer,Researcher,I haven't started working yet",University courses,10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,28,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Neural Nets,Python,University/Non-profit research group websites,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,,I don't write code to analyze data,Other,Other,0,0,0,0,0,100,Other (please specify; separate by 
semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important +Male,India,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,"Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Java,Cluster Analysis,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,50,40,0,10,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Retail,"5,000 to 9,999 employees",Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees","IBM SPSS Statistics,MATLAB/Octave,R",,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Segmentation,Text Analytics",,,,,,Often,Often,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,30,40,10,10,100,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",Often,,,,,,,,Sometimes,,,,,,Rarely,Sometimes,,,,,,,51-75% of projects,More internal than external,Standalone Team,No,No,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git,Mercurial",Sometimes,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,5,35,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Video data,Text data",Most of the time,100GB,"CNNs,Ensemble Methods,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,,,Most of the time,,Sometimes,Often,,Sometimes,,Sometimes,,,Often,,Sometimes,,,Most of the time,Most of the time,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,Often,,,,,10,40,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others",Sometimes,,,,Often,Often,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Sometimes,80000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",80,10,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important +Female,United States,20,Employed part-time,,,No,Yes,Researcher,Fine,Employed by non-profit or NGO,Java,Deep learning,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Podcasts,,,,,,,,,,,,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Biology,Less than a year,Researcher,University courses,50,0,0,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,16,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Other,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - GANs,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very 
Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Turkey,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,"Coursera,DataCamp,edX","GPU accelerated Workstation,Traditional Workstation,Other",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",19,80,0,0,1,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important +Female,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Factor Analysis,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,10,30,30,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Technology,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100GB,"Decision Trees,Neural Networks","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft SQL Server Data Mining,R,SAS Enterprise Miner,SQL,Tableau",,Rarely,,,,,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Lift Analysis,Neural Networks,Text Analytics",,,,,,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,,,,30,15,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational 
(e.g. MySQL/Microsoft SQL Server),Other,,Other,Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,SVMs",,,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,,,,20,70,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,,Often,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,1500,BRL,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,42,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,,Master's degree,Physics,Less than a year,"Other,I haven't started working yet",University courses,5,40,5,50,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +A different identity,India,24,"Not employed, but looking for work",,,,,,,,Java,"Ensemble Methods (e.g. 
boosting, bagging)",R,Google Search,Textbook,,,,,,,,,,,,,,,Very useful,,,,The Data Skeptic Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",,25,10,10,15,25,15,"Adversarial Learning,Computer Vision,Machine Translation,Reinforcement learning,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,The Analytics Dispatch Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,10,20,10,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Random Forests,SVMs","C/C++,MATLAB/Octave,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",Often,Sometimes,Sometimes,,,Most of the time,Sometimes,Often,,Sometimes,,,,Most of the time,,,,Sometimes,Most of the time,Sometimes,Often,,Sometimes,Sometimes,,,,Sometimes,Most of the time,,,,,25,10,40,20,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Privacy issues",,Often,,,,,,,Often,,,,Sometimes,,,,Most of the time,,,,,,10-25% of projects,More internal than external,IT 
Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,25000,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,,IBM Watson / Waton Analytics,Text Mining,Python,,"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,,University courses,20,20,10,40,10,NA,"Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,Very important,,,,,,,"Cloudera,Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Often,Often,Most of the time,Most of the time,,,,,Sometimes,,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,70,10,5,10,5,0,Enough to refine and innovate on the algorithm,I prefer not to 
say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,,Sometimes,,,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Hungary,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Anomaly Detection,Python,"GitHub,Other","Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,30,30,30,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Regression/Logistic Regression,Other","Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow",,,,,,,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Other",,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Most of the 
time,,Often,,,Often,,,,,Often,,,20,10,0,50,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,I prefer not to say,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",Often,,,,Sometimes,,Most of the time,,Often,,,Most of the time,Most of the time,,,,,,,,,,100% of projects,Entirely internal,IT Department,kaggle.com,Signals similarity and Incorrect measurement e.g. EMC noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,372000,HUF,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Statistician",Self-taught,60,30,10,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression",,Retail,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Bayesian Techniques,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Naive Bayes,Prescriptive Modeling,Random Forests",Sometimes,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,70,20,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,Row-oriented 
relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,,45000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,Data Scientist,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,10MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,Most of the time,,Often,,,Most of the time,,,,Most of the time,,Most of the time,Often,Often,,,Most of the time,,,,,Most of the time,Sometimes,,,,,80,5,3,2,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,Privacy issues",,,,,Often,,,,,,,,,,,Most of the time,Sometimes,,,,,,100% of projects,Entirely internal,Standalone Team,NISR,None,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,10000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Very useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,Very useful,,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Computer Vision,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,80,20,0,0,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,26-50% of projects,Entirely internal,IT Department,Kaggle published datasets; Oxford for computer vision; Imagenet; Client market dataset,Understand the meaning of the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Subversion",Sometimes,60000,BRL,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Very useful,,,,,,,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,Jack's Import AI Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Russia,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,SAP BusinessObjects Predictive Analytics,Monte Carlo Methods,R,I collect my own data (e.g. web-scraping),"Stack Overflow Q&A,Other",,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Work,80,0,20,0,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Logistic Regression",No education,CRM/Marketing,,,,,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,10GB,"Bayesian Techniques,Markov Logic Networks","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Perl,SQL",,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Decision Trees,Natural Language Processing,Prescriptive Modeling,Text Analytics",Sometimes,Often,Sometimes,,,,,Sometimes,,,,,,,,,,,Often,,,Often,,,,,,,Most of the time,,,,,15,40,30,0,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,Sometimes,,,,,Often,,,,,,,Most of the time,,,,,,,None,Entirely internal,IT 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,840000,RUB,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,Python,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,More than 10 years,Other,Work,50,10,40,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,10 to 19 employees,Stayed the same,More than 10 years,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Image data,Video data,Text data,Other",Always,1PB,Other,"Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,TensorFlow",Sometimes,Often,,Sometimes,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Simulation,Time Series Analysis",Often,,,,,,,Often,Often,,Often,Often,,,,,,,,Most of the time,,,Often,,,,Most of the time,,,Most of the time,,,,50,40,0,0,10,0,Enough to refine and innovate on the 
algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,10-25% of projects,Approximately half internal and half external,Standalone Team,,Having good and beads properly marked and classified.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,,,,5,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,40,25,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,,,,Necessary,Necessary,Necessary,,,,,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Very Important,Very Important,Very Important,,,,,,,Very Important +Male,Russia,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Time Series Analysis,Python,Government website,"Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",45,35,0,20,0,0,"Outlier 
detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,< 1 year,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A health science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Outlier detection (e.g. Fraud detection),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Mathematica,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer,Researcher",University courses,30,0,20,50,0,0,"Survival Analysis,Time Series",Markov Logic Networks,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Very Important +Female,United Kingdom,31,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,NoSQL,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,1 to 2 years,"DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Decreased slightly,Less than one year,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Always,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs",SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Text Analytics",,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,60,4,3,3,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data 
science solution up to full database,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Often,,Often,Most of the time,Often,,,Most of the time,,Most of the time,Often,Most of the time,Often,,,Often,,51-75% of projects,More internal than external,Standalone Team,,Lack of data understanding by the data creators,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,29000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,C/C++,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Very useful,,,,Very useful,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,"Data Analyst,Data Scientist,Other",Kaggle competitions,50,25,0,10,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,SQL",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,15,0,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,Often,Often,,,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,,10-25% of projects,Do not know,IT Department,,,"Document-oriented 
(e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Always,80000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,58,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,Government website,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,A health science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,20,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Java,GitHub,"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Necessary,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,60,40,0,0,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not 
important,Very Important,Not important,Not important,Not important,Not important,Not important +Female,India,30,"Not employed, but looking for work",,,,,,,,SAS Base,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites",College/University,,,Very useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Argentina,21,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,Not Useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Ukraine,18,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,,Not Useful,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,,,,Other,Traditional Workstation,11 - 39 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,70,0,5,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Brazil,22,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Podcasts,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,Somewhat useful,,,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,Less than a year,Programmer,University courses,35,60,5,0,0,0,Recommendation Engines,Bayesian Techniques,Primary/elementary school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Russia,24,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,Python,Google Search,"Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,< 1 year,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Business Analyst,Programmer",Self-taught,70,30,0,0,0,0,,"Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,,,,,,,,,,,,,,, +Male,Italy,25,"Not employed, but looking for 
work",,,,,,,,TensorFlow,I don't plan on learning a new ML/DS method,Python,GitHub,"Arxiv,College/University,Conferences,Friends network,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet",University courses,5,30,0,40,0,25,"Natural Language Processing,Time Series","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Greece,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,,,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,10,20,0,50,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Spain,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,,,Primary/elementary school,Telecommunications,500 to 999 employees,Decreased slightly,3-5 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",,10GB,Decision Trees,"Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Simulation,Text Analytics,Time Series Analysis",Often,Often,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,Most of the time,,,,40,20,15,15,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Often,,,,Often,Often,Most of the time,,Most of the time,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,R,Random Forests,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Company internal community,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,Not Useful,Very useful,,Somewhat useful,Very useful,Not Useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician",Self-taught,30,50,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Markov Logic Networks",High school,Technology,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Gradient Boosted Machines,Markov Logic Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,Unix shell / awk",Often,Most of the time,,,,,,,Often,,Sometimes,Sometimes,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,Often,,,,Most of the time,Sometimes,Most of the time,,,,Often,Sometimes,Sometimes,,Sometimes,Most of the time,,,Sometimes,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",Often,Often,Sometimes,,,Often,Most of the time,,,,,Sometimes,,,,Often,Sometimes,Often,,,,Most of the time,,,,Often,,,Sometimes,Most of the time,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Rarely,,Sometimes,Most of the time,Often,,,Often,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,45000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Unnecessary,Nice to have,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,,,,,,,,, +A different identity,France,0,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,,,,,No Free Hunch Blog,< 1 year,Necessary,,,,Necessary,,,Nice to have,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,"Researcher,Software Developer/Software Engineer",Self-taught,30,0,0,0,70,0,Computer Vision,"Ensemble Methods,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,,Somewhat important,Very Important,,,,,,,,,Somewhat important,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Online courses,Personal Projects",,,Very useful,,Very useful,,,,,,Very useful,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,Linear Digressions Podcast",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Self-taught,25,50,0,25,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,27,"Not employed, but looking for work",,,,,,,,DataRobot,Text Mining,R,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,University courses,70,10,0,10,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,20+,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,49,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Other,I don't plan on learning a new ML/DS method,Other,Other,"Friends network,Textbook",,,,,,Very useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst,Programmer",Self-taught,90,0,10,0,0,0,Machine Translation,,High school,Financial,500 to 999 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100TB,,"NoSQL,Perl,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,Often,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,0,0,80,0,20,0,Enough to run the code / standard library,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear 
question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,Often,Most of the time,,,None,,IT Department,na,,Other,"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,"85,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,10,10,40,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,Spain,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by college or university,R,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,15,60,10,10,5,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",,10MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,,,,,,Rarely,,,,Most of the time,,,,Sometimes,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Rarely,,,,Often,,Rarely,,,,"CNNs,Cross-Validation,Data Visualization,GANs,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,Simulation,Text Analytics,Time Series Analysis",,,,Sometimes,,Often,Most of the time,,,,Sometimes,,Rarely,,,Sometimes,,Sometimes,Often,Most of the time,,,,,Most of the time,,Often,,Often,Often,,,,40,30,20,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,Most of the time,Sometimes,,,Often,,,,,,Often,Often,,,Often,Most of the time,,,,,,76-99% of projects,More internal than external,Other,,Data obtained from real robots in reinforcement learning tasks. Data change from run to run. A lot of time is needed to get a fair dataset,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,12000,EUR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,62,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,R,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A professional degree,Non-profit,,,,,Not at all important,Other,Traditional Workstation,Text data,,,Other,"Cloudera,KNIME (free version),Python,R,Spark / MLlib,Tableau",,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Segmentation,Simulation,Text Analytics",,,Sometimes,,,,Often,Sometimes,,,,,,,,,,Sometimes,,,,,,,,Sometimes,Often,,Sometimes,,,,,30,25,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Often,Most of the time,,76-99% of projects,Entirely internal,Other,N/A,Finding good datasets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,github,Other,Always,"65,000",USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,,"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,,2 - 10 hours,Kaggle Competitions,No,Professional degree,,6 to 10 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,20,0,0,0,80,0,,,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Female,United States,21,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,,R,GitHub,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,0,0,80,0,0,,,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting 
event,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Stata,University/Non-profit research group websites,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,Less than a year,"Researcher,Statistician",University courses,50,10,0,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,20,0,0,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,Poland,58,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Podcasts,Trade book,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,Often,,,,Most of the time,Sometimes,,,,,,,,Often,,Rarely,,,,Often,Rarely,,,,,Sometimes,,Sometimes,,,,70,20,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science 
team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,,,Often,,Often,,,,,,Sometimes,,Often,,,Most of the time,,,,26-50% of projects,Do not know,Standalone Team,GUS; fb;tweeter;business data,building datasets and cleansing data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"3,000",PLN,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,France,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Sweden,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,,,Necessary,,,,,Basic laptop (Macbook),,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,University courses,10,20,0,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Survival Analysis,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Scientist,Researcher",University courses,30,15,20,35,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,Rarely,,,Rarely,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",Sometimes,,Often,Sometimes,Rarely,Often,Most of the time,,,,,,,Sometimes,,,,,,,Often,,Sometimes,Sometimes,,,,,,Often,,,,45,10,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Sometimes,Sometimes,Often,Sometimes,,Often,Often,Sometimes,,,Sometimes,Often,Most of the time,,,Sometimes,,Most of the time,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"130,000",USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Julia,Neural Nets,Julia,I collect my own data (e.g. 
web-scraping),"Conferences,Kaggle,Official documentation",,,,,Very useful,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,,Evolutionary Approaches,,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,,Very important,,Workstation + Cloud service,,,1MB,Other,"Julia,Other",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"kNN and Other Clustering,Logistic Regression,Simulation,Other",,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,Often,,,,Most of the time,,,5,94,0,1,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Often,Less than 10% of projects,,,Federal or state data sets,Dirty and not well documented,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Conferences,Personal Projects",,,Very useful,,Somewhat useful,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Other,0,0,0,100,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Oracle Data Mining/ Oracle R Enterprise,Neural Nets,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Other,Work,25,0,75,0,0,0,,Logistic Regression,A bachelor's degree,Other,"1,000 to 4,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,,,"Amazon Machine Learning,Amazon Web services,SQL",Rarely,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,25,25,50,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,55000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,Python,Genetic & Evolutionary Algorithms,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",10,85,5,0,0,0,,,A doctoral degree,Government,500 to 999 employees,Stayed the same,Don't know,Some other way,"N/A, I did not receive any formal education",Other,Basic laptop (Macbook),Text data,,1MB,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,15,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability 
of/difficult access to data",,,,,Sometimes,,,,Often,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Other,,,Other,I don't typically share data,,,Don't know,25000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,< 1 year,Necessary,,,,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +A different identity,Canada,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Podcasts,Textbook",Very useful,Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,Self-taught,85,5,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient 
Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,100GB,"Bayesian Techniques,HMMs,Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,Sometimes,,,Often,Often,,,,,,Rarely,,,,,,,Often,Sometimes,,,,Often,,Often,,,Often,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,,,,,Sometimes,,,Sometimes,,,Rarely,Often,,Most of the time,,,Often,,,,,100% of projects,Entirely internal,Other,,,,Share Drive/SharePoint,,Git,Rarely,85000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,19,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Personal Projects,Textbook",,,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,,Talking Machines Podcast,1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Other,University courses,30,0,0,70,0,0,,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,India,23,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Recommender Systems,Segmentation",,,Most of the time,,,,Most of the time,,,,,,,,,Often,,Often,Sometimes,,,,,Often,,Sometimes,,,,,,,,25,40,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,Sometimes,,,,Often,,,Often,,,,,,,Sometimes,,Often,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,360000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,23,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Tableau,Social Network Analysis,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,5,25,50,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression",A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,,1GB,Ensemble Methods,"C/C++,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,Often,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,20,50,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",Often,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,,Often,,None,Entirely internal,IT Department,Credit data,,Other,"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"20,000",PLN,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,Canada,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,Not Useful,,Not Useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important +Female,United States,31,Employed 
full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,"Government website,University/Non-profit research group websites","Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,Researcher,Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Insurance,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Most of the time,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,55,20,10,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,"NOAA weather data, NASA satellite data",Domain knowledge,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"92,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,South Africa,24,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Decision Trees,SAS,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Other,Other,80,0,0,20,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,Denmark,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,R,Google Search,"Arxiv,Blogs,College/University,Conferences,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Programmer,Statistician",University courses,50,0,20,20,10,0,"Natural Language 
Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Video data",Sometimes,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow,Unix shell / awk",Rarely,Rarely,,Most of the time,,,,,,,,,,,,,Often,,,,Often,Rarely,,Rarely,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Sometimes,,Most of the time,Most of the time,,,,,,,Often,,Often,,,Sometimes,Sometimes,Most of the time,,Often,,,Often,,Sometimes,,,,,,30,40,0,25,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,Often,,,Often,,Often,,,,,,,Sometimes,Often,,,,100% of projects,More internal than external,Standalone Team,,High dimensional and few samples 
,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,66390,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,Not Useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,80,0,0,20,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,Brazil,29,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,Google Search,"Blogs,Online courses,Podcasts",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,"DataTau News Aggregator,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 
hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important +Male,Pakistan,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Neural Nets,C/C++/C#,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)",Neural Networks - RNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,25,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or 
university,Java,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",NA,15,0,75,10,0,"Reinforcement learning,Time Series",Gradient Boosting,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Textbook",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Fine arts or performing arts,3 to 5 years,"Data Analyst,Statistician",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,"5,000 to 9,999 employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Other",Never,1MB,"Neural Networks,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Mathematica,Microsoft Azure Machine Learning,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS JMP,SQL,TIBCO Spotfire",,,,,,,,,,,,,Sometimes,,,,,,,Rarely,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,Sometimes,Often,,Often,,Sometimes,,,,,Most of the time,,,,,"Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,Often,,,,,Sometimes,,Sometimes,,Sometimes,,,,70,15,0,15,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The 
lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,Often,Often,,Sometimes,,,Sometimes,,,Sometimes,,,Most of the time,Most of the time,,76-99% of projects,More internal than external,Other,PHIS,"limited use, not clinically specific","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,79000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,57,"Not employed, but looking for work",,,,,,,,Tableau,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Online courses,Personal Projects",,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,,,,,,"Data Stories Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner",University courses,35,0,0,65,0,0,"Natural Language Processing,Recommendation Engines",Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very 
Important,Somewhat important,Somewhat important +Female,Netherlands,54,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,R,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,,,Necessary,,Necessary,,,,,,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,,,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,,Somewhat important,Very Important,Not important +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Other,"Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,,,,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Always,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,SAS Base,SQL,Tableau,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Sometimes,,Often,,,,Often,Most of the time,,Most of the time,,,,Often,Most of the time,,Most of the time,,,,30,30,30,0,0,10,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues",Most of the time,,,Most of the time,,,,,Rarely,,,Most of the time,,Sometimes,Most of the time,,Most of the time,,,,,,Less than 10% of projects,Entirely internal,Other,None;,Working with my IT department.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,305000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,59,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Anomaly Detection,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,Data Analyst,University courses,15,4,60,20,1,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Other,Other",,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,Sometimes,,Often,,,,,,,Often,Often,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Most of the time,Often,Most of the time,,,,Often,Often,Often,Most of the time,Rarely,,Often,,Sometimes,Sometimes,Most of the time,,,,40,25,5,5,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Rarely,Sometimes,,Most of the time,Sometimes,,,,,,,Often,,,Often,,Sometimes,,51-75% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,90000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Miner,Perfectly,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,Google Search,"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Not Useful,Very useful,Not Useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,,Self-taught,80,20,0,0,0,0,,,High school,Insurance,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Don't know,,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,70,10,0,5,15,0,Enough to tune the parameters properly,"Dirty data,Team using multiple ad-hoc development environments 
such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,26-50% of projects,Approximately half internal and half external,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,72,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Python,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Trade book",,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,,Very useful,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,Adversarial Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Microsoft Excel Data 
Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,SQL",,,,Often,,,,,,,,,,,,,,,,,,,Often,Often,Often,,,Often,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,40,10,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,Sometimes,,,,,Most of the time,Most of the time,,,,Often,Often,,Most of the time,Often,,,Often,,76-99% of projects,More external than internal,Business Department,classified,receiving timely clearance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Sometimes,"250,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Canada,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Not Useful,Very useful,Very useful,,Very useful,,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Other,30,30,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,,,,,Sometimes,Sometimes,,Rarely,,,,Most of the time,Sometimes,,Most of the time,,Most of the time,,,,,Most of the time,,,,60,20,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,Often,,,,,,,,Often,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Share Drive/SharePoint,,Git,Sometimes,"90,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog",< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,,,,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,New Zealand,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,Very useful,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Software Developer/Software Engineer",Work,10,0,40,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic 
Regression,Random Forests,Time Series Analysis",,,,,,,Most of the time,Rarely,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,,76-99% of projects,Approximately half internal and half external,Business Department,Government Statistics; Public Spending information; House Sale/Rent datasets,"Sourcing transaction, unaggregated data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,90000,NZD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Kaggle,Stack Overflow Q&A,Other",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,10,0,30,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,NoSQL,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Rarely,,Sometimes,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,Often,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,Most of the time,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests",,Often,,,,Sometimes,Often,Often,Often,,,Sometimes,,Often,,Sometimes,,,,Sometimes,,Sometimes,Often,,,,,,,,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues",,Rarely,,,,,,,Rarely,,,Rarely,,,,,Rarely,,,,,,26-50% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,,Don't know,40000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Not Useful,Very useful,Somewhat useful,,Somewhat useful,Not Useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,Other,Work,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,Ukraine,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't 
perform advanced analytics,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important +Female,United States,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"DataTau News Aggregator,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,,,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,41,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,Link Analysis,Python,Government website,"Arxiv,Blogs,Kaggle,Official documentation,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Other,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A tech-specific job board,Somewhat 
important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,100MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Data Visualization,HMMs,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,Most of the time,,,,Often,,,,,,Sometimes,,,,,Often,Most of the time,,Often,,,,,,,Sometimes,Most of the time,Often,,,,10,20,0,10,60,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Other,aviation safety reporting system; aircraft ADS-B data,Government agency filters the data in opaque manner,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,13,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,I don't plan on learning a new ML/DS method,Python,"Google Search,Other","Kaggle,Non-Kaggle online communities,Textbook,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,,,,,Very useful,,Very useful,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Israel,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural 
Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Very useful,,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,15,0,15,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,,,,,Not at all important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Relational data,Sometimes,100MB,"Bayesian Techniques,Other","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,Rarely,,Sometimes,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction",,,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,,Often,,,Most of the time,,,,,,,,,,,,,25,25,15,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,,,,,Rarely,,,,,,,,,,Sometimes,Most of the time,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,,< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,University courses,0,0,0,50,50,0,Recommendation Engines,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,Australia,18,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Not Useful,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,I prefer not to answer,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Evolutionary Approaches,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,GitHub,"Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,0,0,100,NA,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Python,QlikView,R,SAS Base,Tableau",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,Most of the time,Most of the time,,,,,Often,,,,,,,Most of the time,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,Often,,,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,Often,,,,Most of the time,,Most of the time,,,,30,30,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,Often,,,,,Often,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection)","Neural Networks - CNNs,Neural Networks - RNNs",No education,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Norway,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by non-profit or NGO,R,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,Somewhat useful,,,Very useful,Very useful,Not Useful,Very useful,,,Very useful,Somewhat useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,I prefer not to answer,Manufacturing,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,80,5,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,Sometimes,,,Most of the time,Often,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,Often,,51-75% of projects,More internal than external,Standalone Team,Public data sets on the economy from ssb.no,No big challenges,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email",,,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,30,5,10,20,30,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,Australia,NA,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Textbook",,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,3-5 years,Nice to have,Necessary,Nice to have,,,,Unnecessary,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data 
Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,Very Important,,Somewhat important,,,,,,,Somewhat important,Somewhat important, +Male,Netherlands,26,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Microsoft R Server (Formerly Revolution Analytics),Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Very useful,,Very useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Business Analyst,University courses,30,0,0,20,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and 
private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","KNIME (free version),Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",Often,,,,,Most of the time,Most of the time,Sometimes,,,,Often,,,,Often,,,Sometimes,,Rarely,,Sometimes,,,,,,,,,,,40,20,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,,Sometimes,Most of the time,Often,,Sometimes,Often,,,Most of the time,Sometimes,,,,Often,,Sometimes,Sometimes,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,35000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Ireland,NA,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,"GitHub,I collect my own data (e.g. 
web-scraping)",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,,Less than a year,Business Analyst,Self-taught,50,0,50,0,0,0,"Adversarial Learning,Natural Language Processing,Speech Recognition,Survival Analysis",Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,Australia,32,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,,"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,Very useful,KDnuggets Blog,< 1 year,,,,,,Necessary,Necessary,,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Management information systems,6 to 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Survival Analysis,,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,,,Somewhat important,,Very Important,Very Important,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Deep learning,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,Chile,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Not Useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",99,0,1,0,0,0,"Computer Vision,Natural Language Processing",Bayesian Techniques,A professional degree,Government,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Rarely,1GB,"Bayesian Techniques,Neural Networks","C/C++,IBM Watson / Waton Analytics,Java,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,Rarely,,Most of the time,,,,,,,Sometimes,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Bayesian Techniques,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks",,,Sometimes,,,,,Sometimes,,,,,,,,,,Most of the time,Often,Rarely,,,,,,,,,,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations of tools,Other",,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,Most of the time,None,Approximately half internal and half external,Other,BigQuery; INE (Instituto Nacional de EstadÌÎ_sticas); UCI,Create a model,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,800000,CLP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,18,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Talking Machines Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,Other,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,Self-taught,100,0,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,10GB,Bayesian Techniques,"Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Naive Bayes,Time Series Analysis",,,Most of the time,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,5,50,25,0,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,,,,10-25% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,NAS,Git,Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,24,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Random Forests,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Stack Overflow Q&A",,,,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,Researcher,University courses,10,0,60,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,1MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression",,,,,,,Often,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Most of the time,,,Often,,,,,,,,,,,Sometimes,,,51-75% of projects,More external than internal,,,,,Company Developed Platform,,Subversion,Never,480,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,France,36,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Data 
Analyst,Engineer",Other,50,10,20,0,0,20,Reinforcement learning,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher",University courses,20,10,15,55,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text 
data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests","C/C++,Microsoft SQL Server Data Mining,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,Often,,,,,,Sometimes,Sometimes,Most of the time,Sometimes,,,,Often,,,Sometimes,,,,30,40,10,10,10,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,Often,,Most of the time,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,70,20,0,10,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs",Sometimes,,Sometimes,,,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,Most of the time,,,Often,,,,,Sometimes,,,,,,80,10,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Lack of significant domain expert input",,,,Often,,Often,,,,,Often,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,92000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,France,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Factor Analysis,R,University/Non-profit research group websites,"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Other",University courses,40,0,10,40,0,10,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,500 to 999 employees,Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,R,SAS Base,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,Rarely,,,Sometimes,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,70,10,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Sometimes,,,Most of the 
time,,,,,,Often,,,,,Often,Most of the time,,10-25% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Never,38000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Germany,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,25,25,25,25,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,,"Amazon Web services,Hadoop/Hive/Pig,Python,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,10,20,20,20,30,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Rarely,10-25% of 
projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Other",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web 
services,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Rarely,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Sometimes,,Sometimes,Sometimes,Often,Often,,Often,,,,,Often,Sometimes,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Often,,,,,,,,,,,Often,,,100% of projects,Do not know,Other,WAY too many to list,Data warehousing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Hadoop/Hive/Pig,Time Series Analysis,R,University/Non-profit research group websites,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Scientist,Engineer,Operations Research Practitioner",University courses,30,10,20,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,Rarely,,,,Most of the time,Sometimes,,,,,,Often,,Often,,,,,,,Often,,,,,Often,,Sometimes,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Most of the time,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Female,United States,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),,Experience from work in a company related to ML,No,I prefer not to answer,Computer Science,,"Researcher,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Poland,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,I never declared a major,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Computer Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,32,25,10,3,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Not very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Random Forests,Text Analytics",,,,,,Most of the time,Often,Most of the time,Most of the time,,,,,Most of the time,,,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,,,66,15,15,2,2,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,,"Dirty data, Data 
cleaning, feature engineering","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,160000,USD,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,SQL,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,15,25,0,0,40,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Non-profit,,,,,Not very important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis,Other",,,Rarely,,,Often,Most of the time,Often,,,,,,Often,,Often,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,Often,Sometimes,Sometimes,,,30,15,0,15,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Other",Often,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,Most of the time,76-99% of projects,More external than internal,Standalone Team,"Twitter, quandl, kaggle, fivethrityeight",cleaning.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Always,25000,USD,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,10,20,0,70,0,0,Natural Language Processing,Decision Trees - Random Forests,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,52,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Podcasts",Somewhat useful,,,,,,Very useful,,,,,,Somewhat useful,,,,,,"Data Machina Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,5,10,5,10,Supervised Machine Learning (Tabular Data),,I prefer not to answer,Financial,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,,,"Text data,Relational data",Never,10MB,Decision Trees,"KNIME (free version),Unix shell / awk",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,40,40,5,15,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,,Never,,EUR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Newsletters,YouTube Videos",,,Very useful,,,Very useful,,Very useful,,,,,,,,,,Very useful,,< 1 year,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,5,45,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,,,Somewhat important,Not important,Somewhat important +Male,United States,24,Employed part-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Stan,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,Very useful,Very useful,,,,Very useful,Not Useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Statistician",University courses,10,10,0,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,QlikView,R,SAP BusinessObjects Predictive Analytics,SAS Base,SQL,Stan,Tableau",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of 
the time,Often,,,,Often,Rarely,,,,Sometimes,Sometimes,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Random Forests,Simulation,Time Series Analysis",Sometimes,,Most of the time,,,Often,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,Most of the time,,,Sometimes,,,,65,10,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Often,,Sometimes,Most of the time,Often,,Often,Sometimes,Sometimes,Sometimes,Often,,,,Sometimes,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,52000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,,Nice to have,,,Necessary,Nice to have,,Necessary,,,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer",Self-taught,40,40,0,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,,Somewhat important,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Not Useful,Somewhat useful,,Not Useful,,,Not Useful,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair 
or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Russia,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,I never declared a major,,"Data Miner,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,18,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Textbook,Other",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,United States,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Talking Machines Podcast",1-2 years,Necessary,,,,,,,Necessary,Necessary,Necessary,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,University courses,20,20,20,20,20,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Online courses,Personal Projects",Somewhat useful,,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,10,0,75,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important +Male,Belarus,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Regression,Python,"Google Search,University/Non-profit research group websites","Kaggle,Podcasts",,,,,,,Very useful,,,,,,Very useful,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Computer Scientist,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software 
Engineer",Self-taught,20,40,20,0,20,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Python,Spark / MLlib,SQL",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Often,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Often,,,,,,,,20,40,20,20,0,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Rarely,"25,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",University courses,54,0,24,20,2,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Mix of fields,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Impala,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL",,,,,Most of the time,,Often,,Often,,Rarely,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,Often,,Most of the time,,Rarely,,,Sometimes,Sometimes,Rarely,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Often,Often,,,,,,Sometimes,,Often,,Sometimes,,,Most of the time,,Often,,,Sometimes,Often,Sometimes,Most of the 
time,Sometimes,,,,70,20,5,3,2,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Often,Sometimes,Most of the time,,,,,Most of the time,,,Often,,,,Often,,Most of the time,,Sometimes,,51-75% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,25000,CZK,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Pakistan,20,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Google Search,"College/University,Kaggle,Personal Projects",,,Not Useful,,,,Very useful,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,20,0,0,80,0,Computer Vision,"Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Female,United States,28,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,R,Anomaly Detection,Python,Google Search,"Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,Researcher,University courses,10,5,65,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,1TB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Sometimes,,Rarely,Rarely,,,,,Rarely,,Rarely,,Rarely,,Most of the time,Sometimes,,Rarely,,Most of the time,Often,,Rarely,,Sometimes,,,,40,40,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Often,Often,Often,Often,,,Often,Sometimes,Sometimes,Rarely,,,,Rarely,,,Most of the time,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Mercurial,Subversion",Rarely,80000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,35,5,10,0,25,Time Series,"Bayesian Techniques,Logistic Regression",A doctoral degree,Manufacturing,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,,"Bayesian Techniques,Decision Trees,Random Forests","Mathematica,Python,SQL",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,PCA and Dimensionality Reduction",Sometimes,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,95,0,0,0,5,0,,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,R,University/Non-profit research group websites,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle 
Competitions,No,Master's degree,Computer Science,Less than a year,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important +Female,United Kingdom,32,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Text Mining,R,Google Search,"Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Very useful,,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Other,Self-taught,100,0,0,0,0,0,"Recommendation Engines,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,Other,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,"Gradient Boosted Machines,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,95,3,0,1,1,0,Enough to tune the parameters properly,"Dirty data,Privacy issues,Scaling data science solution up to full 
database",,,,,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,100% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,,,,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses,Personal Projects",,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Textbook",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,"Business Analyst,Researcher",Self-taught,70,0,10,10,10,0,Outlier detection (e.g. Fraud detection),,"Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,Bayesian Techniques,"IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,10,20,50,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",Often,Often,,,,,,,Often,,,,Often,,,Often,Often,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Monte Carlo Methods,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,100MB,"Regression/Logistic Regression,Other","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Prescriptive Modeling",Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,40,40,0,20,0,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Most of the time,,,,Often,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Sometimes,120000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,France,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Text Mining,Python,Google Search,"Conferences,Friends network,Kaggle,Online 
courses,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Not Useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Ensemble Methods,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Female,United States,28,"Not employed, but looking for work",,,,,,,,Python,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Trade book",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,,,Very useful,,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Other,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Business Analyst,Other",Work,0,25,50,25,0,0,,"Evolutionary Approaches,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important +Female,Czech Republic,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,No,Professional degree,,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,0,10,10,0,,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,I collect my own data (e.g. web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,10,30,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Rarely,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Sometimes,Most of the time,Often,,,,,,,,Often,,,,,Often,,Often,,,Often,,,Most of the time,,,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,96,PLN,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Neural Nets,Matlab,Google Search,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,< 1 year,,Nice to have,,,Necessary,,,,Necessary,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,30,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Other",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A master's degree,Pharmaceutical,"10,000 or more employees",,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,GPU accelerated Workstation,"Image data,Text data,Other",,,Other,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Segmentation,Other",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,35,15,10,15,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,Sometimes,,,,Often,,,,,Sometimes,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Australia,55,"Independent contractor, freelancer, or self-employed",,,No,Yes,Predictive Modeler,,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,,,,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",80,15,0,0,5,0,Reinforcement learning,Logistic Regression,I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Biology,Less than a year,,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",No education,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Czech Republic,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,25,55,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Other",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,,Often,,Most of the time,,,,,,,Often,,,,10,30,10,20,30,0,Enough to explain 
the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Other",Sometimes,1400000,CZK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Other,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist",University courses,55,10,10,25,0,0,,,A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Regression/Logistic Regression,Other","R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression,Segmentation",Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Often,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in 
with the available data,Unavailability of/difficult access to data",Most of the time,,,Sometimes,Most of the time,,,,Often,,Most of the time,,,Sometimes,Sometimes,,Sometimes,Often,,Sometimes,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,85000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Ireland,39,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Friends network,Kaggle,Official documentation,Textbook,YouTube Videos",Very useful,,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,,11 - 39 hours,Master's degree,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Speech Recognition,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,,,High school,Internet-based,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,"N/A, I did not receive any formal education",,,,,,,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,100,,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,Sometimes,,,,,Often,Most of the time,,Often,,,,Sometimes,,,,,Often,,,None,Do not know,Other,,,,,,"Bitbucket,Git",,"80,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Microsoft Azure Machine Learning,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,Somewhat useful,,Not Useful,,,,,"FastML Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,I prefer not to answer,,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Text data,Most of the time,100GB,"CNNs,GANs,Neural Networks,RNNs,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,Often,Often,Often,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,GANs,Neural Networks,RNNs,Simulation,SVMs",Often,,,Often,,Often,Often,,,,Often,,,,,,,,,Often,,,,,Often,,Often,Often,,,,,,50,30,5,15,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Often,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Sometimes,180000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Data Stories Podcast,FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,0,0,15,0,,,"Some college/university study, no bachelor's degree",Internet-based,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,,1GB,"CNNs,Neural Networks,Random Forests","C/C++,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Neural Networks,Random Forests",,,,Often,,Most of the time,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,15,5,10,20,50,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Often,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,18000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by non-profit or NGO,R,Association Rules,Python,"GitHub,Google Search,Government website,Other","Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Other,Other",,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,A humanities discipline,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,25,15,0,0,60,,Decision Trees - Random Forests,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not 
important,Somewhat important,Not important,Not important,Not important +Male,United States,42,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,,Very useful,Very useful,Very useful,,,,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Biology,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",23,33,10,0,0,34,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,18,Employed part-time,,,No,Yes,Other,Perfectly,Employed by company that makes advanced analytic software,Microsoft SQL Server Data Mining,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,,University courses,10,50,0,10,30,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,40,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Text Mining,Python,University/Non-profit research group websites,"College/University,Kaggle,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,Talking Machines Podcast",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Master's degree,Yes,Master's degree,A social science,Less than a year,Other,University courses,60,0,0,40,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,62,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,Employed by professional services/consulting firm,Java,Deep learning,Python,Google Search,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,Yes,Other,Fine,Employed by 
government,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Other,Self-taught,10,90,0,0,0,0,,,"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important +Male,United Kingdom,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,Data Analyst,University courses,10,10,10,55,15,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Insurance,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,RNNs","Angoss,Google Cloud Compute,Jupyter notebooks,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,Rarely,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,Sometimes,,Rarely,Often,,,Sometimes,Often,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,Most of the time,,Sometimes,,Sometimes,,,Sometimes,Most of the time,Often,,,,Often,Sometimes,,,Sometimes,Sometimes,,,,30,35,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in 
deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,Often,Often,,,,Often,,,,,,,,Often,,Sometimes,Sometimes,,,76-99% of projects,More internal than external,Business Department,Weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,I don't typically share data",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,42500,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,0,10,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,,,"IBM SPSS Modeler,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,SQL",,,,,,,,,,,Often,Often,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Random Forests,Segmentation",Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,Other,Other,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,College/University,Official documentation,Online courses,Personal Projects,Textbook",Very useful,,Very useful,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Neural Networks - RNNs,High school,Academic,"1,000 to 4,999 employees",,,A general-purpose job board,Very important,Other,Laptop or Workstation and private datacenters,Other,Never,,Neural Networks,"Jupyter 
notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Neural Networks,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,0,0,0,0,0,0,Enough to run the code / standard library,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,Entirely internal,,,,,,,,,,,,9,,,,,,,,,,,,,,,,,, +Female,Spain,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Personal Projects",,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,Statistician,University courses,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,I don't know,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,Often,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,Most 
of the time,,,,Often,Most of the time,Often,,,,,,Most of the time,,Often,,,,,Most of the time,,Often,,,Most of the time,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Often,Most of the time,,,,,,,Often,Most of the time,,,,,,,Most of the time,,,76-99% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,14000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,,,"Blogs,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Researcher,Statistician",University courses,10,10,30,50,0,0,"Natural Language Processing,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Telecommunications,I don't know,Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,"Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the 
time,,,,,,,,Sometimes,Rarely,,,,Sometimes,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,Often,Sometimes,Sometimes,,Sometimes,,,,,,Sometimes,,,,,0,0,0,0,0,0,,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Not Useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,,,,,,,,,,,,,,,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,40,15,20,20,5,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by 
professional services/consulting firm,TIBCO Spotfire,Text Mining,Python,I collect my own data (e.g. web-scraping),"Company internal community,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Predictive Modeler,Self-taught,100,0,0,0,0,0,,,High school,Mix of fields,20 to 99 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Markov Logic Networks,Regression/Logistic Regression","Amazon Web services,Java,MATLAB/Octave,Python,QlikView,R,SQL,TIBCO Spotfire,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,,,Often,Sometimes,Often,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,,"Data Visualization,Ensemble Methods,Prescriptive Modeling,Simulation,Time Series Analysis,Other",,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,Often,,,30,5,10,10,5,40,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Most of the time,,,,,Sometimes,,Most of the time,Often,,Most of the time,,,,100% of projects,Entirely internal,Standalone Team,Too many to list. Mostly clients datasets and government datasets,Getting good descriptions of what is available and in what format from the client,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,109500,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Self-employed,TensorFlow,Deep learning,Python,GitHub,"Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,30,10,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",Sometimes,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text 
Analytics",,,,,,,Sometimes,Often,,,,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,Often,Often,Often,Most of the time,,,,,50,35,5,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,,,,,Most of the time,,,,,,,,,,Often,,,Often,,Often,,51-75% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)",Share Drive/SharePoint,,"Bitbucket,Git",Always,3600000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Poland,27,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Other,No,Bachelor's degree,Physics,Less than a year,Other,Self-taught,70,25,0,0,5,0,Computer Vision,Decision Trees - Random Forests,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,R,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Professional degree,,Less than a year,"Business Analyst,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not 
important,Not important,Somewhat important,Very Important +Male,New Zealand,72,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by government,Other,Deep learning,Python,"GitHub,Government website,University/Non-profit research group websites","Company internal community,Newsletters,Personal Projects,Textbook",,,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Google Search,"College/University,Conferences,Textbook",,,Somewhat useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,,Self-taught,30,0,10,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Manufacturing,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,10MB,Regression/Logistic Regression,Minitab,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,90,5,0,3,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Most of the time,,,,Most of the time,,,,Most of the time,,Often,Most of the time,Often,,,,Most of the time,,Less than 10% of projects,Entirely internal,Business Department,,Acquiring data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Australia,38,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Researcher",Self-taught,80,0,20,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important +Male,Canada,19,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Self-employed,Python,Regression,Python,Government website,"Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Malaysia,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Deep learning,Java,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Jack's Import AI Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",< 1 year,Nice to 
have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Yes,Bachelor's degree,"Information technology, networking, or system administration",,,,NA,NA,NA,NA,NA,NA,Speech Recognition,"Hidden Markov Models HMMs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects",Very useful,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Mix of fields,10 to 19 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Julia,Jupyter notebooks,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation",Sometimes,,Rarely,Most of the time,Sometimes,Sometimes,Most of the time,,,,,,,Sometimes,,Rarely,,,,Often,Sometimes,,,Sometimes,Most of the time,Sometimes,,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Rarely,,,,,,,,,,,,,Sometimes,Often,Most of the time,,76-99% of projects,More external than internal,Other,,Preparing the data for the models,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Physical hard drives,"Git,Other",Sometimes,360000,ZAR,Other,9,,,,,,,,,,,,,,,,,, +Male,Canada,52,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,20,10,70,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression",A master's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Python,R,Tableau",Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Segmentation",Often,,Sometimes,,,,Often,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,60,20,0,20,0,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,90000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,39,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,60,0,0,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Other,50,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,10TB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,10,0,10,10,70,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Sometimes,Sometimes,,,,,,Often,,,,,,Rarely,Sometimes,Most of the time,,,51-75% of projects,Entirely internal,IT Department,none,access to the data,Document-oriented (e.g. 
MongoDB/Elasticsearch),I don't typically share data,,Git,Most of the time,"39,000",USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Japan,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,C/C++,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Other",Self-taught,75,25,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,I don't know/not sure,Financial,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Rarely,<1MB,Neural Networks,"C/C++,Java,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,90,10,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,None,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,0,JPY,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),,Electrical Engineering,1 to 2 years,Researcher,University courses,20,10,0,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,<1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,Rarely,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,Often,Rarely,,,Most of the time,,,,,,Most of the time,Sometimes,,,,40,20,15,15,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,Often,Most of the time,Often,,,Often,,,,,,,Sometimes,,,,,Most of the time,,10-25% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Most of the time,180000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,New Zealand,28,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,20,50,10,0,0,Reinforcement learning,Neural Networks - CNNs,A bachelor's degree,Government,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Don't know,10TB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Prescriptive Modeling,Recommender Systems,Time Series Analysis",,,Sometimes,Most of the time,Often,,Most of the time,,,,,,,Sometimes,,Often,,Sometimes,,,,Sometimes,,Sometimes,,,,,,Most of the time,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Often,,,,,,Sometimes,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,0,,Other,7,,,,,,,,,,,,,,,,,, +Male,Japan,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,29,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity,Other",Other,2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,Less than a year,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,, +Female,Mexico,37,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,R,Other,SQL,I collect my own data (e.g. 
web-scraping),"College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,,,Very useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Programmer,Self-taught,80,0,20,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),A master's degree,Technology,Fewer than 10 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Always,10GB,"Bayesian Techniques,Decision Trees,Neural Networks","NoSQL,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Segmentation,Simulation",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,20,30,30,20,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,31,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,,,Very useful,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",11 - 39 hours,Master's degree,Yes,Master's degree,Biology,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,"FastML Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,1GB,"CNNs,Ensemble Methods,Neural Networks",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks",,,,Most of the time,,Most of the time,Most of the time,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,ImageNet,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,60000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,SQL,GitHub,"Blogs,Friends network,Personal Projects",,Somewhat useful,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Data Scientist",Self-taught,60,5,25,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Always,,,"Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,Rarely,Rarely,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,Logistic Regression,Random Forests,Simulation,Time Series Analysis",Sometimes,,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Most of the time,,,Most of the time,,,,20,45,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,Often,,,Often,Rarely,,,Sometimes,,,,,,Often,Rarely,Often,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Sometimes,155000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important 
+Male,United States,23,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,,TensorFlow,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,PhD,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",33,30,0,35,2,NA,"Adversarial Learning,Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important +Male,United States,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,0,40,50,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,Sometimes,,Often,,,,Often,,Often,,,,,,,Rarely,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,Sometimes,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Rarely,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Often,,,Rarely,Rarely,Sometimes,Often,,,Sometimes,,,Sometimes,Rarely,Often,Often,Sometimes,,,,45,25,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Sometimes,Most of the time,,,,,Often,,,,,,,,,,Sometimes,Often,,26-50% of projects,More internal than external,Business Department,"limited data from automotive partners; limited data from industrial control system partners; UCI repository; Kaggle repositories 
(Clinton email corpus, ENRON email corpus); other one-off uses as needed/encountered","Much of the data I need for projects in new areas is unavailable, thus we have to synthetically emulate the kind of data we need for the analytics we are developing. In a big company, you often need to prove your concept with a PoC before anyone will commit resources to getting you adequate real-world data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"170,075",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,,,,,Very useful,Talking Machines Podcast,< 1 year,Nice to have,Nice to have,Nice to have,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Master's degree,Physics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,R,Link Analysis,R,GitHub,"Blogs,Conferences,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,,,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,Self-taught,30,25,20,15,10,0,Time Series,Markov Logic Networks,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,,,,,,,, +Male,France,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,40,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,C/C++,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Not Useful,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler",Work,30,10,50,5,5,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Insurance,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,1GB,"Decision Trees,GANs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Segmentation,Time Series Analysis",Often,Sometimes,,,,,Most of the time,Often,Sometimes,,,,,,Often,Often,,,,Often,Often,Sometimes,,,Sometimes,Most of the time,,,,Most of the time,,,,50,25,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,Often,,,,,,,,,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,1200000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,15,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Personal Projects",,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,SQL,I collect my own data (e.g. 
web-scraping),"Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,1GB,Bayesian Techniques,"Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics,Time Series Analysis",Often,Often,Most of the time,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,,,,Sometimes,Most of the time,,,,30,5,5,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,Sometimes,,,,,,,Often,,,,,,,76-99% of projects,Entirely internal,IT Department,CMS,Shit data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,110000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Not Useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Master's degree,,1 to 2 years,Business Analyst,Self-taught,5,4,1,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,,Python,University/Non-profit research group 
websites,"Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,,,,Very useful,,3-5 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Bachelor's degree,A humanities discipline,,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,Australia,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Personal Projects,Textbook",Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,,,,Very useful,,,Somewhat useful,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,6 to 10 years,"Data Scientist,Researcher,Other",Self-taught,65,5,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",,Sometimes,10GB,,"C/C++,Flume,Google Cloud Compute,Julia,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,Rarely,,,Rarely,Most of the time,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"Data Visualization,Naive Bayes,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,Sometimes,,,,20,5,10,35,30,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Git,Other",Always,300000,AUD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Online courses,Personal Projects",,Very useful,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Other",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Biology,,"Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Italy,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,42,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Master's degree,No,Master's degree,Management information systems,Less than a year,"Business Analyst,Data Analyst,Machine Learning Engineer",Self-taught,90,10,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important +Female,United States,18,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Psychology,,,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important +Female,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,5,5,0,85,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Netherlands,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,PhD,No,Bachelor's degree,Physics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,22,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,IBM Watson / Waton Analytics,Association Rules,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,50,0,25,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,10 to 19 employees,,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Orange,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Rarely,,Rarely,,Most of the time,,Rarely,,,,,,,,,,Sometimes,,,,,,,"Decision Trees,Evolutionary Approaches,Natural Language Processing,Text Analytics",,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,30,30,0,5,35,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Rarely,,,,Sometimes,Rarely,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Time Series Analysis,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Personal Projects",,Very useful,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Researcher,Self-taught,80,0,0,20,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,10GB,"HMMs,Markov Logic Networks,Neural Networks,RNNs","Jupyter notebooks,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"HMMs,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,Most of the time,,,,Most of the time,,Often,,,Most of the time,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,Often,Most of the time,Most of the time,Most of the time,,,,,,,,Often,,100% of projects,More internal than external,,,,Key-value store (e.g. Redis/Riak),I don't typically share data,,Git,Sometimes,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Other,20,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,20,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Evolutionary Approaches,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Germany,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Cluster Analysis,C/C++/C#,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Jack's Import AI Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,Unnecessary,Unnecessary,Unnecessary,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,No,Master's degree,A health science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Statistician",Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,Bayesian Methods,R,Google Search,"College/University,Company internal community,Personal Projects",,,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Management information systems,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,20,0,5,75,0,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Markov Logic Networks,Neural Networks - CNNs",,Mix of fields,Fewer than 10 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Other,Don't know,<1MB,,"Amazon Web services,C/C++,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,SQL",,Rarely,,Most of the 
time,,,,,,,,,,,Sometimes,,Rarely,,,Rarely,Rarely,,,,,,Rarely,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Association Rules,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,30,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,76-99% of projects,Entirely internal,IT Department,,Collecting usable data from lab machines,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,,6,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Spark / MLlib,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,Very useful,"Data Stories Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,45,20,30,5,0,0,Computer Vision,"Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",A master's degree,Internet-based,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,Neural Networks,"Amazon Web services,Hadoop/Hive/Pig,Python,SQL",,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,0,50,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Often,,,,Sometimes,,,,,,,,Rarely,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,AWS S3,Git,Most of the time,"36,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,48,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,Business Analyst,University courses,20,5,0,60,15,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Poland,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,I did not complete any formal education past high school,,1 to 2 years,I haven't started working yet,University courses,60,20,0,20,0,0,Other (please specify; separate by semi-colon),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,27,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Microsoft Azure Machine Learning,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Very useful,,,Very useful,,,,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",Work,30,30,40,0,0,0,Computer Vision,Logistic Regression,A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Don't know,1TB,Regression/Logistic Regression,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Simulation,Text Analytics",,,,,,Sometimes,Rarely,Rarely,,,,,,,,Often,,,,,,,,,,,Most of the time,,Most of the time,,,,,40,10,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,Sometimes,,,Most of the time,,,,Often,,,,,,Often,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,40000000,KRW,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Brazil,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,Data Stories Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,Denmark,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Weka,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,,,Very useful,,Very useful,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Researcher,University courses,40,15,5,39,1,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods",I prefer not to answer,Other,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,10GB,"Decision Trees,Ensemble Methods,Random Forests","C/C++,Jupyter notebooks,NoSQL,Python,SQL",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization",Sometimes,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,Sometimes,,,,,,Sometimes,Often,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,,,,Very useful,,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,40,30,10,20,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,40,30,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,10-25% of projects,Do not know,Standalone Team,"imaginet, cifar10, SVHN, mnist, ISBI (skin cancer images)",preprocessing in ISBI dataset,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,30000,PEN,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Mathematica,Bayesian Methods,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Very useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,"Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Partially Derivative Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 
years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Statistician",University courses,0,25,0,75,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,South Korea,36,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,A humanities discipline,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Natural Language Processing,Decision Trees - Random Forests,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Pakistan,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,MARS,Other,Government website,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Not Useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic 
Networks,Support Vector Machines (SVMs)",A master's degree,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1GB,Markov Logic Networks,"C/C++,IBM Cognos,Java,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,Rarely,,,,,,Sometimes,,,,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,,Often,Often,Most of the time,,,,"A/B Testing,Decision Trees,Ensemble Methods,Markov Logic Networks,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",Sometimes,,,,,,,Sometimes,Sometimes,,,,,,,,Often,,Sometimes,Sometimes,,,Sometimes,,,,,,Often,,,,,30,10,2,30,20,8,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,Often,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,,,,100% of projects,Do not know,IT Department,,Privacy,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Nigeria,22,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Decision Trees,R,"GitHub,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,30,0,50,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Relational data",Sometimes,10MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Python,R,TensorFlow",,,,,,,,,Rarely,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks",,,Often,Most of the time,,,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,Most of the time,,,,,,,,,Often,,,,51-75% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Share Drive/SharePoint,,"Git,Other",Sometimes,87000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Regression,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",5-10 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,Necessary,,,,,0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Machine Translation","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Markov Logic Networks",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,France,32,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Deep learning,Python,"GitHub,Google Search",College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Computer Scientist,Self-taught,90,10,0,0,0,0,Computer Vision,"Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,A general-purpose job board,Very important,,GPU accelerated Workstation,"Image 
data,Video data",,10GB,"CNNs,Decision Trees,HMMs,Neural Networks,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,GANs,HMMs,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,,,,Sometimes,,,Often,,,,,,0,0,0,100,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Other,,,Other,Other,,Git,Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook,Trade book",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Engineer,Software Developer/Software Engineer",University courses,15,80,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Other",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,75,5,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,,10GB,Other,"C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,kNN and Other Clustering,PCA and Dimensionality Reduction",Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,60,10,2,15,13,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,100% of projects,Entirely internal,Central Insights Team,n/a,generation,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",git,"Bitbucket,Git",Rarely,49000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Belgium,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,"Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,Self-taught,60,25,15,0,0,0,Outlier detection (e.g. Fraud detection),Ensemble Methods,,Financial,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,Ensemble Methods,"Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,NoSQL,Perl,Python,Spark / MLlib,Unix shell / awk",,,,,,,,,Often,Sometimes,Rarely,Rarely,,,,,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Simulation,Text Analytics",,Often,,,Rarely,,Often,Sometimes,Most of the time,,,,,,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,40,15,20,15,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science 
talent in the organization,Lack of significant domain expert input,Limitations of tools",Often,,Sometimes,Often,Most of the time,,,Sometimes,Often,,Often,,Most of the time,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,Very useful,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,,"Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Very Important +Male,United States,27,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,Very useful,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,University courses,30,0,0,70,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Russia,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Self-employed",Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Jack's Import AI Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,50,10,25,5,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Often,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Often,,Sometimes,,Often,Most of the time,Most of the time,Often,Most of the time,,,Often,,Most of the time,,Often,,Sometimes,Often,,Often,,Often,Often,,Most of the time,,,Most of the time,,,,,37,23,13,12,15,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Often,,,,,,,,,,,,,Often,Often,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,18000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,New Zealand,58,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Government website,"Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook,Other",Very useful,Very useful,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Data Scientist,Other",Work,10,20,60,10,0,0,Other (please specify; separate by semi-colon),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Telecommunications,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Always,10TB,Neural Networks,"Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,Often,,,Sometimes,Often,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Rarely,,,,Most of the time,,,Rarely,Most of the time,,Most of the time,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,30,10,50,10,0,0,Enough to run the code / standard library,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,100% of projects,More internal than external,Standalone Team,Government Tourism Statistics;Census,Volume (3 billion records per day),Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Bitbucket,Sometimes,120000,NZD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,,"Blogs,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Not Useful,,Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,Other,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Not very important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,,,,Most of the time,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Sometimes,,,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,MATLAB/Octave,Anomaly Detection,R,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Researcher,Self-taught,70,30,0,0,0,0,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),High school,Academic,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Rarely,,Neural Networks,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,0,70,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,Russia,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Psychology,Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,Employed part-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Partially Derivative Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,A social science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Tableau,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Other,Work,5,5,85,5,0,0,Natural Language Processing,,A professional degree,Pharmaceutical,100 to 499 employees,,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,,,,Amazon Web services,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,25,0,0,25,50,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,26-50% of projects,More internal than external,IT Department,,,,Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"65,000",USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Python,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Very useful,,,,Very useful,Somewhat useful,Not Useful,,Very useful,Very useful,"O'Reilly Data Newsletter,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Other,University courses,30,10,0,50,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,32,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,70,0,0,30,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,,,,,,,,,,,,,,,, +Female,Japan,27,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Base,Deep learning,,,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,Self-taught,95,0,0,0,5,0,Adversarial Learning,Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,France,36,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",NoSQL,,R,,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Engineer,Software Developer/Software Engineer",Kaggle competitions,0,0,0,0,100,0,,,,,20 to 99 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,"Text data,Relational data",,,,"C/C++,Java,MATLAB/Octave,Python,SQL",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,0,50,20,30,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,,,,,,,,,Rarely,Often,,,,,,Often,,,,76-99% of projects,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Female,United Kingdom,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Microsoft Azure Machine Learning,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",Primary/elementary school,Insurance,,,,,"N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,1MB,Decision Trees,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Prescriptive Modeling",,,,,,,Often,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,60,20,10,5,5,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Nigeria,42,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Poorly,Self-employed,Python,Deep learning,R,Google Search,"Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,YouTube Videos",,,,,,,Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,Nice to have,Necessary,,,,Coursera,,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Bayesian Techniques,A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,15,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Government,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Don't know,10MB,"Decision Trees,Random Forests","C/C++,Microsoft Azure Machine Learning,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Often,,,,,,,,,Most of the time,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Rarely,40000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Other,Fine,Employed by government,SQL,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Military/Security,I don't know,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Video data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"MATLAB/Octave,Python,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Often,"Bayesian Techniques,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,Rarely,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,Sometimes,,,,5,5,5,5,40,40,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,,Often,,Most of the time,,,,Less than 10% of projects,Entirely internal,Other,none,limited data; limited tools; limited expertise,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,16,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,,Nice to have,Nice to have,,,,,,,,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Australia,26,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,Other",,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,O'Reilly Data Newsletter,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,"DataCamp,Other","Basic laptop (Macbook),Traditional Workstation",40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,15,40,25,4,6,,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Female,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,45,45,0,10,0,"Natural Language Processing,Time 
Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Financial,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10TB,Decision Trees,"Amazon Web services,Java,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,Sometimes,,,,,,,Sometimes,,,Often,Sometimes,,,Often,,,,,,Most of the time,Most of the time,,,,15,10,10,15,40,10,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,,,,Often,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,R,Random Forests,Stata,Google Search,"College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Not Useful,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Programmer,Researcher",Self-taught,20,20,20,20,10,10,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",,Academic,"1,000 to 4,999 employees",Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Often,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Most of the time,,,Often,,,,80,10,0,2,8,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Chile,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses",Very useful,Somewhat useful,Very useful,,Very useful,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Machine Learning Engineer",University courses,0,30,20,50,0,0,Computer Vision,"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Other,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,1TB,"CNNs,Ensemble Methods,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,Other",,,,Often,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,,,"CNNs,Ensemble Methods,kNN and Other Clustering,Segmentation,Other",,,,Most of the time,,,,,Often,,,,,Often,,,,,,,,,,,,Most of the time,,,,,Often,,,40,40,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,10-25% of projects,More internal than external,IT Department,Kaggle,noise,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,2000000,CLP,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Not Useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,0,5,5,80,10,0,"Computer Vision,Unsupervised Learning",Support Vector Machines (SVMs),A master's degree,Hospitality/Entertainment/Sports,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Don't know,,Neural Networks,"C/C++,MATLAB/Octave,R",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,Sometimes,,,,,,10,60,10,10,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,Other,I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Colombia,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a 
company that performs advanced analytics,TensorFlow,Deep learning,,GitHub,Arxiv,Very useful,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,25,0,45,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10MB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Recommender Systems,SVMs",Rarely,Rarely,,,Sometimes,Most of the time,Often,Most of the time,Most of the time,,,,,,,Often,,,,,,,Most of the time,Sometimes,,,,Most of the time,,,,,,60,20,10,0,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,Often,Often,,Often,Often,,,Often,,,,,,,,,,,10-25% of projects,Entirely internal,IT 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Never,4000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Tableau,Factor Analysis,R,I collect my own data (e.g. web-scraping),"Company internal community,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,,,,Very useful,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,50,0,25,25,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Unsupervised Learning",,"Some college/university study, no bachelor's degree",Government,500 to 999 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Always,1GB,Other,"C/C++,Java,Jupyter notebooks,Mathematica,Perl,Python,R,SQL,Tableau",,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"kNN and Other Clustering,PCA and Dimensionality Reduction",,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,Sometimes,,,Most of the time,,Sometimes,,,,Sometimes,Often,,,Most of the time,Most of the time,,,100% of projects,More internal than external,Other,"NCEP, NOAA, NASA",lack of time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,180000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Mexico,21,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,R,Neural Nets,Other,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Engineer,Programmer,Researcher",Self-taught,70,10,10,10,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Academic,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,100MB,"CNNs,Neural Networks",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,,,,,,Often,,,,,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,None,Data curation,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,30000,MXN,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Russia,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,25,25,0,50,0,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Colombia,24,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Social Network Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses",,,,,Very useful,,,,,,Very useful,,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Very useful,,,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",Self-taught,78,2,15,5,0,0,"Recommendation Engines,Reinforcement learning,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Most of the time,10MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,Often,,,,"Association Rules,Collaborative Filtering,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems,Time Series Analysis",,Sometimes,,,Sometimes,,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,55,20,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,Often,,,Sometimes,,,,,,,,,,,Often,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Always,115000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Engineer,,Employed by company that makes advanced analytic software,Jupyter notebooks,,Python,GitHub,"Blogs,Kaggle",,Somewhat useful,,,,,Very useful,,,,,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,50,30,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Other",Never,1MB,"Bayesian Techniques,CNNs,Neural Networks","C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes",,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,20,30,0,50,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Bitbucket,Git",Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Podcasts",,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,"DataTau News Aggregator,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Business Analyst,Computer Scientist,Data Scientist,Engineer,Statistician",University courses,15,0,5,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,Microsoft SQL Server Data Mining,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,Most of the time,Sometimes,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis,Other,Other",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,Most of the time,25,20,20,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,Client 
Data,Absence of certain features which couls potentially provide very helpful insights,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,135000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,Unnecessary,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,A health science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",35,60,5,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Female,Australia,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by non-profit or NGO,Tableau,Deep learning,SAS,GitHub,"Blogs,College/University,Conferences,Kaggle,Podcasts,Stack Overflow Q&A",,Very useful,Very useful,,Very useful,,Very useful,,,,,,Very useful,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,Researcher,University courses,20,0,NA,50,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Non-profit,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Other,Traditional Workstation,Other,Never,,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,90,0,0,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,Na,It's so bad in this workplace ,Other,Share Drive/SharePoint,,Other,Rarely,75000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Canada,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,36,Employed full-time,,,Yes,,Statistician,Perfectly,"Employed by government,Self-employed",,,,,"College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube 
Videos,Other",,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician,Other",University courses,20,0,15,50,15,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,Minitab,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau,Unix shell / awk",,,,Sometimes,Sometimes,,,Sometimes,Sometimes,,Often,Often,,,Sometimes,,,,,,,,Often,,,Often,Sometimes,Sometimes,,,Often,,Often,,,,Sometimes,Sometimes,Often,Often,,Often,,,Often,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series 
Analysis",Sometimes,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Often,,,Often,Often,,Often,Sometimes,,,Sometimes,Often,,Often,Often,,,,25,25,5,25,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning",Often,Sometimes,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Scientist",Self-taught,50,30,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"1,000 to 4,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Often,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,Data Integrity ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,92000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Canada,27,"Not employed, but looking for work",,,,,,,,SQL,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,A social science,1 to 2 years,"Researcher,Other",Self-taught,30,50,0,20,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,,Unnecessary,Necessary,,Nice to have,,,,,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job 
board,0,,,,,,,,,,,,,,,, +Male,Mexico,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Poorly,Self-employed,IBM Watson / Waton Analytics,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,,,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,Often,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Data Visualization,Prescriptive Modeling,Segmentation,Simulation",,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,,,,,,Often,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,Econometrics,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"40,000",USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,New Zealand,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,R,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Work,0,10,90,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,10 to 19 employees,Stayed the same,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Always,10GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","Microsoft Azure Machine Learning,NoSQL,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,Most of the 
time,,,,,,,,,Sometimes,,Sometimes,Most of the time,Often,,,,,,,,,Most of the time,,,,,30,10,10,25,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,100% of projects,More internal than external,IT Department,n/a,Cleaning data into proper form,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Other,Sometimes,135000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Mexico,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Podcasts,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Partially Derivative Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Kaggle competitions,50,0,0,50,0,0,Survival Analysis,Neural Networks - GANs,I don't know/not sure,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United 
States,40,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,0,0,10,40,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,FlowingData Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,Random Forests,"Cloudera,Hadoop/Hive/Pig,Java,Spark / MLlib",,,,,Most of the time,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,"Decision Trees,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,Most of the time,Most of the time,,,,10,5,5,60,20,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,Python,,"Blogs,College/University,Kaggle,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,10,0,10,60,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",Sometimes,,,,,Often,Most of the time,Often,Sometimes,,,,,,,,,,Sometimes,Sometimes,Often,,Often,,,,,Often,,,,,,20,20,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,50000,USD,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Google Search,"Arxiv,Company internal community,Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Java,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series 
Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Sometimes,,Often,,Most of the time,Often,,,,,,Most of the time,,,Often,,,,0,50,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,100% of projects,More internal than external,Other,,Low signal to noise,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,470000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Egypt,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Brazil,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses",Not Useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,100MB,Decision Trees,"Jupyter notebooks,NoSQL,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,20,10,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,None,Process then,Graph (e.g. 
GraphBase/Neo4j),"Email,Share Drive/SharePoint",,Git,Sometimes,96000,BRL,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Malaysia,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Textbook",,Very useful,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important +,,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Oracle Data Mining/ Oracle R Enterprise,Regression,,,Other,,,,,,,,,,,,,,,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Scientist,Researcher,Statistician",Kaggle competitions,80,10,0,0,10,0,Supervised 
Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,10MB,"Gradient Boosted Machines,Random Forests",Google Cloud Compute,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Not Useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Statistician,Self-taught,50,40,0,0,0,10,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Retail,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,Rarely,Sometimes,,Often,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Often,,Often,Often,Rarely,Most of the time,,Most of the time,Most of the time,,,,,Often,Most of the time,,,,40,30,20,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Unavailability of/difficult access to data",Rarely,Rarely,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone 
Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,12000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,Business Analyst,University courses,15,25,35,15,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Insurance,I don't know,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","R,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of 
the time,Most of the time,,,,,,Often,,,Often,Often,Rarely,Sometimes,,,Sometimes,,,Often,,,Sometimes,Most of the time,,,,35,25,5,25,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Sometimes,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,Most of the time,,76-99% of projects,Approximately half internal and half external,Business Department,vendor data; census data,"database has poor history records, have to reconcile with vendor data","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Italy,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (commercial version),Anomaly Detection,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,5,10,30,50,5,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",High school,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Other,Most of the time,,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,Sometimes,Often,,,,,,Sometimes,,,,Most of the time,,Often,,,,,Sometimes,Sometimes,,,Often,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,Often,,,Often,Often,,Often,Often,,,Often,Often,,Often,Often,Often,,,Often,Often,,,,Often,Often,,Often,Often,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,Sometimes,,,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,76-99% of projects,More external than internal,IT Department,Different maps; demographic data; social networks data,Integration and normalization,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Sometimes,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,99,0,0,0,1,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,Pharmaceutical,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Evolutionary Approaches,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R,SAS Base,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Often,,,Rarely,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",,,,,,,Most of the time,,,,,,,,,Often,,,,,Often,,,,,Sometimes,Rarely,Sometimes,,,,,,40,20,10,15,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations of tools,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,,,,,Sometimes,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,openstreetmaps;sirene;insee,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,60000,EUR,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,NA,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Mathematica,Deep learning,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,,,,,,Very useful,,,Very useful,,,,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Other,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Female,Australia,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,24,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,Amazon Web services,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,40,0,10,40,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Not Useful,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer 
Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",9,90,0,0,1,0,"Computer Vision,Natural Language Processing,Recommendation Engines",Neural Networks - CNNs,A master's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,1GB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Collaborative Filtering,Neural Networks,Segmentation,Text Analytics",,,,Most of the time,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,Most of the time,,,,,60,10,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,,,Often,,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",Sometimes,173000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,edX,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Other,University courses,55,5,0,40,0,0,"Computer Vision,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT 
supported servers,Laptop or Workstation and private datacenters",Video data,Sometimes,10GB,"CNNs,SVMs,Other","C/C++,MATLAB/Octave,Python,TensorFlow,Other",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,"CNNs,Cross-Validation,PCA and Dimensionality Reduction,Segmentation,SVMs,Other,Other",,,,Often,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,Rarely,,,Rarely,Most of the time,,30,25,5,10,5,25,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,,,76-99% of projects,Entirely internal,Standalone Team,Too many to list; FaceWarehouse; IBUG; LFPW; AFLW; MultiPIE; Many more.,"Ground truth by people who labelled it inaccurately, or didn't really think through what they really did, or didn't describe fairly and adequately what they did and its limitations.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Network drives,Git,Sometimes,31000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Australia,18,Employed part-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,DataCamp,,2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,GitHub,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series",Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Very Important +Female,United States,26,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Stack Overflow Q&A,Tutoring/mentoring",,,,,,Very useful,,,,,,,,Very useful,,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Data Scientist,Researcher",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Ukraine,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",10,50,20,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Text data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests,Recommender Systems,Simulation",,,,,,,Often,Rarely,,,,,,,,,,,,,,,Rarely,Often,,,Sometimes,,,,,,,10,50,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Always,17200,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Ireland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,30,10,30,20,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,Rarely,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Often,Often,,,,,,,,,,,,Most of the time,,Sometimes,,Rarely,,,,,Sometimes,Most of the time,,,,,5,70,15,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,51-75% of projects,Do not know,Other,O-net,difficult to match to client's company,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,60000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Portugal,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Mathematica,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,,,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Miner,Researcher,Software Developer/Software Engineer",University courses,20,30,0,50,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression,SVMs","Java,NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,Rarely,,,,,,Often,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,Often,,,,,,,,,,,Often,,Often,,Often,,,Often,,,,,Often,,,Often,Often,,,,30,50,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",,,,,Often,,,,Often,,,,,,Often,,Often,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,12000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Australia,NA,"Not employed, but looking for work",,,,,,,,Mathematica,Neural Nets,Python,Google Search,"Blogs,College/University,Textbook",,Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,80,0,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former 
colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Female,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,10,0,10,0,60,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,United 
States,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Other,R,"Government website,Other","Other,Other,Other",,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician,Other",Work,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation,Other","Text data,Relational data,Other",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Other","C/C++,Java,Minitab,NoSQL,Python,R,SAS Base,Other",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,Sometimes,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,,,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other,Other,Other",Sometimes,Sometimes,Sometimes,,,Sometimes,Often,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,Often,Sometimes,,,,Often,,,,Often,Often,,Sometimes,Most of the time,Most of the time,Often,Often,30,10,0,10,30,20,"Enough to code it again from scratch, albeit it may run 
slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,I prefer not to say,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Most of the time,Most of the time,,,Most of the time,Often,Most of the time,Most of the time,Often,,Often,,,Often,,Often,Most of the time,,,Most of the time,Most of the time,Most of the time,76-99% of projects,More internal than external,IT Department,Client data.,"The management team that truly cannot understand basic concepts. When the Data Director literally does not know the difference between RAM and ROM, but has the job because he golfs with the CEO.... Internal politics and Bullshite over Substance kill most projects.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Other",USB external disk,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Most of the time,"150,000",USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"College/University,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Researcher,University courses,0,10,10,80,0,0,,,A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Not Useful,,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Other",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Java,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,,University courses,30,10,10,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,I don't know,Increased significantly,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Don't know,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,Often,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,54,"Not employed, but looking for work",,,,,,,,Weka,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,KDnuggets Blog",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,I never declared a major,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,30,0,0,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Brazil,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Logistic Regression",Primary/elementary school,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Never,10GB,Bayesian Techniques,"Flume,Hadoop/Hive/Pig,Java,NoSQL,Perl,Python,Unix shell / awk",,,,,,,Sometimes,,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Naive Bayes,Recommender Systems,Text Analytics,Time Series Analysis",Often,Often,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,Often,Sometimes,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,Often,Often,,,,Most of the time,Often,Often,Most of the time,Sometimes,,Often,,Often,Often,Often,Often,Often,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,"120,000",BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Singapore,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Time Series Analysis,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Very useful,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Researcher,University courses,30,10,40,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,10 to 19 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,Often,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,,Often,Sometimes,Often,Most of the time,Often,Sometimes,Sometimes,,,,,Often,,Often,,Often,Often,Often,,,Often,Often,Often,,,Sometimes,Most of the time,,,,,35,45,5,10,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,Often,Often,Sometimes,,,Sometimes,,,,,Most of the time,,,Often,Often,,Often,Often,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,60000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Other",,,,,Very useful,,,,,,Very useful,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,Statistician,University courses,50,10,30,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,More than 10 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Often,,,,,,,Most of the time,,Often,,Often,Most of the time,,Often,,,Most of the time,Sometimes,,,,,,,60,20,10,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database",,,,,Most 
of the time,,,,Often,Often,Often,,Most of the time,,,,,Sometimes,,,,,Less than 10% of projects,Entirely internal,Other,"3rd parties; clients, etc.",PII; coverage; precision,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,31,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,5-10 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,40,10,10,10,30,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,42,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,Programmer,Fine,Self-employed,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Conferences,Friends network,Online courses,YouTube Videos",,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,A humanities discipline,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Colombia,26,Employed full-time,,,No,Yes,Computer Scientist,Fine,,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,,,,"Data Stories Podcast,FlowingData Blog,Linear Digressions Podcast",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Other,100,0,0,0,0,0,Unsupervised Learning,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Psychology,More than 10 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician,Other",University courses,40,4,5,50,1,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Python,R,SAS Base,SQL",,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,,Often,,Sometimes,Rarely,,,,,Most of the time,,Often,,,,,Sometimes,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Often,,Sometimes,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,,,Most of the time,Often,,,Most of the time,,,,Often,,,,60,5,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Often,,Most of the time,,,,,,,,,,Most of the time,,76-99% of projects,More internal than external,Business Department,CMS Data; Medicare Data; State Department of Health Data; ,Access,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,125000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,45,0,0,5,0,Time Series,Neural Networks - RNNs,Primary/elementary school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Brazil,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Business Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Data Visualization,Neural Networks,Segmentation,Time Series Analysis",,Rarely,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,Often,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Sometimes,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Other,Government data,infrastructure,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,277000,BRL,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,TensorFlow",Most of the time,Most of the time,,,,,,,Rarely,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,Often,Most of the time,,,Sometimes,Often,,Often,,,,Often,,,,40,30,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of 
significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,Often,,,Rarely,,Often,,,,Most of the time,Most of the time,,100% of projects,Entirely external,Standalone Team,census records ,"poor data management by clients which leads to long, under-budgeted data cleaning. ","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,60000,USD,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,10MB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,MATLAB/Octave,NoSQL,SQL,TensorFlow",,Often,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,"Logistic Regression,Neural Networks,SVMs,Text Analytics",,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,Often,Often,,,,,40,30,0,20,10,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,140000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Other,"Arxiv,Blogs,Conferences,Friends network,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,80,10,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,High school,Technology,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,,,"Random Forests,Other","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Rarely,,,Rarely,Rarely,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,Rarely,Often,,,,Rarely,,Often,,,,"Collaborative Filtering,Random Forests,Recommender Systems",,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,20,10,60,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Other",,,,,Often,,,,Sometimes,,,,Often,,Often,,Sometimes,Sometimes,,Sometimes,,Often,10-25% of projects,Entirely internal,IT Department,,"Lack of committed big data infrastructure (Spark, HDFS, etc.)","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,178000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Other,Time Series Analysis,R,"Government website,I collect my own data (e.g. web-scraping)",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Other",University courses,0,0,25,75,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,10MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Simulation",,,,,,,Often,,,,,,,Rarely,,Often,,,,,,,,,,,Sometimes,,,,,,,0,10,25,15,20,30,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Limitations of tools",,Sometimes,,,Often,,,,,,,,Often,,,,,,,,,,76-99% of projects,Entirely internal,Other,United States Census,"Collection methods changed, so old data is hard to compare to new data","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,,Neural Nets,R,,"Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Programmer",Work,40,0,60,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,10MB,Regression/Logistic Regression,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,"Logistic Regression,Segmentation",,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,,,,50,20,10,0,20,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Sometimes,80000,,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)",Decision Trees - Random Forests,A bachelor's degree,Government,I prefer not to answer,Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests",,,Sometimes,,,,,Often,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,50,50,0,0,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Other",,,,,,,,,Often,,,,,,,,,,,,,Often,Less than 10% of projects,Do not know,IT Department,,,,,,Subversion,Rarely,"30,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Textbook",,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,,Self-taught,75,5,10,2.5,7.5,0,"Computer Vision,Machine Translation,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow",Rarely,,,,Rarely,,,,Sometimes,,,,,,,,Often,,,,,,Rarely,,,,,,,,Most of the time,,Most of the time,,,,,Often,,,Sometimes,Most of the time,,,,Rarely,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,Often,,,,,Most of the time,Often,Often,,,Often,,Rarely,Often,Often,,,,Often,Sometimes,,Often,,,,,Often,,,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",Sometimes,,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,76-99% of projects,More internal than external,Business Department,Truven; CMS Medicare/Medicaid Data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,100000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed part-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,50,30,20,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)",Logistic Regression,A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Always,100MB,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"kNN and Other Clustering,Logistic Regression",,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,10,50,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,Most of the time,,100% of projects,More internal than external,IT Department,None,Having access to it,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,S3,Git,Rarely,24000,BRL,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,,,,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,55,0,0,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Master's degree,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Other,Work,20,0,50,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Other,Random Forests,Python,Google 
Search,"Company internal community,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A",,,,Very useful,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Computer Vision,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,QlikView,SQL,Other",,Sometimes,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,,Often,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Often,,Sometimes,Often,Most of the time,,,,10,10,10,10,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,,Sometimes,,,,,Sometimes,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,150000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Argentina,25,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",40,20,0,30,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other",,Sometimes,,,,,,Often,,,,,Sometimes,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random 
Forests,RNNs,SVMs",Often,,,Most of the time,,Most of the time,Most of the time,Sometimes,Often,,,,Rarely,,,Rarely,,Rarely,Sometimes,Most of the time,,,Most of the time,,Often,,,Sometimes,,,,,,40,30,10,15,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Most of the time,Sometimes,,,,,,,Often,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,,Cleaning data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,24000,ARS,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,27,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,R,GitHub,"Blogs,Kaggle,Online courses",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,A humanities discipline,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Pakistan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Survival Analysis,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Blogs,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,55,20,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Often,,Rarely,,,,,,,,Sometimes,Often,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,Often,,,,,,,Often,,Often,Often,Most of the 
time,Sometimes,,Sometimes,Sometimes,,,,Sometimes,Most of the time,Most of the time,,,,45,25,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Often,,,Sometimes,,,,,,Often,Sometimes,,,,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",,85000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,33,"Not employed, but looking for work",,,,,,,,Unix shell / awk,Text Mining,R,I collect my own data (e.g. 
web-scraping),"College/University,Online courses,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Udacity,Traditional Workstation,0 - 1 hour,PhD,No,Master's degree,Other,3 to 5 years,Other,University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,United States,NA,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Blogs,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,30,10,0,10,0,,,High school,Manufacturing,500 to 999 
employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,Other,"Amazon Web services,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Recommender Systems",,Rarely,,,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,20,20,10,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,,,,Often,Often,,,,Often,,Often,,,,,,Often,,26-50% of projects,More internal than external,IT Department,,Figuring out what the useful data is and making it useful,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Rarely,107000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Hong Kong,36,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",University courses,30,10,0,50,10,0,"Computer Vision,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",,Sometimes,1GB,Other,"C/C++,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"Cross-Validation,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,0,50,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,76-99% of projects,Do not know,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,600000,HKD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,19,Employed full-time,,,No,Yes,,,,Python,Regression,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,Very useful,,,,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,15,0,0,80,0,5,,,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",NoSQL,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"FlowingData Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher",Kaggle competitions,10,25,25,0,40,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,500 to 999 employees,Increased significantly,3-5 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk,Other",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Often,Sometimes,,,"Association Rules,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,,,,,Most of the time,,Sometimes,,Sometimes,,,Often,,Sometimes,Sometimes,Often,,,Sometimes,,,Often,Often,,,,25,35,15,15,10,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,Sometimes,,,Sometimes,,,,,Often,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,Human error in data entry,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,90000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,15,20,25,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Don't know,,"CNNs,HMMs,Neural Networks","C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Data Visualization,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,Sometimes,,,Often,,,,,,Often,Sometimes,,,,,,Sometimes,Often,,,,,Rarely,Sometimes,,,Sometimes,,,,10,35,25,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,51-75% of projects,Entirely internal,Business Department,,,,I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Other,24,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,C/C++,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Not Useful,,,Not Useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Fine arts or performing arts,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),Primary/elementary school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,52,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Podcasts,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",8,30,0,60,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient 
Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Often,,,,,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",Sometimes,,,,,Often,Most of the time,Often,Often,,,Often,,Often,,Often,,,Often,,Often,,Often,,,,,Often,Sometimes,,,,,35,20,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,,,,,Often,,,,,,Sometimes,,,,,Often,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Sometimes,40000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Perl,Anomaly Detection,R,"GitHub,University/Non-profit research group websites","College/University,Kaggle,Textbook",,,Somewhat useful,,,,Very useful,,,,,,,,Very useful,,,,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Statistician",University courses,20,10,0,70,0,0,"Machine Translation,Reinforcement learning,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Simulation,SVMs,Time Series Analysis",,,Often,,,Sometimes,Often,Sometimes,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,Sometimes,Sometimes,,Often,,,,20,20,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science 
team",,,,,,Sometimes,,,Sometimes,,,Sometimes,,,Often,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,75000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,23,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,10 to 19 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,,"IBM Watson / Waton Analytics,Python,QlikView,SQL,Tableau,TensorFlow",,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,40,15,15,15,15,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,Sometimes,Often,,,,,,,Often,Often,,Sometimes,,51-75% of projects,More internal than external,Other,,,,,,,,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,R,Anomaly Detection,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,80,10,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,1TB,"CNNs,Neural Networks,SVMs","C/C++,MATLAB/Octave",,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Often,,,Often,,,,,,,,,Sometimes,,,,Most of the time,Often,,,,,,,Often,,,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,,Often,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,Central Insights Team,,,,Email,,,Rarely,67500,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Australia,22,"Not employed, but looking for work",,,,,,,,R,Monte Carlo Methods,R,GitHub,"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,"FlowingData Blog,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 
years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,30,0,0,70,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,South Korea,34,Employed part-time,,,Yes,,Scientist/Researcher,,Employed by college or university,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,,,Very useful,Very useful,,Very useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",Work,50,0,30,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,1GB,"Bayesian Techniques,Evolutionary Approaches,Random Forests","Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,R",,,,,,,,,Rarely,,,,,,Often,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,Sometimes,Sometimes,,,,Often,,,Often,,,,Often,,,,Often,Often,,Often,,Sometimes,,,,,,Often,,,,,40,20,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues",,,,Sometimes,Often,Sometimes,,,,Most of the time,,,,,,,Often,,,,,,10-25% of projects,More external than internal,Standalone Team,,preprocessing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,3000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Brazil,24,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Deep learning,C/C++/C#,"Google Search,I collect my own data (e.g. 
web-scraping)",Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Other,2 - 10 hours,Other,Yes,,Other,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,51,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,SQL,Google Search,"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,60,0,0,40,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Very 
Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important +Male,Canada,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Textbook",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,33,33,34,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,Tableau",,,,,Often,,,,Often,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series 
Analysis",,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,20,15,15,20,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,,,,,Often,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,75000,CAD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Stan,Deep learning,R,"GitHub,University/Non-profit research group websites","Arxiv,Blogs,Personal Projects,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,,Very useful,,Very useful,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Programmer,Researcher,Statistician",Self-taught,90,0,10,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,100GB,"Bayesian Techniques,Regression/Logistic 
Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression",,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,0,70,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,Do not know,Business Department,,,,I don't typically share data,,,Rarely,1200000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,Other,University courses,20,0,30,50,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Always,1GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,Often,Sometimes,,,,Most of the time,,Sometimes,,,Most of the time,Often,Most of the time,,Often,Sometimes,,,Most of the time,,Most of the time,Most of the time,,,,10,30,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,Often,,,,Most of the time,,,Sometimes,,Sometimes,,,,,Most of the 
time,Sometimes,,,,Often,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other","Git LFS, Jupyterhub",Git,Always,"120,000",,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Business Analyst,Self-taught,50,20,10,10,10,0,,,A master's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,"Amazon Web services,Python,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Logistic Regression,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,60,20,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,Sometimes,Sometimes,Most of the 
time,,Often,,Often,Often,,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,SMMT,Integration,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,42000,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,52,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,I collect my own data (e.g. web-scraping),"Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,,,,,,5-10 years,,,Necessary,,Necessary,Necessary,Necessary,,Necessary,,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,,Sort of (Explain more),Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,30,10,10,5,5,Outlier detection (e.g. 
Fraud detection),,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,Very Important,,,Very Important,Very Important,,,Very Important,Very Important,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Other,"Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,,University courses,30,50,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - RNNs",A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",Sometimes,,Most of the time,,Often,Often,,,Often,,,,,,,,Often,,,,,,76-99% of projects,Entirely internal,Business Department,"Economic Data Statistics, Bureau of Labor Statistics",It's not relevant to predictions,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,,Rarely,58000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Argentina,28,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,,,,Other,Other,2 - 10 hours,Other,Yes,Bachelor's degree,Other,Less than a year,Other,Self-taught,15,20,0,50,15,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Personal Projects,Stack Overflow Q&A",,,Not Useful,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,,University courses,25,0,15,60,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,Don't know,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Never,100MB,Bayesian Techniques,"Amazon Web services,Python,R,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Evolutionary Approaches",,,,,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,55,0,0,25,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Git,Never,100000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,Very useful,,,Very useful,Very useful,,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Sometimes,,,Most of the time,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems",Sometimes,Sometimes,,,,,Most of the time,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,,Often,,,,Sometimes,Sometimes,,,,,,,,,,60,10,10,10,10,0,Enough to explain the 
algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Often,,,,,,Often,,,,,Sometimes,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,25000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Other",,Very useful,,,Very useful,Very useful,,,,,,Very useful,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Statistician",University courses,30,5,40,25,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Often,Most of the time,Often,Most of the time,,,,,Often,,Most of the time,,,,Sometimes,Often,,Often,,,,,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,Often,Sometimes,,,Often,Often,,,Often,Sometimes,Most 
of the time,Often,,,,,Often,,10-25% of projects,Entirely internal,Business Department,,"Working with IT department to incorporate data science work in real-time, streaming environment where the data is flowing.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Rarely,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,21,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,40,10,0,40,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Female,India,19,"Not 
employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Republic of China,25,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,Very useful,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,"FastML Blog,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Self-taught,80,10,NA,10,NA,NA,"Adversarial Learning,Computer Vision,Natural Language Processing","Bayesian Techniques,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Java,Other,Matlab,Other,"Conferences,Personal Projects,Textbook",,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,More than 10 years,Statistician,University courses,95,0,0,5,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Text data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,Other","Java,MATLAB/Octave,SQL,Other,Other",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,Often,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Often,,,,,Sometimes,,Often,Rarely,Sometimes,,,,Sometimes,Often,Often,,Rarely,,Sometimes,Sometimes,,,,,Sometimes,,,,Rarely,,,,75,15,3,2,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Sometimes,,,,Most of the time,,,,Most of the time,Often,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,Poor/noisy data architecture,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,R,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Management information systems,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Brazil,27,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,Nice to have,Necessary,,,"Coursera,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United Kingdom,32,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,,,,Siraj Raval YouTube Channel,3-5 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,,"Coursera,edX,Other",Other,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,20,50,5,25,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",16-20,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +Male,Germany,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,Computer Scientist,University courses,60,10,20,0,0,10,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,1GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),SQL,Tableau,Other,Other,Other",,Sometimes,,,,,,,,,,,Often,,Rarely,,Often,,,,,Sometimes,,,,,,,,,Sometimes,,Often,,Sometimes,,,,,,,Often,,,Sometimes,,,,Sometimes,Sometimes,Sometimes,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,,Sometimes,Often,Often,,Sometimes,,,,,,,,,,Often,Sometimes,Sometimes,,,,,,,,Often,,,,,50,15,0,25,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,Sometimes,,Often,,Often,,Sometimes,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,Self-taught,50,25,0,0,25,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Technology,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Other,GPU accelerated Workstation,Image data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,20,50,0,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,60000,USD,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,"Data Analyst,Programmer,Researcher",Self-taught,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Other,,1GB,Other,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,70,0,10,10,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,100% of projects,Entirely external,Central Insights Team,,,Other,Company Developed Platform,,Subversion,Sometimes,180000,CNY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Other,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,10,0,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Australia,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst",Self-taught,0,100,0,0,0,0,,,A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,10,5,5,70,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,DataRobot,Deep learning,Python,Google Search,"Friends network,Tutoring/mentoring",,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Work,50,0,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","DataRobot,Jupyter notebooks,Perl,Python,TensorFlow,Unix shell / awk",,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,Most of the time,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,Often,Often,Most of the time,,Often,Most of the time,Often,Most of the time,Most of the time,Most of the time,,,,40,10,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,Prefer not to answer,Prefer not to answer,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Always,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by non-profit or NGO,SAS Base,Text Mining,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites,Other","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,,Nice to have,Necessary,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Biology,1 to 2 years,"Data Analyst,Other",University courses,0,75,0,25,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,3 to 5 years,Researcher,University courses,30,10,10,50,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Rarely,100MB,"Decision Trees,HMMs,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,RapidMiner (free version),SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Sometimes,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Natural Language Processing,Simulation,SVMs,Text Analytics",Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,Sometimes,Often,Most of the time,,,,,90,0,0,2,8,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Often,,,,,,,,,,,Often,,Sometimes,,,,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,,"75,000",AMD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Regression,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,10,5,0,5,0,"Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Java,Spark / MLlib",Sometimes,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Naive Bayes,Neural Networks",Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,10,10,5,20,20,35,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,Sometimes,,,,,,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,69,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Somewhat useful,,3-5 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,0,5,5,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,Portugal,27,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,I collect my own data (e.g. 
web-scraping),"College/University,Newsletters,Official documentation,Online courses",,,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,25,25,25,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,NoSQL,R,RapidMiner (free version),SQL",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,Often,Sometimes,,,Often,Most of the time,Most of the time,,,,,,Often,,Most of the time,,Sometimes,,Most of the time,,,Most of the time,Often,,,,Sometimes,Sometimes,Sometimes,,,,10,40,20,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others",Most of the time,,,,,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,IT Department,,,"Column-oriented 
relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Never,17000,EUR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,United States,48,"Not employed, but looking for work",,,,,,,,R,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,15,50,25,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,,,,,,,,,,,,,,,, +Female,Canada,39,"Not employed, but looking for work",,,,,,,,Python,Other,R,Other,"Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX","Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Other,Sort of (Explain more),Doctoral 
degree,Biology,6 to 10 years,Researcher,Self-taught,60,15,25,0,0,0,"Reinforcement learning,Time Series",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,25,30,10,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data,Other",Rarely,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,Tableau,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,Sometimes,Sometimes,,Sometimes,,Often,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,70,20,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Sometimes,,Sometimes,Sometimes,Often,Often,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,Often,,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,154000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler",Self-taught,25,25,25,0,25,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private 
datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Oracle Data Mining/ Oracle R Enterprise,R,Tableau",,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,"Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,Sometimes,,,Sometimes,Most of the time,,,Most of the time,,Often,,Often,,,Sometimes,Rarely,,,Often,Sometimes,,Often,,Most of the time,Sometimes,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,Most of the time,,,,Most of the time,,,,Often,Most of the time,,,,,,,,,51-75% of projects,More internal than external,Other,census; weather,Dirty Data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Git,Git,Rarely,125000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Trade book,Tutoring/mentoring",,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,Other,Self-taught,55,0,15,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A professional degree,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Not at all important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Never,,,"Hadoop/Hive/Pig,NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,75,25,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data",,Often,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,dirty!!!,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,70000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,Spain,29,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"Data Stories Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Other",University courses,10,20,0,50,20,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,20,0,0,80,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,,Very Important,Very Important,Very Important +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,R,Cluster Analysis,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,"Data Analyst,Researcher",Work,50,25,10,0,0,15,Survival Analysis,Logistic Regression,A doctoral degree,Academic,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1MB,Regression/Logistic Regression,"IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,NoSQL,Oracle Data Mining/ Oracle R Enterprise,R,SQL,Tableau",,,,,,,,,,Rarely,Rarely,Often,Rarely,,,,,,,,,,,,,,Rarely,Rarely,,,,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Logistic Regression",Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,80,5,2,5,8,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,Often,,,Sometimes,,Often,Most of the time,Most of the time,,,,Most of the time,Most of the time,,76-99% of projects,More internal than external,Standalone 
Team,Educational,Inconsistency and a lack of data governance ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"48,000",,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Canada,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon 
Machine Learning,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Very useful,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,Other,Self-taught,60,5,25,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10TB,"Gradient Boosted Machines,Neural Networks,Random Forests","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,,,,,Sometimes,,Most of the time,,Often,,Sometimes,,Most of the time,Most of the time,,Most of the time,,,,,Sometimes,Most of the time,Most of the time,,,,70,15,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the 
organization,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,,,,Often,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,78000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,Very useful,Very useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Self-employed",Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Scientist,Researcher",University courses,60,5,10,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Text Analytics",,,Sometimes,,,,Often,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,,,,,40,15,5,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,,,,Often,Sometimes,,Often,,,,,,,,,,,,,Often,,51-75% of projects,Entirely internal,Other,Census data,Cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,115000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Personal Projects,Podcasts",,Very useful,,Very useful,,,,,,,,Very useful,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Business Analyst,University courses,50,0,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series 
Analysis",,Rarely,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,,,Often,,,,,Often,Sometimes,Sometimes,,,Often,Sometimes,,Sometimes,Often,,,,70,15,2,3,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,Sometimes,Most of the time,,,,,Often,Often,,,Sometimes,,Often,,,51-75% of projects,Entirely internal,Business Department,Census;weather,Lack of tools to build models on entire database,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,71000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Researcher,Other",University courses,50,40,0,10,0,0,,,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Podcasts",,,Very useful,,Not Useful,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,50,0,0,40,10,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,28,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",35,35,10,0,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,Random Forests,"Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Random Forests,Recommender Systems",,Sometimes,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the 
time,Often,,,,,,,,,,40,30,10,10,10,0,Enough to tune the parameters properly,"Explaining data science to others,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,Often,,,,,,Most of the time,,,,,,Often,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,300000,CNY,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Not Useful,,,Very useful,,"Emergent/Future Newsletter (Algorithmia),FlowingData Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,I never declared a major,6 to 10 years,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore 
applying machine learning to new areas,GPU accelerated Workstation,"Video data,Other",Sometimes,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Random Forests,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,SQL,Tableau,TensorFlow",,,,,Sometimes,,Rarely,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,Often,,,,,,"CNNs,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Often,,,Often,,Sometimes,,,Sometimes,,Rarely,,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Often,,Most of the time,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,,,,Sometimes,Most of the time,Often,,26-50% of projects,Entirely internal,Other,None,Data pipeline and schema keeps getting broken or altered over time by developers.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Female,New Zealand,33,"Not employed, but looking for work",,,,,,,,R,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,"Partially Derivative Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Other",Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Colombia,28,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,Tableau,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,0,0,20,80,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A professional degree,Telecommunications,"10,000 or more employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1MB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Simulation",,,,,,Sometimes,Often,,,,,,,,,Often,,,,,,,Sometimes,,,,Most of the time,,,,,,,10,15,10,15,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Most of the time,,Often,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,None,None,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,20000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Amazon Machine Learning,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,1GB,"CNNs,Neural Networks","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"CNNs,Neural Networks",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,45,20,10,20,5,0,Enough to tune the parameters properly,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely external,IT Department,,Difficult to obtain,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Most of the time,24000,BRL,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,11,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,DBA/Database Engineer",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,"10,000 or more employees",Stayed the same,Don't know,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,Other,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,26-50% of projects,More external than internal,IT Department,None,"Delta is constantly ignored, data is often incomplete. ",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Most of the time,86000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by college or university",TensorFlow,Other,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,,,,,,,,,Somewhat useful,,Not Useful,,Not Useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,10,0,80,10,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,10 to 19 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Always,100GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,MATLAB/Octave,Python,Other",,Rarely,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation,Simulation,SVMs",,,,Most of the time,,Often,Often,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,Rarely,Rarely,Sometimes,,,,,,30,30,30,5,5,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Sometimes,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,casia;ms-celeb;ibug;megaface;scrub;vgg;lfw;other,too few,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Share Drive/SharePoint,Other",git-lfs,Git,Always,65000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Python,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Simulation",,,,,,,Most of the time,Rarely,Rarely,,,,,,,Often,,,,Sometimes,,,Rarely,,,,Most of the time,,,,,,,10,20,5,25,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,100% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Always,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Sometimes,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Often,,,Often,Often,,,Most of the time,,Often,Most of the time,Most of the time,,Most of the time,Most of the 
time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,Sometimes,,Often,,Most of the time,,,,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,"Data from government agancies. Public tax authorities, Central Registry of Firms, etc. ",No db diagram (mapping) ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Other",Github,Git,Rarely,24,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Yes,Master's degree,Biology,,"Data Analyst,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Somewhat important +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Australia,39,Employed full-time,,,No,Yes,Programmer,Fine,Employed by government,Amazon Web services,Time Series Analysis,R,"Google Search,Government website","Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's 
degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,R,Survival Analysis,SAS,Google Search,"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Not Useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,7,0,7,86,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,31,Employed part-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,20,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Other",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,,,,,Rarely,,,Rarely,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Sometimes,Often,,,Sometimes,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Often,,,Sometimes,,Often,Sometimes,Sometimes,,,,55,10,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by 
business decision makers,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,Sometimes,Often,,,,,,,,,Sometimes,,,Sometimes,,Often,,,,51-75% of projects,More internal than external,Standalone Team,Depending on project from available transactional data to benchmark data sets,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,"77,000",USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,Other",GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,60,0,30,10,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Not 
important,Not important,Very Important,Not important,Not important,Not important +Male,Japan,58,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,Random Forests,Python,Government website,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Not Useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,3 to 5 years,"Data Miner,Researcher",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",Logistic Regression,High school,Academic,100 to 499 employees,Decreased slightly,Less than one year,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,Regression/Logistic Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation",,,,,,,Often,,,,,,,,,Often,,,,,Rarely,,,,,Sometimes,Sometimes,,,,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Sometimes,,,,Sometimes,,,,,,,,Often,,,,,,76-99% of projects,More internal than external,IT Department,NCBI GEO datasets,relationship between high school and university points of all students. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,,JPY,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Somewhat useful,,,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,Less than a year,Business Analyst,Self-taught,80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Other,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,Other,"Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Often,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,20,40,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,,,,Sometimes,,,,,Often,,76-99% of projects,More internal than external,IT Department,None (to my knowledge),Lack of knowledge of how business operations are manifested in the database.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Other,Never,35000,USD,Other,6,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,Data Analyst,Self-taught,50,10,10,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,1GB,"Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,Often,Sometimes,,Rarely,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Often,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,51-75% of 
projects,Entirely internal,Business Department,Fannie Mae/Freddie Mac,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Git",Rarely,"80,000",,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Japan,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Deep learning,Python,Google Search,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Bachelor's degree,,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important +Male,United States,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Not Useful,Not Useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher",University courses,2,3,10,60,25,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1PB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Flume,Google Cloud Compute,Java,Julia,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,Often,Often,,,,,,,Rarely,Rarely,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,"Ensemble Methods,Neural Networks,Random Forests",,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,,,,,,,,,,70,20,0,8,2,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others",,,,,Often,Rarely,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Brazil,49,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Government website,I collect my 
own data (e.g. web-scraping)","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Very important,Other,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Sometimes,1MB,"Decision Trees,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,Often,Often,,Often,,Most of the time,,Often,,,Most of the time,,Most of the time,,,,,,Often,Most of the time,,,,35,25,10,15,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external 
sources,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,,,,,Often,Often,,,,,,Often,,,,,,,100% of projects,Approximately half internal and half external,Other,Goverment Dataset,Build a data team,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"55,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Anomaly Detection,Python,Google Search,"College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,10,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,"Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Natural Language Processing,Text 
Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",Sometimes,,,,Most of the time,,,,Often,,,Often,Most of the time,,Often,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,Data cleanliness,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,87645,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"GitHub,Government website,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,,3-5 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,10,20,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,United States,22,"Not 
employed, but looking for work",,,,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,30,0,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral 
degree,Physics,6 to 10 years,Researcher,University courses,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression",A doctoral degree,Pharmaceutical,20 to 99 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,Most of the time,,,Often,Most of the time,,Most of the time,Most of the time,,,,Often,,Most of the time,,Most of the time,,,Most of the time,,,,,,Most of the time,,,,,,,50,10,5,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT",Often,,,,Often,,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,121000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Bayesian Methods,SQL,Google Search,"Blogs,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,Data Analyst,Other,20,30,0,0,0,50,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,,,"Microsoft Excel Data Mining,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,100,0,0,0,0,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Most of the time,,,Sometimes,,,,,Often,,,Often,,,,,,,None,More internal than external,Standalone Team,,Inconsistency of sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,40000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Korea,53,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Self-taught,50,40,10,0,0,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,,Academic,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Rarely,1GB,"CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,Spark / MLlib,TensorFlow",,,,Rarely,,,,,Rarely,,,,,,,,Often,,,Rarely,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"CNNs,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text Analytics",,,,Often,,,,,,,,,,,,,,,Often,Often,,,Sometimes,,Rarely,,,,Sometimes,,,,,40,30,0,10,20,0,Enough to run the code / standard library,"Dirty data,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,Often,,,,Often,,,,Often,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,I don't plan on learning a new tool/technology,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Personal Projects",Very useful,Very useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Other",University courses,20,0,20,55,5,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Gradient Boosted Machines,Neural Networks,RNNs,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk,Other",,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,Often,Often,,,"A/B Testing,CNNs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,Often,,,,,,,,,,,,,,Rarely,Often,Most 
of the time,Often,,Often,,Often,Often,,Sometimes,Often,Often,,,,30,20,10,5,35,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,,Sometimes,,,,Sometimes,,Rarely,,,Often,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Git,Mercurial",Most of the time,"850,000",,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Philippines,26,Employed full-time,,,No,Yes,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Microsoft R Server (Formerly Revolution Analytics),Neural Nets,C/C++/C#,"Google Search,University/Non-profit research group websites","Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Physics,1 to 2 years,"Data Analyst,Engineer,Researcher",University courses,30,20,0,50,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - 
RNNs",Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,70,Retired,,,Yes,,Business Analyst,Fine,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,Programmer,Self-taught,80,20,0,0,0,0,Time Series,Neural Networks - CNNs,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Social Network Analysis,Python,University/Non-profit research group websites,"Blogs,Company internal community,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Work,15,15,70,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text 
data,Sometimes,10MB,Regression/Logistic Regression,"Microsoft R Server (Formerly Revolution Analytics),R,SAP BusinessObjects Predictive Analytics,Other",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,Often,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,20,30,15,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,Sometimes,Often,Often,,,,,,,,,Sometimes,,,,26-50% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Other,Rarely,150000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,Other,"Friends network,Kaggle,Online courses",,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Data Scientist,Researcher",Self-taught,15,0,10,NA,0,75,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10MB,"Gradient Boosted Machines,Random 
Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,"Gradient Boosted Machines,Random Forests",,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,Often,,Sometimes,,,,,,Most of the time,,,,,,,,,26-50% of projects,Entirely internal,Other,,Historical data is contaminated,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Git,Subversion",Rarely,125000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Very useful,Very useful,,,,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,,,,Basic laptop (Macbook),,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer",University courses,0,30,10,60,0,0,"Adversarial Learning,Time Series","Bayesian Techniques,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,Business Analyst,University courses,0,0,0,100,0,0,Time Series,Gradient Boosting,A bachelor's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,100PB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,50,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,Graph (e.g. GraphBase/Neo4j),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,85000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,Bayesian Methods,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Programmer",University courses,40,10,30,20,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Java,Python,SQL,TensorFlow",,,,Often,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics",Often,,,Sometimes,Often,Often,,Sometimes,Sometimes,,,Sometimes,,Often,,Often,,,Often,Sometimes,,,,Most of the time,,,,Sometimes,Often,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Most of the time,Often,Often,,,,,Often,,Most of the time,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Text Mining,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Master's degree,No,Bachelor's degree,Management information systems,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,20,0,60,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,60,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Personal Projects,Textbook",,,Very useful,,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Natural Language Processing,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Other,Text data,Most of the time,1TB,"Regression/Logistic Regression,SVMs",C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,HMMs,Logistic Regression,Naive Bayes,SVMs,Text Analytics",Often,,,,,Often,,,Often,,,,Sometimes,,,Often,,Often,,,,,,,,,,Often,Often,,,,,40,40,20,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Often,,,Often,Often,,Often,,,,,,,,,Often,,Often,,Often,,None,Entirely internal,Other,TREC -- Text Retrieval Conference; other public text datasets,Privacy,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",web,"Git,Subversion",Most of the time,"200,000",CAD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,Python,,"College/University,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,0,40,30,30,0,0,,,A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,65000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Australia,23,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,R,Deep learning,SQL,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,70,20,10,0,0,0,,,A bachelor's degree,Financial,20 to 99 employees,Increased slightly,Less than one year,Some other way,Important,Other,Workstation + Cloud service,Relational data,Always,100GB,,"Java,R,SQL,Tableau",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,50,5,20,15,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,,,Less than 10% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,78000,AUD,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,25,Employed part-time,,,Yes,,Scientist/Researcher,,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,GitHub,"College/University,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Very useful,,,Very useful,Very useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,10,5,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Video data",Most of the time,10TB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,RNNs,SVMs","Amazon Machine Learning,Julia,Jupyter notebooks,Python,R,RapidMiner (commercial version)",Sometimes,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,Rarely,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation",,,,,,Often,Often,,Often,,,,,,,Often,,,,,Often,,Often,,Often,Sometimes,,,,,,,,20,20,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Bitbucket,Git",,80000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Colombia,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,R,Google Search,"Blogs,College/University,Kaggle,Personal Projects,Textbook",,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Statistician",Work,0,0,30,70,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","KNIME (free version),NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,Sometimes,,,,Often,Most of the time,Often,,,,Sometimes,,Often,,Most of the time,,,,,Often,,Often,,,Most of the time,Often,,,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,,Often,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,Increase the revenue,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,80000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle",Very useful,,,,Very useful,,Somewhat useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,15,10,0,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Manufacturing,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,1GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Perl,Python,TensorFlow,Other",,,,Often,,,,,,,,,,,Sometimes,,Rarely,,,,Often,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,Often,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Often,Most of the time,,,Often,,Often,,Most of the time,,,,,Most of the time,,Often,,,,,Most of the time,,,,,,50,20,0,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Often,,,,,,,,Sometimes,Sometimes,,,,,,Often,Most of the time,,,,Most of the time,,100% of projects,Entirely internal,Other,mnist,face recognition,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Mercurial,Subversion",Rarely,"8,000,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,"Blogs,Online courses,Textbook",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Udacity,Laptop or Workstation and local IT supported servers,40+,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Newsletters,Online courses,Textbook,YouTube Videos",,Very useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,,,,Coursera,Traditional Workstation,11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer 
Science,Less than a year,I haven't started working yet,University courses,40,10,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +A different identity,People 's Republic of China,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,Matlab,GitHub,Company internal community,,,,Very useful,,,,,,,,,,,,,,,"Jack's Import AI Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,,Other,Yes,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,Singapore,16,"Not employed, and not looking for work",No,"Yes, I'm 
focused on learning mostly data science skills",,,,,,C/C++,Deep learning,Matlab,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,23,Employed part-time,,,Yes,,Data Miner,Fine,Employed by college or university,Python,Random Forests,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,30,20,0,30,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,I prefer not to answer,Decreased significantly,Less than one year,Some other way,Very important,Analyze and understand data to influence 
product or business decisions,Other,Text data,Rarely,10MB,"Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,HMMs,Logistic Regression,Natural Language Processing,SVMs",,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,Often,,,Often,,,,,,,,,Sometimes,,,,,,40,30,0,10,20,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,10-25% of projects,Do not know,Other,,,Other,Other,,Git,,,,Other,4,,,,,,,,,,,,,,,,,, +Male,Portugal,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job 
listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,"Not employed, but looking for work",,,,,,,,Julia,Time Series Analysis,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other","Gaming Laptop (Laptop + CUDA capable GPU),Other",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer",University courses,10,30,20,40,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Argentina,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,40,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Association Rules,SQL,I collect my own data (e.g. 
web-scraping),"Friends network,Personal Projects,Textbook,YouTube Videos",,,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important +Male,Indonesia,21,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Textbook",,,Somewhat useful,,,Very useful,,,,,,,,,Very useful,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,Unsupervised Learning,Bayesian Techniques,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,32,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Other,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Other",University courses,20,10,20,50,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Military/Security,20 to 99 employees,Stayed the same,3-5 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,"Decision Trees,Neural Networks,RNNs","Amazon Web services,Google Cloud Compute,Unix shell / awk",,Often,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,10,70,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,Often,,,,Often,,100% of projects,More internal than external,Other,,,Key-value store (e.g. 
Redis/Riak),Company Developed Platform,,Git,Most of the time,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Tableau,Text Mining,SQL,Google Search,"Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,,,,,Very useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Researcher,Other",University courses,1,19,20,60,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL,Tableau,Unix shell / awk",Often,Most of the time,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,Rarely,,,,"Data Visualization,Natural Language Processing,Recommender Systems,Text Analytics",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,Often,,,,,50,0,0,40,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,76-99% of projects,Approximately half internal 
and half external,Other,,Lots of text and logs. Data needs to be extracted from text,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,45000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Other,University courses,10,5,0,85,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Most of the time,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests,Recommender Systems",,,Most of the time,,Most of the time,Most of the time,,Often,,,,,,Often,,,,Most of the time,,,,,Often,Often,,,,,,,,,,50,20,5,5,20,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations 
of tools,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,Sometimes,,Often,,,,Most of the time,Most of the time,,,Most of the time,,76-99% of projects,Entirely internal,Other,Na,Obtaining it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Other,Rarely,59000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",45,20,20,10,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","DataRobot,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),R,SAS Enterprise Miner,SQL",,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Rarely,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Rarely,,,,,Often,,Most of the time,,,Often,,,,Often,,,Most of the time,,,Most of the time,Sometimes,,,,50,20,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",Sometimes,Most of the time,,,Most of the time,,,,Often,,,,Most of the time,,,,,,,,Often,,10-25% of projects,More external than internal,IT Department,census data,missing data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,94000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Python,Rule Induction,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Newsletters,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,Very useful,Somewhat useful,Very useful,,Not Useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Data Elixir Newsletter,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Engineer,Self-taught,50,20,0,0,10,20,,,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,,"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,SQL,,"Official documentation,Online courses,Textbook",,,,,,,,,,Very useful,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,"10,000 or more employees",Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10TB,Regression/Logistic Regression,"R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,76-99% of projects,Entirely external,Central Insights Team,Datasource; US census data,Multiple databases with differing definitions of a consumer ,"Column-oriented relational (e.g. 
KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Ftp,Git,Sometimes,93000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,35,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Government website,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"FlowingData Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,No,I did not complete any formal education past high school,,Less than a year,Programmer,Other,20,20,0,0,0,60,Speech Recognition,Neural Networks - CNNs,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,Brazil,22,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",Python,Decision Trees,Python,Google Search,"Blogs,College/University,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to 
have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,50,30,0,0,20,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,65,Retired,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,Amazon Web services,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Minitab,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,,,,Somewhat useful,,Very useful,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,Data Analyst,University courses,NA,NA,NA,NA,NA,NA,Survival Analysis,Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important 
+Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,RapidMiner (free version),TensorFlow",,Often,,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation",,,Sometimes,,,,,Often,,,,Sometimes,,,,Most of the time,,Often,,Sometimes,,,Often,,,Most of the time,,,,,,,,40,30,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision 
makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,Often,Sometimes,Often,Sometimes,,,,,,,Often,,,,10-25% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,80000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,10,5,15,0,"Computer Vision,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Technology,10 to 19 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Other",Most of the time,100MB,"Neural 
Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow",Rarely,Sometimes,,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,RNNs,Segmentation,Simulation,Time Series Analysis",,,,Most of the time,,Most of the time,Often,,,,,,,Often,,,,,Sometimes,Most of the time,,,,,Most of the time,Sometimes,Sometimes,,,Most of the time,,,,40,30,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,,Sometimes,,Most of the time,,,,Often,Sometimes,Sometimes,,,,Often,Sometimes,Often,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Bitbucket,,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,30,Employed full-time,,,Yes,,Other,Fine,Employed by government,TensorFlow,Neural Nets,Python,"GitHub,Government website","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,FastML Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,Less than a year,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,70,0,15,0,0,Unsupervised Learning,Bayesian Techniques,"Some college/university study, no bachelor's degree",Government,100 to 499 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,10MB,Neural Networks,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Decision Trees,Neural Networks",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,10,75,0,0,16,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",Often,,,,,Sometimes,,,Most of the time,,,,,,,,Most of the time,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,800000,TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Denmark,36,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Workstation + Cloud service,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,70,10,10,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Singapore,26,Employed full-time,,,No,Yes,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Neural Nets,Python,Other,"Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,Business Analyst,University 
courses,0,15,20,50,5,10,,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Deep learning,Java,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Speech Recognition,,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,United States,40,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Podcasts,Textbook",,Somewhat useful,,,,,,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Female,United States,42,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Online courses,Podcasts,Stack Overflow Q&A,Other",,,Very useful,,Very useful,,,,,,Very useful,,Very useful,Very useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,"Business Analyst,Other",University courses,5,0,0,95,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Telecommunications,"5,000 to 9,999 employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,<1MB,Decision Trees,"Orange,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering",,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,80,0,0,0,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,Most of the time,,Most of the time,Sometimes,,Sometimes,,,,Sometimes,,,,,,Most of the time,,None,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Other",Sometimes,75000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Newsletters,Online courses,Personal Projects,Textbook",Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",42,19,39,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,20 to 99 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Image data,Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Sometimes,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs,SVMs,Text Analytics",,,,Most of the time,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Often,,Often,,Sometimes,Often,Most of the time,Sometimes,,Sometimes,,Often,,,Sometimes,Sometimes,,,,,10,40,30,10,10,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,,Often,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,"ImageNet, PascalVOC",Amount of data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,"50,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,Google Search,"Official documentation,Online courses,Textbook,YouTube Videos,Other",,,,,,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important 
+Male,Belgium,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Colombia,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Self-taught,75,15,0,0,10,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Sometimes,Most of the time,Often,,,,Often,,Rarely,,Often,,,,,Sometimes,,Often,,,,,,,,,,,60,15,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,Often,,,,,,Most of the time,,,Often,Sometimes,,,,76-99% of projects,Approximately half internal and half external,Business 
Department,,"Connection, speed and dirty data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,3600000,COP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",R,"Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,3 to 5 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,,,Often,,,,Most of the time,Often,,Most of the time,,,,20,50,10,20,0,0,"Enough to code it again from scratch, albeit it may run 
slowly","Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Most of the time,,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,"IBGE, FGV ",Data source small or dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +A different identity,Brazil,60,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,SQL,Other,Java,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Data Machina Newsletter,< 1 year,,,,,,,,,,,,,,,Workstation + Cloud service,,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,DBA/Database Engineer,Self-taught,95,1,1,1,1,1,Computer Vision,Decision Trees - Random Forests,,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Mathematica,Deep learning,Scala,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Company internal community,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,Not Useful,Somewhat useful,Very useful,,,,,,Very useful,,Somewhat useful,Not Useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,"Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,20,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,High school,Internet-based,100 to 499 employees,Stayed the same,3-5 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,,"Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Python,Spark / MLlib,SQL,Unix shell / awk",Sometimes,Most of the time,,,Most of the time,,Often,Sometimes,Often,,,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Natural Language Processing",Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,0,25,65,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,Sometimes,,Sometimes,,,,Often,,,Often,,Most of the time,Most of the time,,,,Less than 10% of projects,More internal than external,Other,none,lack of documentation,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Other,what ever the other person can understand,Git,Never,165000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Mathematics or statistics,,Statistician,University courses,NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,Other,21,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Linear Digressions Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Brazil,37,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Other,,"Business Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,RapidMiner (commercial version),Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,100,0,0,0,0,0,,,A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Video data,Text data",Rarely,<1MB,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,5,5,5,65,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others",Often,Most of the time,,Most of the time,Often,Often,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Never,126850,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,,Not Useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Operations Research Practitioner,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,10,15,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,Indonesia,28,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Other",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,45,0,0,5,25,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Very useful,,,,,,Very useful,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Fine arts or performing arts,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,20,20,30,0,0,Time Series,"Bayesian Techniques,Logistic Regression",High school,Government,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",Sometimes,,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,,,15,10,45,15,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,Sometimes,Sometimes,Often,,,,Often,,,,,,Most of the time,,,,,,Most of the time,,100% of projects,More internal than external,Central Insights Team,CMS data,The data warehouse and IT infrastructure is not set up for data science workflows.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Never,85000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Kaggle competitions,30,0,0,0,70,0,Computer Vision,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Brazil,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Other,R,Google Search,"Arxiv,Blogs,Kaggle,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,,University 
courses,50,0,30,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Internet-based,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,R,Spark / MLlib,SQL,Stan",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,Often,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,,Most of the time,Most of the time,Sometimes,,Most of the time,,,,Often,,Often,Rarely,,Most of the time,,Most of the time,,,,,Most of the time,,Sometimes,,,,22,25,12,16,25,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,Sometimes,,,,,Rarely,,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,"We work on human sensory data (what people smell and taste in food and beverage products) which is notoriously difficult data because of the high dimensionality, non-linearity, non-iid and homogeneous population, and the non-normal distribution of flavor in what we taste. 
It takes a lot of preprocessing to make our data usable for predictions. ","Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed part-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,30,10,30,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Segmentation,Text Analytics",Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,34,0,0,33,33,0,Enough to run the code / standard library,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,,,Rarely,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,,9,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Survival Analysis,Python,Google Search,"Blogs,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,"Data Scientist,Researcher",Self-taught,25,50,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Often,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis,Other",Most of the time,,,,,Most of the time,Most of the 
time,Sometimes,,,,,,Rarely,,Sometimes,,,Often,Often,Sometimes,Most of the time,Sometimes,,,,Often,,Often,Often,Rarely,,,0,0,10,15,75,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Sometimes,Often,Most of the time,,Most of the time,,,,,,,Often,,,,,Often,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,226000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,37,Employed full-time,,,Yes,,Other,Fine,Employed by government,R,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Official documentation,,,,,,,,,,Very useful,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Business Analyst,Self-taught,50,25,25,0,0,0,Adversarial Learning,Ensemble Methods,A doctoral degree,Government,"5,000 to 9,999 employees",Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10GB,Decision Trees,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,20,0,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Other,Never,75000,CAD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by government,TensorFlow,Survival Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst,Statistician",Self-taught,25,25,40,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,500 to 999 employees,Increased slightly,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,SAS Base,SAS JMP",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,Often,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Often,Rarely,,,,,Sometimes,,Often,,,,Sometimes,Most of the time,Most of the time,Often,,,Sometimes,Most of the time,,,Most of the time,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Sometimes,,,Most of the time,Often,,Sometimes,,,Most of the time,,Often,Most of the time,Often,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,140000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Germany,38,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Personal Projects,Textbook",Very useful,,,,,,Very useful,,,,,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Miner,Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",0,50,30,20,0,0,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,48,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Employed by non-profit or NGO,Julia,Proprietary Algorithms,Java,Google Search,"Kaggle,Personal Projects,Podcasts,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,"FlowingData Blog,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Nice to 
have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",,Master's degree,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Business Analyst,Computer Scientist,Data Miner,DBA/Database Engineer,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,10,10,20,0,Time Series,Ensemble Methods,,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Other,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,Ireland,42,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,DBA/Database Engineer,Self-taught,NA,30,0,70,0,0,,Logistic Regression,A master's degree,Retail,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,,,"Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Statistics,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Often,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,Most of the time,Most of the time,,,Often,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,0,10,10,0,30,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Other,Geographical Data,knowing about the huge ecosystem,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,EUR,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Researcher",University courses,40,10,10,40,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Other",,Sometimes,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,Often,,,,,,,Most of the time,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,,,Rarely,Often,,,,,,,,Often,,,,,,Often,Often,Often,,Rarely,,,Rarely,,,,,0,30,10,0,10,50,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Rarely,Often,,,Often,,,,,Sometimes,Rarely,,,,Often,,Rarely,,Less than 10% of projects,More internal than external,Standalone Team,"Social, Census, Federal, State, County",Paying customers,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Trade book",Very useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,"Becoming a Data Scientist Podcast,FastML Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,Self-taught,65,25,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,United States,43,Employed 
full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,IBM Cognos,,SQL,,Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,Data Analyst,Self-taught,20,30,50,0,0,0,Outlier detection (e.g. Fraud detection),,A bachelor's degree,Insurance,100 to 499 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,,,Other,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Cross-Validation,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,,,,,Always,80000,USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,More than 10 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Non-profit,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,10MB,"Decision Trees,Neural Networks,Random Forests","KNIME (free version),Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Text Analytics",,,Sometimes,,,Often,,,Often,,,,,Sometimes,Often,Often,,Often,,Most of the time,,,,,,,,,Often,,,,,25,25,5,25,20,0,Enough to tune the parameters properly,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,100% of projects,Entirely internal,Other,"Ncoa, american community survey, tax returns summarized by IRS",Getting all elements to a common and useful level of detail.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Sometimes,,,,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Kaggle,Textbook",Very useful,,,,,,Very useful,,,,,,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Researcher",University courses,30,40,20,5,5,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Python,Spark / MLlib,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",Rarely,,Sometimes,Sometimes,Sometimes,Often,Often,Sometimes,Sometimes,,,,,,,Often,,Often,Most of the time,Sometimes,,,Sometimes,Often,Sometimes,Often,,Often,Often,,,,,30,40,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in 
deployment/scoring,Dirty data,Organization is small and cannot afford a data science team",Most of the time,,,Often,Often,,,,,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,IT Department,wikepedia,lack of data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Sometimes,,CNY,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,New Zealand,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,"Machine Learning Engineer,Other",Self-taught,70,20,10,0,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data",Always,100GB,CNNs,"Amazon 
Web services,C/C++,Python,TensorFlow,Unix shell / awk",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,Often,Often,,Sometimes,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,40,20,20,5,15,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,26-50% of projects,More internal than external,Standalone Team,None,Labelling; getting_enough_data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,External HDD,"Bitbucket,Git",Sometimes,80000,NZD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Matlab,"Google Search,University/Non-profit research group websites","College/University,Conferences,Online courses",,,Somewhat useful,,Very useful,,,,,,Very useful,,,,,,,,,1-2 years,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,More than 10 years,Computer Scientist,University courses,30,30,0,40,0,0,Computer Vision,"Decision Trees - Random Forests,Ensemble Methods",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat 
important,Very Important +Male,Russia,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,Other",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Data Scientist,University courses,10,5,34,50,1,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Other,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Often,,,,,,,,Rarely,Sometimes,,,,,,Often,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,Rarely,Rarely,,,Often,Often,Often,Often,,,,,Often,Often,Often,,Sometimes,Sometimes,,Often,,Often,Sometimes,,Often,Sometimes,,Sometimes,Sometimes,,,,60,15,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,,100% of projects,Do not know,Standalone Team,Too many to list,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,130000,AMD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Researcher,Other",Self-taught,95,5,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Mix of fields,,,,,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,10MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Most of the time,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Often,Sometimes,,,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction 
to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,100% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,Yes,,Data Miner,Fine,Employed by college or university,,,,,"College/University,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,Somewhat useful,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer",Self-taught,65,5,10,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Survival Analysis","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Stayed the same,,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Video data,Text data",Always,100GB,"CNNs,GANs,Neural Networks","C/C++,Java,Perl,RapidMiner (free version),SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,Most of the time,,,,,,Often,,,,"CNNs,GANs,Neural Networks,RNNs,Simulation",,,,Most of the time,,,,,,,Often,,,,,,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,,,,75,15,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Scaling data science solution up to full database",Sometimes,,,,Most of the time,,,,,,,,,,,,,Often,,,,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,Canada,23,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,,Self-taught,40,0,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Rarely,,Most of the time,,,,,Often,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,Most of the time,Often,,Most of the time,Often,,,,35,15,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of 
data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,Sometimes,,Sometimes,Often,,,,,Often,Often,,,,,Sometimes,Sometimes,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,70000,CAD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Julia,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Newsletters,Online courses,Personal Projects",,,,,Somewhat useful,,Very useful,Very useful,,,Very useful,Very useful,,,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,0,10,20,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your 
product or workflows,Workstation + Cloud service,Text data,Most of the time,100TB,"Decision Trees,Random Forests","Google Cloud Compute,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib",,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests,Recommender Systems",Often,,,Sometimes,Often,Often,Often,Often,,,,Often,,,,,,,,,,,Sometimes,Often,,,,,,,,,,50,20,5,5,20,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues",,,,Often,Most of the time,,,,Often,,,,,,,,Most of the time,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Rarely,95000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,1 to 2 years,Other,Self-taught,20,10,20,0,0,50,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,,,,,,,,,Often,,,,Often,Most of the time,,Most of the time,,,,,,,,Often,,,,,30,25,20,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Often,,,,Most of the time,,,,,Often,,,,,,,,Sometimes,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,117000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,Somewhat useful,,,Not Useful,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,No education,Technology,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,100MB,Random Forests,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Random Forests",,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,0,30,0,70,0,0,Enough to run the code / standard library,Difficulties in deployment/scoring,,,,Often,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Business Department,No outside data ,Cleaning data,Other,Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,658000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,KNIME (free version),"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,30,20,5,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,I prefer not to 
say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Kaggle,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,,,,,,,,Very useful,Very useful,Very useful,Very useful,"R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Personal Projects",,Very useful,Very useful,Very useful,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Business Analyst,Machine Learning Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,Time Series,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Female,United States,32,Employed part-time,,,Yes,,Predictive Modeler,Fine,Employed by non-profit or NGO,R,Time Series Analysis,SQL,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,Data Analyst,Self-taught,70,0,20,10,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE 
...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","IBM Cognos,IBM SPSS Modeler,R,SQL",,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Neural Networks,Prescriptive Modeling,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Often,,Sometimes,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,Sometimes,,Sometimes,,,,60,30,5,0,5,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",,Often,,,,,,Sometimes,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Data Stories Podcast,Jack's Import AI Newsletter,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,United States,32,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,NoSQL,Time Series Analysis,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,25,40,25,10,0,0,,,A master's degree,Academic,,,,"A friend, family member, or former colleague told me",Not very important,Other,Basic laptop (Macbook),Other,Most of the time,,,"Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Simulation",,,,,,,Often,,,,,,,,,,,,,,,,,,,,Often,,,,,,,5,10,20,5,60,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,Other,"Email,Other",globus,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,47000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,GitHub,"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Engineer,Kaggle competitions,30,40,0,0,30,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,100,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Conferences,Online courses,Other",Very useful,,,,Very useful,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Coursera,GPU accelerated Workstation,11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Reinforcement learning,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very 
Important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important +Male,United States,43,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Udacity,GPU accelerated Workstation,40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Google Search,"Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,,,,,"N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Video data,Most of the time,100TB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Most of the time,,,"Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Simulation",,,,,,,Most of the time,,Sometimes,,,Sometimes,,,,Sometimes,,,,Often,,,Often,,,,Often,,,,,,,30,10,40,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,,Often,,,,Often,,,,Most of the time,,,Most of the time,,,Often,,Often,,51-75% of 
projects,Approximately half internal and half external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Email,,Git,Most of the time,160000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Bayesian Methods,Python,Government website,"Conferences,Kaggle,Newsletters,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,,< 1 year,Necessary,,Necessary,,Necessary,Necessary,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Textbook",,Very useful,Somewhat useful,,,,Very useful,,,,,,,,Somewhat useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,10,0,0,40,10,"Adversarial Learning,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Other,Python,,"Arxiv,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,"Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Very 
important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service",Other,Always,100MB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Rarely,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Rarely,,,,0,20,79,0,1,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Rarely,,Sometimes,,,Sometimes,Sometimes,,,,Rarely,Sometimes,,,,,Most of the time,Sometimes,Most of the time,,100% of projects,Entirely internal,IT Department,,Lack of a database,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Rarely,255000,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Mexico,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Julia,Bayesian Methods,Java,GitHub,"Non-Kaggle online communities,Official documentation",,,,,,,,,Very useful,Very useful,,,,,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer",Work,50,0,50,0,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs",A master's degree,Telecommunications,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Markov Logic Networks,Neural Networks","Amazon Machine Learning,Amazon Web services,Angoss,C/C++,Cloudera,DataRobot,Flume,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,Julia,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,QlikView,R,Spark / MLlib",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted 
Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other,Other,Other",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,0,0,0,100,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Most of the time,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,92000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Belgium,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,"FlowingData Blog,Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,30,40,10,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Other",Rarely,1TB,"Ensemble Methods,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Tableau,TIBCO Spotfire",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Random Forests,SVMs",,,,,,Most of the time,Most of the time,,,,,Often,,,,,,,,,,,Often,,,,,Often,,,,,,20,0,0,50,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,,,100% of projects,More external than internal,Standalone Team,"Pride, European projects ",Lack of gold standard or domain expertise ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,24000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Conferences,YouTube Videos",Somewhat useful,,,,Not Useful,,,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,80,0,0,20,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,YouTube Videos",,Very useful,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,,,A bachelor's degree,Retail,"1,000 to 4,999 employees",Decreased slightly,1-2 years,Some other way,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1MB,"Bayesian Techniques,Random Forests","Amazon Machine Learning,Python,R,Tableau",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Random Forests,Time Series Analysis",Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,40,0,0,30,30,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by government,Julia,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,50,30,10,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Rarely,,Often,,,,Sometimes,Rarely,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Sometimes,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,,,Rarely,Most of the time,Sometimes,,,,,,,,Often,,,,Rarely,,Often,Sometimes,,,,Sometimes,,,Sometimes,,,,60,20,0,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,Most of the time,,76-99% of projects,Approximately half internal and half external,Other,,Access to tools and cleaning the data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Singapore,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,Very useful,,Not Useful,,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,25,10,5,40,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,Australia,18,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Operations Research 
Practitioner,Perfectly,Employed by government,,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,Other,10,0,0,90,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,I don't know,Stayed the same,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Traditional Workstation",Relational data,Don't know,<1MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SAS JMP",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Often,,Often,,,,,,,Often,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,,Often,,,,,Often,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,Other,,ill conditioned data matrices,Other,Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,,SQL,"I collect my own data (e.g. 
web-scraping),Other",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business Analyst,Self-taught,100,0,0,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Rarely,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,0,0,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,None,Do not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,Other,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,40+,Github Portfolio,No,Master's degree,Electrical Engineering,,"Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,Very useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,25,15,10,25,15,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Malaysia,27,Employed full-time,,,Yes,,Other,Fine,Employed by government,Other,Deep learning,SQL,"GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,Other,Other,Other",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Programmer,Other",Self-taught,60,15,25,0,0,0,,,A bachelor's degree,Government,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,Often,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Often,,,,"A/B Testing,Lift Analysis,Logistic Regression,Prescriptive Modeling,Time Series Analysis",Often,,,,,,,,,,,,,,Often,Often,,,,,,Often,,,,,,,,Often,,,,15,5,5,35,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Often,,Most of the time,Most of the time,Most of the time,,,,Often,,Most of the time,,,,Often,Most of the 
time,,51-75% of projects,More external than internal,Standalone Team,patents registration; IPO sets; IP datasets; census data,Compatibility; Dirty Data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"3,200",MYR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Engineer,Researcher",Self-taught,40,10,25,25,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - RNNs",Primary/elementary school,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,10MB,"Bayesian Techniques,Neural Networks",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Neural Networks",,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,25,30,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,Often,,Sometimes,,Most of the time,,26-50% of projects,Approximately half internal and half external,IT Department,Public,Clean,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"36,000,000",COP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Canada,29,"Not employed, but looking for work",,,,,,,,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Not Useful,,,,,3-5 years,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,A health science,3 to 5 years,"Data Analyst,Researcher,Other",University courses,25,20,40,15,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Canada,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher",University courses,20,0,30,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,Jupyter notebooks,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,Often,,,Often,Often,,Rarely,,,Sometimes,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,Rarely,,,Sometimes,Sometimes,,,Often,,,Often,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",Often,,Sometimes,,Sometimes,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Often,Often,Often,Sometimes,,Often,Sometimes,Sometimes,Sometimes,,,,,45,15,5,15,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data,Other",,,,,Sometimes,Often,,,Sometimes,,Often,,,,,,,,,,Sometimes,Most of the time,100% of projects,More internal than external,Central Insights Team,"Weather, demographics, social media",Getting access to client data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Sometimes,107000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Hong Kong,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Machine Learning Engineer,Programmer",Work,10,20,30,30,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Decreased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Simulation,Time Series Analysis",Rarely,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,Sometimes,,,,Most of the time,,,,Sometimes,,,Often,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,208000,HKD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,30,0,20,0,"Computer Vision,Machine Translation",Neural Networks - CNNs,,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service",Image data,Sometimes,10GB,CNNs,"C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,PCA and Dimensionality Reduction",,,,Often,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,20,20,20,10,30,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,minist; KITTI,Data preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Subversion,,250000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Company internal community,Online courses,Textbook,YouTube Videos",,,,Very useful,,,,,,,Very useful,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,Data Scientist",Work,10,10,40,40,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods","Google Cloud Compute,Python,R,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Time Series Analysis",Most of the time,,Often,,,,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,40,10,5,25,20,0,Enough to tune the parameters properly,"Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,51-75% of projects,More internal than external,Business Department,proprietary almost exclusively,Flattening data with repeated records,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Git,Rarely,"190,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,"Not employed, but looking for work",,,,,,,,Python,Regression,Python,University/Non-profit research group websites,"College/University,Online courses,Tutoring/mentoring",,,Somewhat useful,,,,,,,,Very useful,,,,,,Very useful,,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Other,University courses,0,10,0,60,0,30,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Not important +Female,Australia,25,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Random Forests,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,30,10,25,30,5,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Most of the time,10MB,Decision Trees,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Time Series Analysis",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,10,30,30,10,20,0,Enough to run the code / standard library,"Explaining data science to others,Inability to integrate findings into organization's decision-making process",,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,10-25% of projects,More internal than external,,"general economic, social and political info",,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,Employed part-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,DataRobot,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,Self-taught,30,0,20,50,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,SAS Base",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,40,20,20,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data",,Often,,,Most of the time,,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Egypt,22,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,C/C++/C#,Google Search,"College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,,Very useful,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,50,20,10,10,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,37,"Not employed, but looking for work",,,,,,,,Microsoft R Server (Formerly Revolution Analytics),,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,"Data Machina Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,32,Employed full-time,,,No,Yes,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,,,Necessary,,Necessary,,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Time Series",Other (please specify; separate by semi-colon),A bachelor's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,,,,,,Somewhat important,Very Important,Very Important,,,,Very Important,,, +Male,United States,51,Employed full-time,,,Yes,,Data Scientist,,,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Researcher",Self-taught,50,30,10,10,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Perl,Python,R,SQL,Tableau,TensorFlow",Sometimes,Most of the time,,Often,Often,,,,Often,,,,,Sometimes,Often,,Most of the time,,,,Sometimes,Sometimes,,,,,Often,,,Rarely,Most of the time,,Often,,,,,,,,,Often,,,Sometimes,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Most of the time,Often,,Most of the time,,Sometimes,Most of the time,,Often,Sometimes,Sometimes,Often,,Often,,Often,Most of the time,Most of the time,Often,,Sometimes,,Most of the time,,,Rarely,Most of the time,Most of the time,,,,20,50,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by 
business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Sometimes,,Often,Often,,,,Often,,,Often,,,,,Sometimes,,Most of the time,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,NLP academic paper benchmark datasets,"preserving privacy of regulated, sensitive data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Git",Sometimes,200000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,,"Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,Very useful,,1-2 years,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX",,0 - 1 hour,PhD,No,Master's degree,A social science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),,A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,19,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,,SAS,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,0 - 1 hour,,No,Some college/university study without earning a bachelor's degree,Management information systems,I don't write code to analyze data,I haven't started working yet,Other,0,0,0,0,0,100,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,40,"Not employed, but looking for work",,,,,,,,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Other","Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",5,15,80,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,59,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,35,"Not employed, but looking for work",,,,,,,,Cloudera,Anomaly Detection,Python,GitHub,"Friends network,Kaggle,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,,,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic 
laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,Less than a year,"Business Analyst,Engineer",University courses,10,20,0,60,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important +Male,United States,16,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Not important,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer",Self-taught,60,20,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",United States,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Other,Neural Nets,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Not Useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Traditional 
Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Survival Analysis,Time Series",,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Female,Nigeria,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",,2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",Very useful,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,0,0,0,80,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,55,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Anomaly Detection,Python,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Computer Vision,,A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Very 
important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Neural Networks,SVMs",,,Sometimes,Sometimes,,Often,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,50,0,0,0,50,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Most of the time,,,,Often,,,,,,,Sometimes,Most of the time,,,,,,26-50% of projects,More internal than external,Other,LITA,privacy issues,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"10,000,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other,,,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important +Female,United States,17,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,NoSQL,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,Less than a year,Predictive Modeler,Self-taught,70,0,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL",Rarely,Often,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,,Often,,,Sometimes,,,,10,50,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Sometimes,,Sometimes,,,,Often,,76-99% of projects,Approximately half internal and half external,Standalone Team,FDIC,Formatting,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,,,,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,25,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses",Somewhat useful,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Other,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,14,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Personal Projects,Podcasts",,,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,I did not complete any formal education past high school,,,"Business Analyst,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,South Korea,30,Employed full-time,,,Yes,,Data Miner,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Machine Learning Engineer,Operations Research Practitioner,Researcher",University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs,SVMs","Java,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,Rarely,,,Often,,,"Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,Often,,,,,,,,,,,Sometimes,Sometimes,,Often,,Often,Most of the time,,Most of the time,,Most of the time,,,,20,20,20,20,20,0,,"Dirty data,Explaining data science to others,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,,,,,,,,,,Often,Most of the time,Often,,,10-25% of projects,Entirely internal,,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,10000,,,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Cluster Analysis,Java,Google Search,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,70,20,0,0,10,NA,Survival Analysis,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Japan,32,Employed full-time,,,Yes,,Data Scientist,,,Stan,Bayesian Methods,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,"Data Elixir Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Other,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",High school,Mix of fields,"1,000 to 4,999 employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","Amazon Web services,IBM SPSS Modeler,Python,R,SAS Enterprise Miner,Stan,Tableau",,Rarely,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,Rarely,,,,Sometimes,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Random Forests,Time Series Analysis",Sometimes,Rarely,Sometimes,,,Often,Most of the time,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,20,20,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,Often,,,Most of the time,,,,Most of the time,,,Most of the time,,,,Often,,,100% of projects,Entirely internal,Business Department,economic data;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"8,500,000",JPY,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Canada,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,"FlowingData Blog,R Bloggers Blog Aggregator",< 1 year,,,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,,,,Other,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection)",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,R,University/Non-profit research group websites,"Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",15,30,0,50,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,YouTube Videos,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Image data,Never,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Cross-Validation,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,,,,,,,,Very useful,Very useful,,Very useful,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",55,40,0,5,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,Other,26,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Somewhat useful,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"5,000 to 9,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,Most of the time,Sometimes,,,,,,Most of the time,,Sometimes,,,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate 
findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Most of the time,Sometimes,Often,Most of the time,,Most of the time,Sometimes,,Often,,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,26-50% of projects,Entirely internal,IT Department,,No datawarehousing. I have to find the data from messy transactional data bases.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Never,5500,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Malaysia,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,,,,,,"Amazon Web services,Python,R,TensorFlow",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,Rarely,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Most of the time,,,,,Often,,,Most of the time,,,,,,,,,,,,,,None,Do not know,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Share Drive/SharePoint,,Git,Never,77000,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Official documentation",,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Other,University courses,0,0,0,100,0,0,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,38,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,Very useful,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,40,0,20,0,40,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM Cognos,MATLAB/Octave,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,TIBCO Spotfire",,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,Sometimes,,Often,,,,,,,,,Often,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,,Rarely,,,,,Often,,Often,,,,,,Often,Most of the time,,,,10,70,10,10,0,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,Often,,,,,,,Often,,Often,,Sometimes,,,Sometimes,Often,,10-25% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,I don't typically share data",,Other,Sometimes,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",,Monte Carlo Methods,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher",Self-taught,40,10,20,20,10,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Sometimes,1TB,"CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Sometimes,,,Sometimes,,Sometimes,,Often,,Often,Often,,,,Often,Often,,Often,,Often,,,,40,30,5,25,NA,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,Most of the time,Often,,,,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,20000,USD,Other,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Mexico,26,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher",University courses,55,20,5,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10TB,"Bayesian Techniques,SVMs","Hadoop/Hive/Pig,Java,Python,R",,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Rarely,,,,,Often,,,,,,,Sometimes,,,,Rarely,,,,10,10,30,50,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,Often,,,,,,Often,,,,100% of projects,More internal than external,Business Department,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,0,10,0,60,20,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Segmentation",Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,0,0,0,0,100,0,,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,51-75% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Rarely,,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,Amazon Machine Learning,MARS,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,Very useful,,,Somewhat useful,,Very useful,Very useful,,,,"FastML Blog,Jack's Import AI Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,R,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Most of the time,,,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Often,,Often,,,Most of the time,,Most of the time,,,,,Most of the time,,Often,,,,10,60,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,Often,,Often,Most of the time,,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Most of the time,465000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,20,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,85,15,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting 
event,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,Researcher,University courses,60,0,30,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Mix of fields,I prefer not to answer,Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,Regression/Logistic Regression,"R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the time,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,Often,,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,Census; NCES; Other Government Data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Tableau,Monte Carlo Methods,C/C++/C#,University/Non-profit research group websites,"Online courses,Podcasts,Textbook",,,,,,,,,,,Somewhat useful,,Not Useful,,Somewhat useful,,,,"Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,"Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Other,Traditional Workstation,Other,Always,10MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Tableau",,Rarely,,Most of the time,,,,,,,,,,,Rarely,,Rarely,,,,Sometimes,Rarely,,,,,Rarely,,,,Often,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Simulation",,,Often,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,50,20,0,10,0,20,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of 
data science talent in the organization",,,Most of the time,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,More external than internal,Other,mostly from academia - too many colleges to list here,validating results of the algorithms,Other,Company Developed Platform,,Other,Never,220000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed part-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Amazon Web services,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Friends network,Personal Projects,YouTube Videos,Other,Other",,,,,,Very useful,,,,,,Somewhat useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Master's degree,,Less than a year,Other,Other,0,0,0,0,0,100,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Vietnam,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Factor Analysis,Python,I collect my own data (e.g. 
web-scraping),Textbook,,,,,,,,,,,,,,,Very useful,,,,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,0,0,0,0,100,0,Computer Vision,Logistic Regression,A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,100MB,Neural Networks,"Java,Jupyter notebooks,SQL",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Naive Bayes,Neural Networks",,,,,,,Often,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,0,0,0,90,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Often,,,,Often,,,Most of the time,,,Most of the time,,,,,,,,76-99% of projects,Do not know,IT Department,dont know,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Bitbucket,Never,10000000,VND,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Self-employed,Python,Decision Trees,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Female,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,,Necessary,,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,50,20,0,20,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,,Very Important,,,,,,Very Important,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Not Useful,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,10,10,30,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,100MB,"Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs",Rarely,Sometimes,Sometimes,,,Most of the time,Often,,,,,,,Rarely,,Often,,Sometimes,Rarely,,Sometimes,,Sometimes,,,,,Often,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,Often,Sometimes,,,Often,Sometimes,,Rarely,,,,Sometimes,,Rarely,,Sometimes,Most of the time,Rarely,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,180000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Python,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects",,Very useful,,,,,,,,,,Very useful,,,,,,,"FlowingData Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,100MB,Regression/Logistic Regression,"Cloudera,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Sometimes,,Sometimes,Often,,Often,,,Sometimes,Sometimes,,Most of the time,,Sometimes,Often,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,,Sometimes,105000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Taiwan,48,Employed full-time,,,Yes,,Engineer,Fine,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)",,No education,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Never,,"CNNs,Neural Networks,RNNs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,CNNs,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,500000,TWD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,SQL,Survival Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,5,5,0,90,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by government,Python,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Operations Research Practitioner,Statistician",University courses,30,0,30,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Government,"5,000 to 9,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","R,SAP BusinessObjects Predictive Analytics,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,,Often,Sometimes,,"Cross-Validation,Data Visualization,Random Forests,Segmentation,Simulation",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,Most of the time,,,,,Often,,Often,,Often,,Often,,26-50% of projects,More 
internal than external,Standalone Team,Bureau of Labor Statistics; Census;,Ensuring the data are clean and useful.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"79,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,68,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,Not Useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,Necessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,75,10,0,10,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching 
general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Not Useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,60,40,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A doctoral degree,Academic,500 to 999 employees,,,A tech-specific job board,Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Relational data",,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SQL,Stan,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,Often,,,Rarely,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation",,,Often,,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,Often,,,,,,,20,75,0,5,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Other,53,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",High school,Manufacturing,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Most of the time,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Minitab,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,Often,,,Rarely,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"CNNs,Naive Bayes,Neural Networks,RNNs,Text Analytics",,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,Often,,,,,50,20,5,0,10,15,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Often,,,,,,Often,,,,,Often,,,,,,,76-99% of projects,Entirely internal,IT Department,,clean up,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,175000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +A different identity,United States,45,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Cluster Analysis,R,University/Non-profit research group websites,"Blogs,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Programmer,Other",Self-taught,75,0,0,25,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Rarely,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Time Series Analysis",,,Rarely,,,Sometimes,Often,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Often,,,,80,10,0,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,,,,,,,,,Rarely,,Often,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,NCCC,cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Dropbox,Generic cloud file sharing software (Dropbox/Box/etc.),Always,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Other,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",IBM SPSS Modeler,Survival Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,,Necessary,,Necessary,Necessary,,,,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",35,45,20,0,0,0,,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important +Male,Russia,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,,3-5 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Machine Learning Engineer",University courses,40,20,5,20,15,0,Natural Language Processing,Bayesian Techniques,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Unix shell / awk,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs","Microsoft Excel Data Mining,NoSQL,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,,,,Often,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Often,Most of the time,Often,,,,,,Sometimes,,,,,,,Often,,Often,,,,,,,,,,,50,25,5,15,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Most of the time,,,,Sometimes,,,,Often,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,Kaggle; Data.gov; web scraping,Cleaning the data or modifying the data set to my desired format,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,475000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Machine Learning Engineer,Programmer",Work,75,0,20,5,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Most of the time,,Decision Trees,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,0,50,0,50,0,0,Enough to run the code / standard library,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,SAS Base,Time Series Analysis,R,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer,Statistician",Self-taught,50,10,0,20,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Very Important +Male,Brazil,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,,,,,Very useful,"Data Machina Newsletter,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,40,10,10,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Retail,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests","Google Cloud Compute,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Often,,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Sometimes,,,Often,Most of the time,Often,Often,,,,,,,Often,,Sometimes,,Often,Most of the time,,Often,,,,,Often,,,,,,50,20,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,Most of the time,,76-99% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,100000,BRL,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Stack Overflow Q&A,Textbook",,,,Not Useful,Very useful,,,,,,,,,Very useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,0,50,30,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Bayesian Techniques,Gradient Boosting,Support Vector Machines (SVMs)",High school,Manufacturing,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Recommender Systems",,,Often,,,Often,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,,,,Often,,,,,,,,,,15,15,50,5,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database",Often,,,,Often,,,,Sometimes,,,,,,,,Sometimes,Often,,,,,Less than 10% of projects,More 
internal than external,Standalone Team,"Right now, self contained, but this will change","Primary dataset is a set of arrays, making it hard to collapse into a vector","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Not Useful,Not Useful,,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,1 to 2 years,"Researcher,Statistician",Self-taught,40,0,40,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Relational data,Other",Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Python,R,SAS Base,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Sometimes,,,,Most of the time,Often,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,,Often,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,Sometimes,Most of the time,Sometimes,,Often,Most of the time,,,,,,Most of the time,,,,Often,Most of the time,,,100% of projects,More internal than external,Other,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,68000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,C/C++,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,Very useful,,Somewhat useful,,Very useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,"Data Miner,Machine Learning Engineer,Researcher",University courses,40,10,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10MB,"CNNs,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Python,R,Other",,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,,Sometimes,,Often,Most of the time,,Sometimes,Most of the time,,,,Sometimes,,Sometimes,,,,Often,Sometimes,,,,,,Most of the time,,,,,,,13,65,5,12,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,,,Often,,26-50% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git,Mercurial",Rarely,484000,RUB,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Taiwan,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Engineer,Predictive Modeler,Researcher",Self-taught,15,15,15,20,5,30,"Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,Most of the time,Sometimes,Most of the time,Often,,Often,,,,Most of the time,,Most of the time,,,,,Often,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality 
Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Sometimes,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Most of the time,,Often,,,Most of the time,Sometimes,Most of the time,,,,Most of the time,Often,Most of the time,Most of the time,,,,50,10,15,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,,Most of the time,,,Often,,Often,,,,,,Often,,Often,,Often,Often,,10-25% of projects,Approximately half internal and half external,Business Department,"Bloomberg, WRDS, Reuters","Data ETL, and not unified index for matching data from different sources","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,1100000,TWD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Decision Trees,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Personal Projects,Podcasts",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,Not Useful,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,42,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Miner,Engineer,Software Developer/Software Engineer",Self-taught,10,50,40,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,Sometimes,,,Sometimes,Most of the time,Most of the time,Sometimes,,,,Sometimes,,,,Often,,,Often,,Often,,Sometimes,Often,,,,Sometimes,Often,,,,,40,40,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access 
to data",Often,,Sometimes,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,Often,,Most of the time,,,,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,none,to organize and reduce size in order to handle easily,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Git",,6500000,JPY,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,20,30,30,20,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Don't know,100MB,"CNNs,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,Often,,Most of the time,Often,,,Often,,,,Sometimes,,Often,,,,Most of the time,,,,,Often,,,,,Often,,,,10,50,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",Often,,,,,,,,Often,,Sometimes,Often,Often,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",Github,Git,Sometimes,80000,BRL,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by college or university",Amazon Machine Learning,Text Mining,R,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Taiwan,26,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Very useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Programmer,Work,10,5,50,0,35,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Text data,Sometimes,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,"A/B 
Testing,CNNs,Cross-Validation,Ensemble Methods,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics",Rarely,,,Often,,Often,,,Often,,,,,,,,,,Often,Often,,,,Often,,,,,Often,,,,,70,29,1,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Often,,,,,,Often,Often,,,,,,,,,Often,Most of the time,,None,More external than internal,IT Department,government opendata,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Email,I don't typically share data",,Git,Rarely,"648,000",TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Java,Jupyter notebooks,Spark / MLlib,SQL",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes",,,,,,Often,Often,Often,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,Difficulties in deployment/scoring,,,,Often,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Git",Sometimes,2200000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,1 to 2 years,"Data Analyst,Researcher",University courses,70,10,10,10,0,0,,,A bachelor's degree,Non-profit,10 to 19 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,10MB,"Bayesian Techniques,Regression/Logistic Regression","R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Simulation",,,,,,Rarely,Often,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,,,Sometimes,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,NLCD; NASA Lansat,Lack of good metadata.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Dropbox or Github,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"37,000",,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Argentina,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,College/University,Kaggle,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,50,25,25,0,0,0,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,Very useful,,,,,,Very useful,Somewhat useful,,Very useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,PhD,Yes,Bachelor's degree,Mathematics or statistics,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,Government website,"Blogs,College/University,Non-Kaggle online communities,Stack Overflow Q&A",,Very useful,Very useful,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,,Logistic Regression,A master's degree,Other,500 to 999 employees,Stayed the same,Less than one year,A 
general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1MB,,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,35,25,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Often,,Most of the time,,,,Sometimes,,,,,,,Often,,,,Sometimes,,,10-25% of projects,More internal than external,Business Department,,Dirty data; recorded data does not align with organizational needs,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,74000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Talking Machines Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,60,20,10,0,10,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Not Useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),I did not complete any formal education past high school,,,"Data Analyst,Data Scientist,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,India,22,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by college or university,Python,Random Forests,R,"Google Search,Government website","Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Very useful,,Somewhat useful,,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,Less than a year,I haven't started working yet,"Online courses (coursera, 
udemy, edx, etc.)",0,10,0,0,0,90,Time Series,Logistic Regression,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Female,Kenya,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Retail,"1,000 to 4,999 employees",Decreased slightly,6-10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Often,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,Most of the time,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Sometimes,Often,,,Often,Often,Often,Sometimes,,Often,,Sometimes,,,,Often,,Often,,Often,,Often,,Most of the time,Sometimes,,,,Sometimes,Most of the time,,,,25,25,30,10,10,0,Enough to explain the algorithm to 
someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Often,,,Sometimes,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,10-25% of projects,Entirely internal,IT Department,weather data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",,600000,CNY,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",1-2 years,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,40,0,10,20,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,15,5,20,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised 
Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Video data,Text data",Always,10GB,"CNNs,GANs,Neural Networks,Regression/Logistic Regression","C/C++,Google Cloud Compute,MATLAB/Octave,Python,RapidMiner (free version),TensorFlow,Other",,,,Most of the time,,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,,Often,,,Often,,,"CNNs,Cross-Validation,Data Visualization,GANs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,Most of the time,Sometimes,,,,Sometimes,,,,,Often,,,,Most of the time,Sometimes,,,,,,,,,,,,,20,40,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,Often,,,,,Most of the time,Often,,,Less than 10% of projects,More internal than external,IT Department,a lot of them,processing and memory resources,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,38000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Jupyter notebooks,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,40,10,20,30,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,Other,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Relational data",Most of the time,100GB,"Ensemble Methods,Random Forests","Jupyter notebooks,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Often,Often,Often,,,,,,,,,,,,Sometimes,,Often,,,Often,,,,Rarely,,,,20,25,10,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the 
time,,,,,,,,,Often,,,,Sometimes,Sometimes,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Landsat ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Most of the time,330000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,33,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Other,Rule Induction,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,5,35,0,0,0,"Natural Language Processing,Reinforcement learning","Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Never,100GB,Neural Networks,"C/C++,Jupyter notebooks,Python,Stan,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,Often,,,,"Cross-Validation,HMMs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,,,Often,,,,,,,Sometimes,,,,,,Sometimes,Often,Sometimes,,,,Often,,,,Sometimes,Rarely,,,,30,30,5,10,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering 
or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,Often,,10-25% of projects,More internal than external,Standalone Team,"commoncrowl, wikimedia, KITTI, ImageNet",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Never,7000000,JPY,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,R,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,10,20,10,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Tutoring/mentoring",Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,10,0,50,30,10,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,Fewer than 10 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,100MB,"CNNs,Neural Networks,RNNs,SVMs","Amazon Web services,Python,R,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,CNNs,Naive Bayes,Natural Language Processing,RNNs,SVMs,Text Analytics",Often,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,,,,,,Often,,,Sometimes,Often,,,,,35,10,40,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Often,Often,,,,Sometimes,Sometimes,Sometimes,,,Most of the time,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,"Google News, Twitter, Facebook",Getting enough data to train models,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,85000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Very useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Not Useful,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,Researcher,University courses,25,50,10,5,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"5,000 to 9,999 employees",Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,,,,Most of the time,Most of the time,Most of the 
time,,,,,,,Often,Often,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Often,,,,Sometimes,,,,,45,20,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Often,,,Most of the time,Sometimes,,,Sometimes,Rarely,Sometimes,,,,,,,,Most of the time,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,150000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,Python,Anomaly Detection,SAS,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Work,40,0,55,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",High school,Government,"5,000 to 9,999 employees",Decreased significantly,More than 10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Always,100TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,Often,,,,,,,,,,,Often,,,,,,,,,,Often,,,Often,,Sometimes,Sometimes,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Often,Often,,,Often,Often,Sometimes,,,,,,Rarely,Most of the time,Most of the time,,Rarely,Sometimes,,Most of the time,Most of the time,Rarely,,,,,Rarely,Sometimes,Sometimes,,,,25,50,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,51-75% of projects,Entirely internal,IT 
Department,none,none,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,65000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Very useful,,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,49,0,0,1,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,36,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,R,Social Network Analysis,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,,Self-taught,50,20,30,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,20 to 99 employees,Increased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1MB,"Bayesian Techniques,Decision Trees","Google Cloud Compute,Java",,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Often,,Often,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,,,10,10,0,60,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty 
data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,Often,Often,,,,Often,,,,,,,,Often,,Often,,,,51-75% of projects,Approximately half internal and half external,IT Department,,"clean data inputs, appropriate parameters, meaningful outputs ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Git,Sometimes,"120,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,South Korea,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Text Mining,C/C++/C#,"University/Non-profit research group websites,Other","College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +A different identity,Other,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,PhD,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,0,60,0,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,55,0,0,0,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Data Scientist,,,Mathematica,Random Forests,,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,,Self-taught,60,10,15,10,5,0,,Decision Trees - Random Forests,A bachelor's degree,Manufacturing,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,,,Decision Trees,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text 
Analytics",,,,,,,Often,Often,,,,,,,,Sometimes,,,Often,Sometimes,,,Often,,,,,Often,Sometimes,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,26-50% of projects,,Central Insights Team,,,,I don't typically share data,,Mercurial,,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,KNIME (free version),Other,Python,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Other,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Text Mining,Python,GitHub,"Arxiv,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,,"FastML Blog,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to 
have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,29,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,KDnuggets Blog,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A health science,Less than a year,Other,Self-taught,80,10,0,0,10,0,Reinforcement learning,"Decision Trees - Random Forests,Logistic Regression",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Female,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Personal Projects,Stack Overflow Q&A",,Very useful,,,,Somewhat useful,,,,,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,30,5,20,40,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,47,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Kaggle competitions,20,70,0,0,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Mexico,20,"Not employed, but looking for work",,,,,,,,Other,Random Forests,C/C++/C#,Government website,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's 
degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer",University courses,20,20,20,20,20,0,"Adversarial Learning,Machine Translation,Recommendation Engines,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,Pakistan,22,"Not employed, but looking for work",,,,,,,,Statistica (Quest/Dell-formerly Statsoft),Time Series Analysis,Stata,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,,I haven't started working yet,Self-taught,50,20,15,15,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,,Somewhat important,Very Important +Male,United States,24,Employed full-time,,,No,Yes,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Necessary,Necessary,Necessary,,Necessary,,,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,0,0,0,80,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Female,United States,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,Somewhat useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",University courses,0,10,45,45,0,0,,Logistic Regression,A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,Less than a year,Data Scientist,University courses,10,30,10,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,Often,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Often,Often,Often,Often,,,,,Often,,Often,,Often,Often,,Often,,Often,Sometimes,,Often,Sometimes,,Often,Often,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,Often,Often,Often,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,1300000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,I did not complete any formal education past high school,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Philippines,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Web services,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,,,Very useful,,,,,,,Very useful,,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer",University courses,5,5,10,75,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Markov Logic Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Random Forests,Time Series Analysis",Most of the time,,Often,,,,Often,Most of the time,,,,,,,,Often,Often,,,,,,Often,,,,,,,Often,,,,40,20,5,10,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Git,Never,55000,PHP,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Monte Carlo Methods,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Employed by non-profit or NGO,Other,Other,Other,Other,"Non-Kaggle online communities,Other,Other,Other",,,,,,,,,Very useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Other,Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Military/Security,,,,,Very 
important,Other,Other,Other,Never,<1MB,Other,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Often,,,Other,"data ""science"" is bullshit","data ""science"" is bullshit",Other,Other,"data ""science"" is bullshit",Other,,1.00E+11,AMD,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,,,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,,,,,,,,,,,,,,,,2 - 10 hours,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,Self-taught,60,40,0,0,0,0,,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Other,R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Not Useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Other,University courses,30,0,5,65,0,0,Time Series,"Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,Some other way,Very important,Other,Basic laptop (Macbook),Other,Sometimes,10MB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Sometimes,Most of the time,,,,,Rarely,,Sometimes,,Rarely,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,25,25,10,20,20,0,Enough to tune the parameters properly,"Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Most of the time,,,,,Often,,100% of projects,Approximately half internal and half external,Standalone Team,Time Series Data,Data availability and Cost,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"20,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by a company that performs advanced analytics,Amazon Web services,Social Network Analysis,Python,Other,Company internal community,,,,Very useful,,,,,,,,,,,,,,,,3-5 years,Nice to have,,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,A humanities discipline,I don't write code to analyze data,"Researcher,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important +Female,United States,31,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Other,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Very useful,Very useful,,,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Data Analyst,Data Scientist,Researcher",Self-taught,80,0,10,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Government,500 to 999 employees,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","Amazon Web services,Julia,NoSQL,Perl,Python,R,SAS Base,SQL,Tableau",,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Often,,Rarely,Often,Most of the time,Often,,,,Rarely,,Sometimes,,Most of the 
time,Sometimes,Sometimes,Often,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Often,Most of the time,,,,10,10,10,50,20,0,Enough to refine and innovate on the algorithm,"Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,Rarely,Rarely,Sometimes,,,,100% of projects,More internal than external,Other,AHRQ MEPS; multiple U.S. Census Bureau datasets; Others,Working with data suppliers. Mining and other ETL easy by comparison,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,80000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,SQL,Google Search,"Blogs,College/University,Company internal community,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer",Self-taught,30,10,50,10,0,0,Survival Analysis,Decision Trees 
- Gradient Boosted Machines,High school,Government,"10,000 or more employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100GB,"Decision Trees,Random Forests","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Simulation,Text Analytics",,,,,,Most of the time,Often,Often,,,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Microsoft SQL Server Data Mining,Other,SQL,Government website,"Blogs,Conferences,Friends network,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Very useful,Not Useful,,,Not Useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,,University courses,35,28,17,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",High school,Government,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,Relational data,Always,100MB,"Random Forests,Regression/Logistic Regression","Orange,Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,,Rarely,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,Often,,,,Often,Most of the time,Sometimes,,Rarely,,,,Often,,,,,,,Often,,,,,,Often,,Sometimes,Sometimes,,,,20,20,5,5,50,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,Most of the time,Sometimes,,Often,,,,10-25% of projects,Approximately half internal and half external,Business Department,"We have access (and use) all state-owned data sets. This could include tax returns, state expenditures, HR records, and departmental records.","Data quality is often low. Our data is frequently fragmented and difficult to compile because it wasn't organized for analysis, but as part of a business process.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,65000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,Other,Other,Python,Government website,"Friends network,Kaggle,Stack Overflow Q&A,Textbook,Other",,,,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Most of the time,10GB,"Regression/Logistic Regression,SVMs","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Recommender Systems,SVMs",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Rarely,,,,,,80,1,17,1,1,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,Often,,,,,,Most of the time,Sometimes,,,Sometimes,,,100% of projects,Entirely internal,Other,,Collecting and munging the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,Git,Don't know,0,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Philippines,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Online courses",,Very useful,,,Very useful,,,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Software Developer/Software Engineer",Work,30,40,30,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,,"Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Most of the time,,,,,,Most of the time,,Often,,,,,,,,,,Most of the time,,,,Often,,Often,,,,,,,,Most of the time,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",Often,,Often,,Often,Often,Often,,,,,,,,,Often,,,,,,,,,,,,,Often,Often,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Scaling data science solution up to full database",Sometimes,Often,,,Often,,,,,,,,,,,,,Often,,,,,10-25% of projects,Entirely internal,IT Department,,Undocumented sources,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Google Search,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,SQL",,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression",,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,70,10,0,0,20,0,Enough to run the code / standard library,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,134000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Indonesia,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,Very useful,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,DBA/Database Engineer,Engineer,Programmer",Self-taught,50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,SQL",,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Often,Often,,Often,,Most of the time,Often,,,,,,,,Often,,Often,,Sometimes,Sometimes,,Sometimes,,,Often,,,Often,Often,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Often,,,,Often,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,data cleansing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,0,IDR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,France,26,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,Government website,"College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,Very useful,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience 
from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,10,50,30,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Colombia,35,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Government website,I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",25,70,0,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,<1MB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,Sometimes,,,Most of the time,Most of the time,Often,,,,,,,,Most of the time,,Sometimes,,Rarely,Often,,Often,,,Often,,,,Often,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,Most of the time,Sometimes,,,,Often,,,,,,,Most of the time,Sometimes,Often,,,Rarely,,51-75% of projects,Entirely internal,Other,Open data from datos.gov.co and dane.gov.co,"To joint and tidying from different sources. The lack of data, because I did have to start by creating a system for record a good part f the current data, wait a while and then start to analyze it when at least we have a year of data.","Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,8000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,,,,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,40,10,30,20,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Image data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Sometimes,Most of the time,,,,,,,,,,Often,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,40,10,5,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty 
data",,Sometimes,,Sometimes,Often,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,imagenet,labelling,Key-value store (e.g. Redis/Riak),I don't typically share data,,Git,Never,"300,000",CNY,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Never,10GB,"CNNs,HMMs,SVMs","Amazon Web services,NoSQL,Python,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,SVMs",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,70,10,5,5,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,Often,Most of the time,,,Often,Most of the time,Most of the time,,Often,,,,Most of the time,Sometimes,,,Most of the time,,,76-99% of projects,More internal than external,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),"Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,FlowingData Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Management information systems,Less than a year,Business Analyst,University courses,0,30,0,70,0,0,Computer Vision,Decision Trees - Random Forests,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Canada,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,PhD,Yes,Some college/university study without earning a bachelor's degree,Physics,,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important +Male,India,48,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",Very useful,Very useful,,,,Very useful,,,,,Very useful,Very useful,Very useful,Very useful,,Very useful,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician",University courses,50,30,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Angoss,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,,Rarely,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Sometimes,,Most of the time,,,Most of the time,Most of the time,,,,30,10,30,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,Often,,,,,,,,Often,Often,Most of the time,,51-75% of projects,Entirely internal,IT Department,None,None,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Other,Rarely,0,BIF,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Speech Recognition,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,University/Non-profit research group websites,"Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,65,35,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,,28,"Not employed, but 
looking for work",,,,,,,,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Somewhat useful,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,I haven't started working yet",Self-taught,60,20,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Hong Kong,24,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Female,Taiwan,30,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,Python,Government website,"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,,"Data Stories Podcast,FlowingData Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Yes,Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,5,30,5,0,Other (please specify; separate by semi-colon),"Decision Trees - 
Gradient Boosted Machines,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Australia,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects,Textbook",,,,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,"FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",University courses,50,0,30,10,10,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,"1,000 to 4,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Neural Networks,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,Most of the time,,Often,,,Most of the time,,,Often,Most of the time,Most of the time,,Often,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,51-75% of projects,Entirely internal,Central Insights Team,None,Getting access to it,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Git,Never,200000,AUD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Taiwan,40,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,KNIME (free version),Deep learning,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst",Self-taught,50,30,15,0,5,0,Reinforcement learning,Decision Trees - Gradient Boosted Machines,,Government,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,Random Forests,"Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,Rarely,,Sometimes,Often,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,40,30,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database",Most of the time,,,,,,,,,Sometimes,,,,,,,Often,Most of the time,,,,,26-50% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Subversion",Sometimes,70000,TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Chile,37,"Not employed, but looking for work",,,,,,,,,,,,"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,,< 1 year,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",University courses,0,50,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A,Other",,,Very useful,,,,,,,,,,,Somewhat useful,,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,0,10,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Other",Work,30,20,40,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Other,500 to 999 employees,Increased significantly,3-5 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10TB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau",,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Often,,,,Rarely,,Sometimes,,Often,,,Sometimes,,,,Often,,,,,,Sometimes,Sometimes,,,,60,10,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of 
projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,320000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Australia,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Kaggle,Online courses,Textbook,YouTube Videos",,,Very useful,Very useful,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Other,University courses,0,0,0,90,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,500 to 999 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Don't know,1GB,Other,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,RapidMiner (free version),SAS Enterprise Miner,TIBCO Spotfire,Other",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,Often,,,,,,,,Sometimes,,Most of the time,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Simulation,Text Analytics",,Rarely,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,20,20,0,60,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,Do not know,Other,Whatever public ontology knowledgebases I can find,Finding something which will fit into research directives,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Most of the time,27000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Colombia,23,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Friends network,Official documentation,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,No,Professional degree,,Less than a year,Researcher,University courses,40,0,15,45,0,0,,"Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Taiwan,29,"Not employed, but looking for work",,,,,,,,Unix shell / awk,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,,Less than a year,"Business Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",University courses,40,0,0,60,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,India,26,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",0 - 1 hour,PhD,No,Master's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,90,5,0,5,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,57,Retired,,,Yes,,Computer Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Trade book",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,Java,Google Search,"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,Somewhat useful,,,,"Data Stories Podcast,KDnuggets Blog,Linear Digressions Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,DataCamp,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,65,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,R,I don't plan on learning a new ML/DS method,R,,"Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,Very useful,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Statistician",University courses,40,10,30,20,0,0,"Survival Analysis,Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A master's degree,Other,"5,000 to 9,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,10MB,Other,"C/C++,Python,R,SAS Base,SAS JMP,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,Most of the time,,Sometimes,,Most of the time,,,,,,Rarely,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,20,20,20,10,10,10,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data,Other",Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,10-25% of projects,Entirely internal,Standalone Team,,Getting it - privacy issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,160000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,Python,Deep learning,Python,,"Arxiv,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Data Scientist,Self-taught,55,0,35,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Mathematica,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,Often,,Sometimes,,Rarely,,,,,,Sometimes,Often,,,,20,35,10,25,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,Sometimes,Often,,,,,,,Often,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,Size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",,,Git,Sometimes,140000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,Amazon Web services,Rule Induction,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Other,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,85,0,10,5,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,,Most of the time,Sometimes,Often,,,,,Most of the time,,,,Sometimes,Rarely,,Rarely,,,,,Rarely,Most of the time,,,Most of the time,,,,3,90,0,5,2,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,Most of the time,Often,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,22500,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",30,40,0,5,25,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Singapore,35,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","College/University,Online courses,Personal Projects",,,Very useful,,,,,,,,Very useful,Very useful,,,,,,,"Data Stories Podcast,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,50,30,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Java,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,TensorFlow",,Most of the time,,,,,,Most of the time,,,,,,,Rarely,,,,,,,,Often,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Most of the time,Most of the time,,,Sometimes,,Sometimes,Sometimes,,Most of the 
time,,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,20,60,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,,Cannot comment,"Cleaning, storing, versioning and timely access","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,21,"Not employed, but looking for work",,,,,,,,Java,"Ensemble Methods (e.g. 
boosting, bagging)",Python,"Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Newsletters,Non-Kaggle online communities,Online courses",Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher",Self-taught,20,10,40,25,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Researcher,Statistician",University courses,5,10,5,80,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important +Male,Japan,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Conferences,Kaggle",Very useful,Very useful,,,Very useful,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,30,0,60,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Julia,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Often,Most of the time,Often,,,,,,,,Often,,,,Often,Often,Often,Often,,,Often,Sometimes,Often,Sometimes,Often,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,Sometimes,Often,Often,,,,Sometimes,,,,,,Sometimes,,,,,,,10-25% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Engineer,Other",Self-taught,30,30,15,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,Rarely,Sometimes,,,Sometimes,Most of the time,Often,Often,,,,,Often,,Often,,,,,Often,,Often,,,,,,,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Most of the time,,Often,,,,,Most of the time,,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,no,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",oracle DB,Subversion,Rarely,1500000,TWD,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler",University courses,0,10,0,90,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Hadoop/Hive/Pig,R,SQL",,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",Rarely,Rarely,,,,,Often,Most of the time,Most of the time,,,,,,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,30,40,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Often,,,Sometimes,Rarely,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Often,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Never,100000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Scientist",University courses,85,2,1,10,2,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,Fewer than 10 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,IBM SPSS Modeler,MATLAB/Octave,Python,R,SAS Enterprise Miner,SQL,Stan,Tableau,TensorFlow",,Often,,Rarely,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,Rarely,,,Sometimes,Often,,Rarely,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,Sometimes,Sometimes,Often,Most of the time,Often,Sometimes,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Often,Sometimes,Sometimes,Most of the time,,,,70,10,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific 
analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,Sometimes,Often,Often,,Often,Often,Sometimes,,,,,Sometimes,Often,Often,Often,Often,Often,Often,,51-75% of projects,Approximately half internal and half external,Business Department,ABS ,not updated frequently ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,140000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Physics,1 to 2 years,,Kaggle competitions,30,0,10,0,60,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Random Forests,RNNs","Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis",,Sometimes,,Often,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,Often,Often,Often,,Often,,Often,,,,Often,Often,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,110000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Stack Overflow Q&A,Trade book",Very useful,,,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,80,0,10,10,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Relational data,Sometimes,1GB,"Ensemble Methods,Neural Networks,Other","C/C++,Jupyter notebooks,Python,R,Stan,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,RNNs,Other",,,Sometimes,,,Often,,,Often,,,,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,,Most of the time,,,15,50,5,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues",Often,,,,,,,,Sometimes,Often,Often,,,,,,Often,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,Stocks data;Industry Data,domain understanding,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Rarely,200000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Hong Kong,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"DataTau News Aggregator,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,More than 10 years,Statistician,Self-taught,80,0,20,0,0,0,"Adversarial Learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,CRM/Marketing,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Stan,Unix shell / awk",,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,Often,Sometimes,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,Often,Often,,,,,Often,Often,Often,,,Sometimes,Sometimes,,,Sometimes,,,,60,30,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT",,,,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,10-25% of projects,Approximately half internal and half external,Other,,,Other,"Company Developed Platform,Email,Share Drive/SharePoint",,Other,Rarely,70000,USD,Has stayed 
about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,"Not employed, but looking for work",,,,,,,,R,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Very useful,Very useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Operations Research Practitioner,Statistician,Other",University courses,0,20,0,80,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Python,Google Search,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,40+,Github Portfolio,Yes,Master's degree,Computer Science,,"Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Personal Projects,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,,Very useful,"FastML Blog,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,30,20,0,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Rarely,10GB,"Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,TensorFlow",,Sometimes,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,Often,,Most of the time,Often,Rarely,,,,,Rarely,Sometimes,,Sometimes,,Often,Often,Most of the time,,,,Sometimes,Often,Sometimes,,Sometimes,Sometimes,Sometimes,,,,20,50,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues",,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Social Network Analysis,R,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,35,10,20,5,5,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",Sometimes,,,Sometimes,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Often,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted 
Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Often,Often,Often,Often,Often,Often,Often,,,Often,Often,Often,,Often,Often,,Often,Often,Often,,Most of the time,Most of the time,Often,,,Often,Often,Often,,,,15,40,10,20,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,Often,Often,Often,,,Sometimes,,,,,,,,,Often,Often,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,120000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Colombia,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,25,20,10,40,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,Academic,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,TensorFlow",,Rarely,,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,Often,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",Sometimes,Sometimes,,Often,Often,Most of the time,Often,Most of the time,Most of the time,Sometimes,,Most of the time,,Most of the time,,Most of the time,,Often,,Sometimes,Most of the time,,Most of the time,Often,,Often,,Often,,Often,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,Often,,,Often,,,,,,,Most of the time,,,,Sometimes,,,51-75% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",,130000000,COP,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",3-5 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,20,20,10,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,40,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Other",Self-taught,75,15,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,22,Employed full-time,,,No,Yes,Other,,Employed by government,Mathematica,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Very useful,,,,,< 1 year,Necessary,Nice to have,,,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Computer Vision,"Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,10,5,20,60,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Other",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Sometimes,Sometimes,,,,Most of the time,,Often,,Sometimes,,,,,Often,,Most of the time,,,,,,,Often,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,,,,Often,,,,Often,,,,,Often,Sometimes,,,,10-25% of projects,More external than 
internal,Standalone Team,General time series data sets,High frequency data (sampled at large rate),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Never,103000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Argentina,44,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Engineer",Work,5,5,50,40,0,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,48,Employed full-time,,,Yes,,Data Scientist,Fine,,TensorFlow,Other,Python,University/Non-profit research group websites,"Conferences,Personal Projects",,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Software Developer/Software Engineer,Other",Work,20,0,70,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised 
Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Pharmaceutical,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Other",Sometimes,1MB,"Random Forests,Regression/Logistic Regression,SVMs,Other","Jupyter notebooks,MATLAB/Octave,Python,SAS Base,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,Rarely,,,,Sometimes,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Rarely,Sometimes,,Most of the time,Most of the time,Sometimes,,Often,,,,Rarely,,Often,,,,Sometimes,Often,,Rarely,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,40,20,0,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",15,30,15,40,0,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Tableau,Deep learning,R,Google Search,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Scientist",University courses,10,15,75,0,0,0,"Natural Language 
Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Stayed the same,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,Rarely,,,Rarely,,,Most of the time,Rarely,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,Rarely,,,Often,Most of the time,Sometimes,Often,,,Sometimes,,Sometimes,,Often,,,Sometimes,,Sometimes,Sometimes,Often,,,,,,Sometimes,Often,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Often,,Often,Sometimes,,,,,,,,,,,,,Often,Often,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,78000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed part-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,3 to 5 years,"Data Analyst,Researcher,Other",University courses,40,20,10,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Academic,"5,000 to 9,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,,"CNNs,Decision Trees,Random Forests,SVMs","Amazon Web 
services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,Often,,,Sometimes,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,,Rarely,,Most of the time,Most of the time,Often,,,,,,Often,,Most of the time,,Often,,,Sometimes,,Often,,,,,,,,,,,10,30,5,15,30,10,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Sometimes,40000,USD,Other,5,,,,,,,,,,,,,,,,,, +Male,Russia,44,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Unix shell / awk,Random Forests,Python,University/Non-profit research group websites,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,DataTau News Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Telecommunications,20 to 99 employees,Decreased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Random Forests,Segmentation",,,,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,50,0,0,30,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Often,,,,,,,,,,,,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,700000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Other",Self-taught,70,0,0,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Other,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,40,5,15,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Other",Often,Most of the time,,,,Sometimes,,,Often,,,,,,,Sometimes,,,,,,Most of the time,76-99% of projects,More internal than external,Business Department,Claims data; Macroeconomic data; NPI registry,The data is incredibly dirty and often times intentionally so because we are receiving data dumps from opposition in legal cases.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"71,500",USD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Official documentation,Online courses,Textbook",,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"Bayesian Techniques,Decision Trees,Random Forests","Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,Most of the time,Most of the time,,,,,,Often,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Naive Bayes,Text Analytics",,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,20,20,40,15,5,NA,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,None,More internal than external,IT Department,maxmind ip,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Sometimes,180000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,Programmer,Self-taught,30,40,0,0,30,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,DataRobot,Genetic & Evolutionary Algorithms,C/C++/C#,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,,,,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer,Statistician",University courses,35,35,0,20,0,10,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important +Male,Hong Kong,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,,,"Coursera,DataCamp,edX,Udacity,Other",Other,40+,Other,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,United States,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Psychology,1 to 2 years,"Researcher,Other",Self-taught,30,50,0,15,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",15,40,20,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,Often,,,,Often,,,,,,Often,Often,,,,10,30,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,76-99% of projects,More external than internal,Other,No,"The data are from different sources. Normally, we need to normalize the data into a general mapping in order to run the predictive model.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,SFTP,Bitbucket,,90000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,18,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,"O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,40,20,0,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks","Amazon Web services,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Often,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,Logistic Regression,Markov Logic Networks,Neural Networks",Sometimes,,Often,,,,,Often,,,,,,,,Most of the time,Often,,,Most of the time,,,,,,,,,,,,,,20,10,30,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings 
into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",Often,Often,,,Sometimes,,,Often,Sometimes,,,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Female,India,46,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Time Series Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping)","Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Other",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Other",University courses,30,20,0,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,29,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Spark / MLlib,"Ensemble Methods (e.g. 
boosting, bagging)",Scala,University/Non-profit research group websites,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Other,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,29,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog",1-2 years,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,20,0,10,20,0,"Natural Language Processing,Time Series","Bayesian Techniques,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Not Useful,Not Useful,Somewhat useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,SQL,,Python,I collect my own data (e.g. 
web-scraping),"College/University,Online courses,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,,Self-taught,20,40,10,30,0,0,,,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,70,Retired,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,YouTube Videos",,,,,Very useful,,Very useful,,,,,,,,,,,Very useful,Siraj Raval YouTube Channel,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Online courses,Podcasts,YouTube Videos",Very useful,,Very useful,,,,,,,,Very useful,,Somewhat useful,,,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,15,30,15,40,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Other,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,20,50,0,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Always,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,KNIME (free version),NoSQL,R,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,,Sometimes,Sometimes,,,,,,Most of the time,,,,Sometimes,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,Rarely,,,,,,"Association Rules,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs",,Rarely,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,Rarely,,,Most of the time,Most of the time,Most of the time,,,,,Rarely,,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,,,,Rarely,,,,,Rarely,,,Often,,,Most of the time,Most of the time,,10-25% of 
projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,115000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Germany,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","College/University,Company internal community,Conferences,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other",,,Not Useful,Not Useful,Not Useful,Somewhat useful,,,,Very useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Researcher",Work,80,0,10,10,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A professional degree,Academic,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,10TB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Enterprise Miner,SQL,Other,Other,Other",,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,Rarely,Rarely,Rarely,"Association Rules,Bayesian Techniques,Logistic Regression,Neural Networks,Other,Other,Other",,Often,Often,,,,,,,,,,,,,Most 
of the time,,,,Often,,,,,,,,,,,Often,Often,Sometimes,60,15,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",Most of the time,,,,Sometimes,,,,Rarely,Most of the time,Often,Rarely,Rarely,,,,Rarely,Sometimes,Sometimes,,,Most of the time,Less than 10% of projects,Approximately half internal and half external,Other,Nielsen Kilts Database from Chicago Booth Household Panel Data and Retail Scanner Datasets,"No machine learning expertise on team, support from advisor, funding for data subscription renewal",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"I don't typically share data,Other",Data sharing is limited by Nielsen contract,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,25000,EUR,Other,2,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Rule Induction,Python,Google Search,"Conferences,Kaggle,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +A different identity,Other,80,Retired,,,Yes,,Other,Poorly,Self-employed,Other,Other,Other,Other,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,Other,Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),No 
education,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Java,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",20,10,20,10,40,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Insurance,Fewer than 10 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Text Analytics,Time Series 
Analysis",,Sometimes,,,,Sometimes,Often,Rarely,Sometimes,,,Most of the time,,Sometimes,,Rarely,,,,,,,Sometimes,,,Sometimes,,,Rarely,Sometimes,,,,35,35,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,Sometimes,,,,,Often,,,Often,,,,,,Often,,,100% of projects,Approximately half internal and half external,Central Insights Team,"Finantial, demographic",Create an impact,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,Other,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,40,20,20,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Time Series Analysis,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,,,,,,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Time Series,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Kaggle,Official documentation,Podcasts,Textbook",,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,,,Very useful,,Very useful,,,,The Data Skeptic Podcast,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A social science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Very Important +Male,Other,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,25,"Not employed, but looking for work",,,,,,,,C/C++,Text Mining,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Scientist,DBA/Database Engineer,Engineer,Other",Kaggle competitions,35,0,25,15,25,0,"Adversarial Learning,Speech Recognition,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Australia,56,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Social Network Analysis,Julia,"Government website,University/Non-profit research group websites","Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Very 
useful,,,,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Physics,More than 10 years,"Programmer,Software Developer/Software Engineer",University courses,50,25,0,25,0,0,Time Series,"Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,Anomaly Detection,R,,"College/University,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,35,0,10,55,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Other,,1GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,Often,,,Sometimes,Most of the time,,,,,,Rarely,Sometimes,,Rarely,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,15,35,0,25,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,NHANES,Inconsistent naming conventions/timestamps/etc.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,30,30,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,Often,,,Sometimes,,Sometimes,,Most of the time,,Often,,,Often,,Most of the time,,,,,Most of the time,Often,Often,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,Often,,,Often,Often,,51-75% of projects,More external than internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,210000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Podcasts,Textbook",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,"Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,,Decision Trees - Random Forests,A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests","Amazon Web services,Python,QlikView,R,Spark / MLlib,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,Often,,,,,,,,,,"Cross-Validation,Decision Trees",,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,90,3,3,3,1,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Most of the time,,,,,,,,,Often,Often,,,,100% of projects,Entirely internal,Standalone Team,First Data Bank; Websites,Cleaning poorly maintained data; lack of system understanding leading to down time. ,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Other",Sometimes,107000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Tutoring/mentoring",Very useful,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,,"FastML Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Master's degree,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - RNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,New Zealand,44,Employed full-time,,,Yes,,Researcher,,,,Deep learning,Python,"GitHub,Google Search","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher",Work,40,5,40,10,5,0,Computer 
Vision,"Logistic Regression,Support Vector Machines (SVMs)",,Academic,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,,,,"C/C++,Java,MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Often,,,,,Most of the time,,,,,,,,20,10,10,30,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,,,,,Company Developed Platform,,"Bitbucket,Git",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,,"Arxiv,Conferences,Textbook",Somewhat useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",University courses,40,0,30,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Military/Security,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,1GB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Google Cloud 
Compute,Hadoop/Hive/Pig,Java,Python,SQL,TensorFlow",,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Evolutionary Approaches,Logistic Regression,Neural Networks",,,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,30,20,5,10,35,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Never,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Mexico,24,"Independent contractor, freelancer, or self-employed",,,No,Yes,Computer Scientist,Fine,Self-employed,IBM Watson / Waton Analytics,Proprietary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Not Useful,,,Not Useful,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,Less than a year,"Computer Scientist,Engineer,Programmer",University courses,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important +Female,Australia,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Other",Self-taught,30,20,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,RNNs","Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Other,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Often,,,Sometimes,Sometimes,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,RNNs",Rarely,,,Often,,Sometimes,Sometimes,Often,Sometimes,,,,,,,Sometimes,,Rarely,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,,60,15,10,0,10,5,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Often,Sometimes,,,Often,,,,,,,,,Sometimes,Often,,,,Less than 10% of projects,More internal than external,Other,,Cleaning text,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,350000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,18,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Online courses,Textbook",,,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,"Data Elixir Newsletter,DataTau News Aggregator",< 1 year,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,University courses,20,40,0,40,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,United States,49,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Other,Text Mining,R,Other,"Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,,,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,More than 10 years,Researcher,Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly 
by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Microsoft Excel Data Mining,Orange,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Rarely,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,Rarely,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,Sometimes,,Sometimes,,,Sometimes,,Rarely,,Sometimes,,,,60,25,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",Often,Often,,,,,,Often,Sometimes,,,,,,Often,,,,,,,,26-50% of projects,More internal than external,Other,,"I work largely with Electronic Medical Records data. Most hospitals devote a lot of resources to capturing data, but very little resource in extracting that data for analytics and machine learning. Often IT has to make changes to extraction methods just so I can access the data that I need.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,125000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,Other",University courses,62,8,18,12,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,10GB,"Neural Networks,RNNs","Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Neural Networks,RNNs",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Sometimes,,,,,,Often,Often,,,,,,,,Often,Sometimes,,100% of projects,Entirely internal,,MIMIC-III,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,35,Employed full-time,,,No,Yes,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university",Orange,,R,"Google Search,I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Researcher,University courses,20,20,20,40,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Australia,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Tableau,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Online courses",,,Somewhat useful,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,100MB,Bayesian Techniques,"Amazon Web services,NoSQL,Python,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,"A/B Testing,Data Visualization,Naive Bayes,Recommender Systems,Text Analytics",Most of the time,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,,Most of the time,,,,,50,10,0,40,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,I don't typically share data",,Bitbucket,Sometimes,116000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Python,Google Search,"College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,"Data Stories Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,,, +Female,United States,43,Employed part-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,IBM Watson / Waton Analytics,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Official documentation,Podcasts",,,Very useful,,,,,,,Very useful,,,Somewhat useful,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,"DBA/Database Engineer,Programmer",University courses,20,0,0,70,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,Don't know,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,,,,"Java,Python",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,SVMs",,,Most of the time,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,50,30,5,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Often,,,,,Less than 10% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Personal Projects,Podcasts",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Not important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Tableau,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,Necessary,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important +Female,Indonesia,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,Tableau,Support Vector Machines 
(SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,Data Analyst,University courses,30,5,60,5,0,0,,,A master's degree,Academic,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,25,40,0,15,20,0,Enough to run the code / standard library,"Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,100% of projects,More internal than external,Other,College scorecard; census ,Data definitions ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,A social science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,C/C++,Time 
Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,,Nice to have,,,,,,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,30,10,5,40,15,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,Very Important,Very Important,Very Important,Somewhat important,,,,, +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,,,Google Cloud Compute,Deep learning,SQL,I collect my own data (e.g. web-scraping),"Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,Less than a year,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,Time Series,Neural Networks - CNNs,A master's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,1GB,Neural Networks,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,72000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Australia,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Neural Nets,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,6 to 10 years,"Business Analyst,Other",Work,60,20,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Government,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,C/C++,Cloudera,Google Cloud Compute,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,SAS Enterprise Miner,Tableau",Sometimes,,,Often,Sometimes,,,Sometimes,,,Often,Often,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,Sometimes,,,,,,Often,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Often,,,,,Often,,,,,,,,Most of the time,,Sometimes,,,,Most of the time,,,,Most of the time,Often,,,Most of the time,,,,30,25,10,5,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a 
clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,Most of the time,,Most of the time,,,Most of the time,,Often,Sometimes,,,,Often,Most of the time,,26-50% of projects,More internal than external,Standalone Team,ABS; AHW; DHS; Annual Report;,cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,250000,AUD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,No,Yes,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Necessary,,Necessary,,,Necessary,,,,,,Coursera,Traditional Workstation,40+,Online Courses and Certifications,,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition","Bayesian Techniques,Logistic Regression",I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,,,Very Important,Not important,,Very Important,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Support Vector Machines (SVM),Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Miner,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R 
Enterprise,Python,R,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes",,Sometimes,,,,,,Often,,,,,,Sometimes,,Sometimes,,Often,,,,,,,,,,,,,,,,50,0,0,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,76-99% of projects,Entirely internal,Business Department,,Linking multiple flavour of data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Other,Never,1800000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed part-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Social Network Analysis,Java,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,A humanities discipline,I don't write code to analyze data,I haven't started working yet,University courses,5,0,0,90,0,5,Unsupervised Learning,Decision Trees - Random Forests,High school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,,Not Useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,"Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",,10GB,Other,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,Most of the time,,Rarely,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,Often,,,,70,2,3,20,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools",Often,,,,Most of the time,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,67000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Machina Newsletter,FlowingData Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,,Nice to have,,Necessary,Necessary,Necessary,Necessary,,,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,50,0,0,50,0,0,Time Series,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Australia,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,R,I collect my own data (e.g. web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Other",Self-taught,70,30,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A professional degree,Other,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Never,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Random Forests",,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,0,25,0,25,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,Often,Most of the time,,,,,,,,,,Often,Often,,,76-99% of projects,More external than internal,Business Department,"Government Health Data",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,150000,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,R,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,Somewhat useful,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,Data Analyst,University courses,10,35,25,25,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Government,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM Cognos,Microsoft R Server (Formerly Revolution Analytics),R,RapidMiner (free version),SQL",,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,Rarely,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,,,,Sometimes,Most of the time,Often,,,,Sometimes,,Sometimes,,Often,,,Sometimes,,Sometimes,Rarely,Often,,,Often,Rarely,,Often,Sometimes,,,,20,25,0,25,30,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Often,,,Often,Often,,,,,,Sometimes,,,Sometimes,Sometimes,,Sometimes,,100% of projects,More internal than external,Central Insights Team,Ontario open data; enivronics demographic data; statscan data; Toronto police data,Messy and incomplete,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Other,Rarely,"70,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,Not Useful,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,,,,,Most of the time,,,,Often,,,,,,,Sometimes,,,,Often,,,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,Often,,,,,,,Most of the time,,,,Sometimes,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Sometimes,220000,USD,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Hong Kong,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Tableau,Text Mining,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,NA,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters",,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,"Data Stories Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",1-2 years,,,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Master's degree,Other,1 to 2 years,Other,University courses,40,30,0,30,0,0,"Adversarial Learning,Speech Recognition,Survival Analysis","Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,,Somewhat important,Very Important,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,"Jack's Import AI Newsletter,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Other,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Machine Translation,Reinforcement learning","Markov Logic Networks,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,Switzerland,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Deep learning,R,Other,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Psychology,3 to 5 years,Other,Self-taught,50,15,0,25,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business 
decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"CNNs,Logistic Regression,Random Forests,SVMs",,,,Often,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,Rarely,,,,,,35,14,1,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Sometimes,,,,,,Often,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,85000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,48,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,GitHub,"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,Adversarial Learning,Other (please specify; separate by 
semi-colon),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Kaggle,Online courses,Personal Projects",Somewhat useful,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Software Developer/Software Engineer,Other",Self-taught,15,35,15,35,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,Other",,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Often,,,,,,,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,,Often,Most of the time,Often,Often,,,,,,,Often,,,,Often,Often,,Often,Sometimes,,,,,,,,,,65,10,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,Most of the time,100% of projects,More external than internal,Standalone Team,NCUA call report data,Proprietary data very messy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,114000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Colombia,31,Retired,,,Yes,,Scientist/Researcher,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by government",Jupyter notebooks,Genetic & Evolutionary Algorithms,Matlab,I collect my own data (e.g. 
web-scraping),"Arxiv,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,,Very useful,,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,10,10,40,0,0,"Computer Vision,Speech Recognition,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,15,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Not Useful,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler",Self-taught,50,30,0,10,10,0,"Computer Vision,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Brazil,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Company internal community,Kaggle,Personal Projects,Podcasts,Textbook",Somewhat useful,Very useful,,Somewhat useful,,,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,"KDnuggets Blog,Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,30,40,0,10,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Most of the time,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Rarely,,,,,,Sometimes,Often,,,,,,Often,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,Often,,,,,Sometimes,,Often,,,,"Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,,Most of the time,Often,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,,Often,Often,Sometimes,,,,,,Most of the time,,,,50,10,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,51-75% of projects,Entirely external,Business 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,70,Retired,,,Yes,,Operations Research Practitioner,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,C/C++/C#,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,100,0,0,0,0,0,,"Gradient Boosting,Logistic Regression,Neural Networks - RNNs",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Natural Language Processing,,I prefer not to answer,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,India,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Official documentation,Online courses,Personal Projects,Tutoring/mentoring",,,,,,Somewhat useful,,,,Very useful,Very useful,Very useful,,,,,Somewhat useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,100MB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression","Java,Microsoft Excel Data Mining,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Gradient Boosted Machines,Logistic Regression,Neural Networks",,,,Often,,,,,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,,,,,30,20,20,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",Often,,,,,,,Sometimes,Often,,,,,Sometimes,,,Often,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,"Data.gov.in; Image data",,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Bitbucket,Git",Never,"1,500,000",INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Bayesian Methods,R,I collect my own data (e.g. web-scraping),Blogs,,Very useful,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,6 to 10 years,Business Analyst,Work,40,10,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",,Technology,"1,000 to 4,999 employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,Often,Most of the time,,,,20,30,10,10,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science 
to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,,,Most of the time,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Most of the time,,,100% of projects,More internal than external,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,Rarely,3000000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,,10MB,CNNs,"Amazon Web services,Flume,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk,Other",,Rarely,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,Most 
of the time,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks",,,,Most of the time,,Sometimes,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,60,0,0,10,0,Enough to run the code / standard library,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,,Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,DBA/Database Engineer,,Employed by company that makes advanced analytic software,Jupyter notebooks,Regression,Python,Google Search,"Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX",Workstation + Cloud service,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Physics,6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,Supervised Machine Learning (Tabular Data),,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer,Researcher",Self-taught,50,30,10,10,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Modeler,NoSQL,Python,SQL",Rarely,Often,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,RNNs,SVMs",Sometimes,,Often,Often,,,,Often,Often,,,,,,,,,Often,Often,,,,,,Often,,,Often,,,,,,10,40,30,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Often,,,,,,Often,,,,,Most of the time,,76-99% of projects,More internal than external,Business Department,UCI,data not complete,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,180000,CNY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed part-time,,,No,Yes,Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Friends network,Textbook",,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,1 to 2 years,Researcher,University courses,40,0,10,40,0,10,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,6,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Text Mining,C/C++/C#,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Other",,Very useful,,,,,,,,,,Very useful,Not Useful,Not Useful,Not Useful,,,,Other (Separate different answers with semicolon),5-10 years,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Other,Sort of (Explain more),Doctoral degree,Fine arts or performing arts,I don't write code to analyze data,Other,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important +Male,India,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,"Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Colombia,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,University/Non-profit research group websites,Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,"Operations Research Practitioner,Software Developer/Software Engineer",University courses,20,35,0,45,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Ensemble Methods",High 
school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,United States,64,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,Perfectly,"Employed by professional services/consulting firm,Self-employed",R,Deep learning,R,GitHub,"Newsletters,Online courses,Other",,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Researcher,Other",Work,0,0,75,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,India,34,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Amazon 
Machine Learning,Social Network Analysis,R,I collect my own data (e.g. web-scraping),"Blogs,College/University,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Computer Scientist,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Python,R",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series 
Analysis",,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,20,70,0,0,10,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations of tools",,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,100000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Not important,Not important,Not important,Somewhat important,Not 
important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Scientist,Engineer",Kaggle competitions,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Insurance,,,,,,,,,,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Column-oriented relational (e.g. KDB/MariaDB),,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,50,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Time Series Analysis,R,University/Non-profit research group websites,"Company internal community,Kaggle,Online courses,Textbook",,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Very useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Programmer",Self-taught,75,25,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased 
slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",,,Other,TIBCO Spotfire,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,5,0,55,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Privacy issues",,,,,,,Most of the time,,,,,,,,,,Often,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Sometimes,45000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,30,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Excel Data Mining,Python,SAS Base,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,,,,Rarely,,,,Most of the time,,,Sometimes,,,Often,,,,"Data Visualization,Logistic Regression,Text Analytics",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Sometimes,,,,Often,,,,,,,,Often,,Often,,,,Often,Sometimes,,51-75% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,95000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Hong Kong,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,15,0,5,50,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs",High school,Internet-based,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Never,,,"Amazon Web services,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,0,5,5,0,,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Rarely,,,Often,,,,,,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,,Never,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,,University courses,20,0,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,,Laptop or Workstation and private datacenters,Text data,Sometimes,1TB,"Bayesian Techniques,Ensemble Methods,Markov Logic Networks,Regression/Logistic Regression","Julia,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,Often,,,,Most of the time,,Sometimes,Rarely,,,,,,Often,Often,,,Sometimes,Often,,,,Sometimes,,,,,,,,,30,30,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Female,Philippines,19,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,DataRobot,Social Network Analysis,R,Google Search,"College/University,Personal Projects",,,Very useful,,,,,,,,,Somewhat useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,20,15,10,43,12,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,I don't know,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,100MB,Regression/Logistic Regression,"IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,R,SAS Enterprise Miner,SQL",,,,,,,,,,Sometimes,Often,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Time Series Analysis",,Often,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,Often,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,15,20,25,25,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,,"11,000",PHP,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Australia,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Other,University courses,18,20,30,30,2,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,6-10 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,,,10,10,50,30,0,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Rarely,,,,,76-99% of 
projects,More internal than external,IT Department,UCI,Finding interesting data sets,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +A different identity,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,Deep learning,SQL,,"College/University,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Business Analyst,University courses,25,0,25,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Technology,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,1GB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing",,,,,,Sometimes,Often,,,,,,,Most of the time,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of 
projects,Entirely internal,Standalone Team,,Dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,Rarely,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,More than 10 years,Researcher,University courses,60,30,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Academic,500 to 999 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,Statistica (Quest/Dell-formerly Statsoft),Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,Often,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Text Analytics",,Sometimes,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,10,10,20,50,10,0,Enough to tune the parameters properly,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,100% of projects,Approximately half internal and half external,IT Department,NCBI SRA; TCGA; ENCODE,slow speed of file downloading,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,27000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software 
Engineer,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100GB,Decision Trees,"Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,70,5,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,Sometimes,,,,,,,,,Often,,,51-75% of projects,Entirely internal,IT Department,Iris;,Dirty data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,420000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"Blogs,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,0,20,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,Bayesian Techniques,"Python,SAS Base,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to 
data",,Sometimes,,Often,,,,,Sometimes,,,,,,Often,,,,Often,,Often,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,130000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Personal Projects",Very useful,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,KDnuggets Blog,1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Indonesia,23,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,30,35,0,35,0,0,Adversarial Learning,"Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,48,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,Cloudera,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Company internal community,Friends network,Newsletters,Personal Projects,Textbook",,,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,25,5,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,High school,Technology,20 to 99 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100TB,GANs,"Cloudera,Hadoop/Hive/Pig,Impala,Microsoft Excel Data Mining,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,,,Often,,,,,Often,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,"Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,30,20,30,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Scaling data science solution up to full database",,,,,Often,Often,,,,,,,,,,,,Most of the time,,,,,10-25% of projects,More internal than external,IT Department,None,Cloudera System and Virtualization of Datawarehouse,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Most of the time,"250,000,000",IDR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Company internal community,Conferences,Personal Projects,Textbook,Tutoring/mentoring",,,Very useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,20,0,20,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A professional degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,Most of the time,,,,,Sometimes,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,Often,,Often,Often,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,Time Series Analysis,Other",Most of the time,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Often,Often,,,20,30,30,5,15,0,Enough to refine and innovate on the algorithm,"I prefer not to say,Lack of significant domain expert input",,,,,,,Sometimes,,,,Often,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,305000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,IBM Watson / Waton Analytics,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer",Self-taught,80,20,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Markov Logic Networks",High school,Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Always,<1MB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib",,,,,,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Bayesian Techniques,Naive Bayes,Natural Language Processing,Neural Networks",,Sometimes,Often,,,,,,,,,,,,,,,Often,Often,Sometimes,,,,,,,,,,,,,,25,40,20,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,,,,,,,Often,,,,,,,10-25% of projects,More internal than external,Standalone Team,,control the universe,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",I don't typically share data,,Git,Most of the time,250000,GTQ,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,,< 1 year,Nice to have,,Nice to have,,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,Self-taught,50,10,10,0,0,30,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,"Neural Networks,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,Often,,,,,Often,Sometimes,,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues",Most of the time,,,,Often,,,,Most of the time,Sometimes,,,,,,,Rarely,,,,,,Less than 10% of projects,More internal than external,Other,Mesmite Database,Data is fully scatter and data will not have perfect ratio - for example in 5000 records there will be 8 to 10 True positive record will be there rest are all FPs. So it becoming night mare to build the Training to model to predict TPs. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,470000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Other","Blogs,Newsletters,Personal Projects,Podcasts,Textbook,Other,Other",,Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics",Often,,Sometimes,,,Often,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Often,Sometimes,,,,,Often,,,,,30,10,10,35,15,0,Enough to explain the algorithm to 
someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,Cleanliness of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,70000,CAD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,Very useful,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,10,10,30,40,5,5,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,SVMs,"MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",,,,,,,,Often,,,,,,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,Sometimes,,Most of the time,Most of the time,,,,,10,10,60,10,10,0,"Enough to code it again from scratch, albeit it may 
run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,Often,,Often,,,Often,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,72000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Iran,28,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog,Linear Digressions Podcast",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",University courses,30,10,20,30,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important +Male,India,35,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Trade book,YouTube Videos",,,,,,,,,,,,,,,,Very useful,,Very useful,"Data Elixir Newsletter,KDnuggets Blog",1-2 years,Nice to have,,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,I never declared a major,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,30,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Not Useful,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Other,Yes,Master's degree,A social science,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Other",University courses,30,10,10,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Other,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Biology,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important +Male,Other,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Miner,Data Scientist,Predictive Modeler",University courses,15,0,20,60,5,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Other,Other",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Often,Sometimes,,,Sometimes,Often,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,,,Often,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Often,Most of 
the time,Most of the time,,Sometimes,Sometimes,Sometimes,Often,Often,Often,Often,,Most of the time,,Sometimes,Sometimes,Often,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,Often,,Most of the time,Often,Often,,Often,Most of the time,,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,"Geographical, Public dataset, social media scraping, open government data",verify its integrity,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,3000000,THB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Stan,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,Personal Projects,Textbook",Not Useful,,,,,,,,,,,Very useful,,,Very useful,,,,,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,A health science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important +Male,Other,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,40,0,40,0,0,Natural Language Processing,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Female,Pakistan,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Researcher,Other",Self-taught,30,30,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Retail,10 to 19 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,Other,"Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Recommender Systems",,Sometimes,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of significant domain expert input",,,,,Often,Most of the time,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,It's not enough.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Company Developed Platform,,Bitbucket,Rarely,,PKR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,19,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Decision Trees,C/C++/C#,"Google Search,I collect my own data (e.g. web-scraping)",Textbook,,,,,,,,,,,,,,,Very useful,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,100,0,0,0,0,0,Natural Language Processing,,A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important +"Non-binary, genderqueer, or gender non-conforming",Philippines,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,Very useful,Very useful,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,60,25,5,5,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100GB,"GANs,Neural Networks,RNNs,SVMs","Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Often,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Segmentation,SVMs,Text Analytics",,,,Most of the time,,Most of the time,Often,,,,Often,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,30,55,0,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,Most of the time,,,,,,Often,,,,Most of the time,,,Most of the time,Most of the time,Sometimes,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,"Kaggle Datasets, UCI machine learning datasets, imagenet",data cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"150,000.00",PHP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Mathematica,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Textbook",,Very useful,Very useful,,,,Very useful,,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,0,40,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important,Very Important +Male,Other,23,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",Other (please specify; separate by semi-colon),No education,Internet-based,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Relational data",Sometimes,1GB,Neural Networks,"NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"A/B Testing,Collaborative Filtering,Logistic Regression,Recommender Systems,Time Series Analysis",Sometimes,,,,Often,,,,,,,,,,,Often,,,,,,,,Often,,,,,,Often,,,,40,35,10,5,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,42000,NPR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,Python,I don't plan on learning a new ML/DS method,Python,Other,"Conferences,Online 
courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,Self-taught,0,20,80,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Other,Never,1GB,Other,"Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,Sometimes,Sometimes,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,Sometimes,Sometimes,,Most of the time,,,,"A/B Testing,Other",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,60,0,0,40,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data,Other",Often,,,,Often,Often,,,Often,Often,,,Often,Often,,,Often,Often,Often,,Often,Most of the time,100% of projects,More 
external than internal,Other,,"getting it, storing it","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,1000000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,Engineer,Machine Learning Engineer",Self-taught,60,20,5,2,13,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Survival Analysis","Bayesian Techniques,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Web services,Neural Nets,Python,GitHub,"Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,"Data Machina Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Neural Networks - CNNs,Neural Networks - GANs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Malaysia,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced 
analytics,Hadoop/Hive/Pig,Link Analysis,Python,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,20,0,30,0,0,Natural Language Processing,Logistic Regression,High school,Technology,10 to 19 employees,Decreased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Other,Text data,Never,1GB,Regression/Logistic Regression,"Java,Jupyter notebooks,Mathematica,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Often,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Often,,,,,20,30,5,5,40,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,10-25% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Colombia,0,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),,Github Portfolio,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,United States,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Julia,Deep learning,Python,Google Search,"College/University,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Researcher",Self-taught,60,0,0,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,Often,,Often,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,50,20,5,10,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability 
of/difficult access to data",Often,Often,Sometimes,,Most of the time,,,Often,Most of the time,Sometimes,Often,,,Often,Most of the time,,,Most of the time,Most of the time,Often,Most of the time,,76-99% of projects,More external than internal,Other,U.S. Census,"Small dataset, general data","Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"95,000",USD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,50,0,10,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Rarely,100GB,"Bayesian 
Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Stan,Tableau,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,Sometimes,,Sometimes,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,75,5,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,Sometimes,,,,Often,,,,,Most of the time,Most of the time,Often,,100% of projects,Approximately half internal and half external,Central Insights Team,Credit data; mapping data; etc,Joining data at different resolutions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Other",Shared private data center ,Git,Sometimes,90000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,R,Other,SAS,Government website,"Arxiv,Blogs,Kaggle,Official documentation,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,Other,Self-taught,34,0,33,33,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Government,"10,000 or more employees",Decreased significantly,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,,100GB,Neural Networks,"SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Sometimes,,,,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,Most of the time,,Rarely,,,,Sometimes,,,Sometimes,,,,10,78,0,2,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult 
access to data",Often,Often,Sometimes,,Most of the time,,,Often,Most of the time,,,,,,Most of the time,,Often,,,,Sometimes,,100% of projects,Entirely internal,Other,proprietary data alone,cleanliness of data sets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,public facing ftp or web page,Other,Sometimes,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Operations Research Practitioner,Researcher",Work,20,20,40,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",A master's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,,Often,Rarely,,,,,,,Most of the 
time,,,,30,25,15,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,Rarely,,Sometimes,Sometimes,,,,,,Often,Sometimes,,,,,,,Sometimes,,Most of the time,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1400000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Germany,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Textbook",,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,,University courses,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",Primary/elementary school,Manufacturing,"5,000 to 9,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","R,RapidMiner (commercial version),SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,Rarely,,,Sometimes,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Recommender Systems",,Often,,,,Rarely,Often,Often,,,,,,,,Often,,,,Often,,,,Often,,,,,,,,,,75,10,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,,Most of the time,Most of the time,,,Often,,,,Most of the time,Most of the time,,,,,Most of the time,,,,10-25% of projects,Entirely internal,Other,,,,,,"Git,Other",Sometimes,80000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, 
+Male,Philippines,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Tutoring/mentoring",,,,,,,,,,,Very useful,,Very useful,,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,20 to 99 employees,Stayed the same,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Amazon Web services,KNIME (free version),Python,R,RapidMiner (commercial version),Tableau",,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,Often,Most of the time,,,,,,,,,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Naive Bayes,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Often,Most of the time,,,,70,10,5,5,5,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Privacy issues",,,,,Most of the time,Often,,,,,,,,,,,Most of the time,,,,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,20000,USD,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,40,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Programmer,Self-taught,30,70,0,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",Work,10,0,60,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Segmentation",Often,,,,Often,,Often,Most of the time,Most of the time,,,,,,,Most of the time,,Often,,Sometimes,,,Often,Often,,Often,,,,,,,,40,5,15,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,Often,Most of the time,,,,,,,,Sometimes,,,,,,Often,Often,,26-50% of projects,More internal than external,Other,N/A,Data issue and access,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,150000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,0,50,30,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important +Female,Israel,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,,Nice to have,Necessary,,,,"Coursera,Other",Traditional Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important +Male,India,21,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"Data Stories Podcast,DataTau News Aggregator,Partially Derivative Podcast",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Australia,25,Employed part-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Data Analyst,University courses,0,15,35,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,100GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,KNIME (free version),Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Often,,,,Often,,Rarely,,Often,,,,,,Often,Often,Sometimes,,Sometimes,,,Rarely,Rarely,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT",,,,Often,Most of the time,,,,,,,,,,Sometimes,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,65000,AUD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Singapore,23,Employed part-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,R,Google Search,"Arxiv,Blogs,Kaggle,Online courses,Textbook,YouTube Videos,Other",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Jack's Import AI Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,20,10,60,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Don't know,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,RapidMiner (free version),TensorFlow",,Rarely,,,,,,,,,,,,,,,Often,,Rarely,,,,Often,,,,,,,,Most of the time,,Most of the time,,Often,,,,,,,,,,,Sometimes,,,,,,"Natural Language Processing,Prescriptive Modeling,Text Analytics",,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,50,30,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Rarely,,,,,,,Rarely,,,Rarely,,Rarely,,Less than 10% of projects,More internal than 
external,Standalone Team,skip,skip,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Never,6000,SGD,Other,6,,,,,,,,,,,,,,,,,, +Female,Malaysia,22,"Not employed, but looking for work",,,,,,,,DataRobot,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,0,10,0,90,0,0,Machine Translation,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,20,0,0,80,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important +Female,Canada,52,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Salfrod Systems CART/MARS/TreeNet/RF/SPM,,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts",,,Very useful,,,,Very useful,,,,,,Somewhat useful,,,,,,"KDnuggets Blog,Linear Digressions Podcast",15+ years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,,"Operations Research Practitioner,Other",University courses,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,,Sometimes,Most of the time,Most of the time,Often,Often,,,,,,,Often,,,Sometimes,Often,Often,,Often,,Often,,,,,Often,,,,70,20,5,3,2,0,Enough to explain the algorithm to someone non-technical,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,,,,,,,,,,,Sometimes,,,Often,51-75% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,Unnecessary,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Other,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,30,0,70,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,DataRobot,Social Network Analysis,R,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very 
useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,50,10,10,10,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,500 to 999 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,Other,,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Often,,,,,,,,Often,,,Sometimes,Often,,,Often,Often,,,,,Often,Often,,,,20,50,10,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Other",,Often,,,Often,,,,,,,,,,,,,,,,,Often,26-50% of projects,Do not know,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Business Analyst,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,99,0,0,0,1,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Recommender Systems,Segmentation,SVMs,Text Analytics",Sometimes,Sometimes,Often,,Sometimes,Most of the time,Often,Sometimes,,,,,,,,,,Often,Often,,,,,Often,,Sometimes,,Sometimes,Often,,,,,50,30,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,10-25% of projects,More internal than external,Standalone Team,"movie,news",hard to get labelled data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,240000,CNY,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Deep learning,Python,I collect my own data (e.g. web-scraping),"Online courses,Textbook,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer",Self-taught,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Neural Networks - RNNs,A master's degree,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,,,,,,,"Amazon Web services,C/C++,Python",,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs,Segmentation",,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Often,,,,,,,,0,0,100,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,None,Do not know,Central Insights Team,,,,Email,,Git,Rarely,100000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by college or university,Jupyter notebooks,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,YouTube Videos,Other",Very useful,Very useful,Very useful,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,University 
courses,20,0,20,60,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,Academic,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Never,10GB,CNNs,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Segmentation",,,,Most of the time,,Sometimes,Often,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,75,20,0,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",Most of the time,Often,,,,,,Sometimes,Most of the time,,Most of the time,,,,Rarely,,Most of the time,,,,,,100% of projects,Entirely internal,Standalone Team,ABIDE; ADNI; NIHPD; MICCAI 2012,Getting access,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Git,Sometimes,53000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Julia,Social Network Analysis,SQL,Google Search,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"DataTau News Aggregator,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",University courses,10,50,20,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,1GB,"Bayesian Techniques,Decision Trees,HMMs,Other","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",,,Sometimes,,,Often,Most of the 
time,Often,Sometimes,,,,,,,Sometimes,,,Often,,Often,,Sometimes,,,Rarely,Rarely,Rarely,Often,,,,,40,20,20,20,0,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,,,Sometimes,,,,,,,Sometimes,Most of the time,,,,Most of the time,,,,,76-99% of projects,Entirely internal,Standalone Team,Pubmed; quandl; ,Low throughput of APIs,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,80000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Indonesia,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,0,15,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data",,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,Python,SQL,TensorFlow",,Rarely,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",,,,Most of the time,,Most of the time,Most of the time,Often,Often,,,,Sometimes,Sometimes,,Often,,Often,,Most of the time,Often,,Sometimes,,Often,,,Often,,,,,,20,70,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining 
data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Often,,,Often,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,,,,Sometimes,,,Often,,51-75% of projects,Entirely internal,Other,mnist; imagenet; ,"large size data, small hardware capabilities to process them","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,120000000,IDR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,,Proprietary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Company internal community,Newsletters,Stack Overflow Q&A",,Very useful,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Machine Learning Engineer,Researcher",Work,50,0,25,25,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,10 to 19 employees,Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Other,Most of the time,100GB,,"Amazon Web services,C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,Most of the time,,Sometimes,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,25,45,0,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,,7,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,University/Non-profit research group websites,"College/University,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Engineer,University courses,30,30,0,30,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Relational data",Rarely,,"CNNs,Decision Trees,HMMs,Neural Networks,Random Forests","Java,R",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"CNNs,HMMs,Naive Bayes,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,19,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by college or university,Jupyter notebooks,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Online courses,Textbook",,Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,Partially Derivative Podcast,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Colombia,34,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,10,50,0,0,Computer Vision,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Neural Networks",Often,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,50,25,10,15,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources",,,,,Most of the time,,,,,Often,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Self-taught,50,20,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Orange,Python,R,SQL,Tableau",Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,Rarely,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Often,Often,,Often,,Most of the time,,Often,,,,50,10,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Most of the time,Often,,,,,,,,,Most of the time,,,,,Sometimes,,,26-50% of projects,More internal than external,Business Department,,parsing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,100000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Cluster Analysis,Python,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,65,10,10,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Anomaly Detection,R,University/Non-profit research group websites,"Arxiv,Blogs,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very 
useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,A health science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,10,40,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,Often,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Often,Sometimes,Sometimes,,Sometimes,,Most of the time,,Most of the time,,Often,,Often,Most of the time,,Most of the time,,Sometimes,Sometimes,Most of the time,,Most of the 
time,Most of the time,,,,30,10,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Sometimes,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,Often,,,100% of projects,Entirely internal,Business Department,,messy text data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,70000,NZD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Mathematica,Neural Nets,Python,University/Non-profit research group websites,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,DataRobot,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,6 to 10 years,"Data Analyst,Other",Self-taught,45,15,15,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,R,RapidMiner (free version),SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,Rarely,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,SVMs,Time Series Analysis",Sometimes,Sometimes,,,,Often,Most of the 
time,Often,Sometimes,,,,,Sometimes,,Often,,,,,,,Often,,,,Often,Sometimes,,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Sometimes,Often,,,,,,Sometimes,,,,,,,,,,,,Often,,Sometimes,51-75% of projects,More internal than external,Other,Hoovers; Dunn & Bradstreet; Federal Government Departmental Information;,Time lag,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,,Rarely,115500,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Recommendation Engines,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",,Financial,"10,000 or more employees",Stayed the same,6-10 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,,Bayesian Techniques,"QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Time Series Analysis",Rarely,Rarely,Rarely,,,,Rarely,,,,,,,,,Rarely,Rarely,Rarely,,,Rarely,,,,,,,,,Rarely,,,,50,10,0,10,30,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,,Commercial Data Platform,,Git,Never,2500000,INR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Tableau,Bayesian Methods,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very 
useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",University courses,40,5,30,20,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,,,,Most of the time,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Decision Trees,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Sometimes,Often,,Often,,Most of the time,,Often,,,Often,Often,Often,Most of the time,,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,Often,Sometimes,Sometimes,Most of the time,,Often,,,,30,15,15,20,20,0,"Enough to code it again from scratch, albeit 
it may run slowly","Difficulties in deployment/scoring,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,,,,,,,,,,Sometimes,Often,Often,,,10-25% of projects,Do not know,Central Insights Team,UCL datasets; US Gov. Datasets; Gov of India Datasets,Deployment,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,10000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Management information systems,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Romania,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,RapidMiner (free version),"Ensemble Methods (e.g. boosting, bagging)",Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Academic,"1,000 to 4,999 employees",Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,10MB,Ensemble Methods,"IBM SPSS Statistics,R,RapidMiner (free version)",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods",,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,50,30,0,10,10,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,10-25% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,20000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,25,60,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data,Other",,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,R,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Often,,,,,,Often,,,,,,,,Sometimes,Rarely,,,,,,,,,,"Association Rules,Data Visualization,Naive Bayes,Simulation,Text Analytics",,Rarely,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,Often,,Often,,,,,80,10,0,5,5,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,Often,Sometimes,,,,,,,,Most of the time,Often,Most of the time,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,87000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,50,40,0,0,5,5,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series",Other (please specify; separate by semi-colon),High school,Technology,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,Decision Trees,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Segmentation",,,,,,,Sometimes,Often,Sometimes,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,25,0,0,50,25,0,Enough to tune the 
parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",Often,,,Sometimes,,,,,,Often,,,Sometimes,,,,,,,,Rarely,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,2000000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Google Search,Government website","Conferences,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,,,,,Very useful,,,,Somewhat useful,,Very useful,Very useful,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,40,60,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Software Developer/Software Engineer",Self-taught,50,0,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,,,"Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow,Unix shell / 
awk",,Sometimes,,Often,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,HMMs,Neural Networks,RNNs","Java,Python,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,Often,Often,,,"CNNs,Cross-Validation,Ensemble Methods,Natural Language Processing,Neural Networks,RNNs",,,,Often,,Often,,,Often,,,,,,,,,,Often,Often,,,,,Often,,,,,,,,,30,30,5,10,25,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Flume,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 years,"Researcher,Statistician,Other",University courses,25,25,25,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,Rarely,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Sometimes,,,,Often,,Often,,Often,,,,,Sometimes,,Sometimes,Sometimes,,,Often,,Sometimes,Sometimes,,,,25,5,5,25,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,Sometimes,,,,,,Often,Sometimes,,Sometimes,,Most of the time,Often,Often,,10-25% of projects,More internal than external,Standalone Team,American Community 
Survey,format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Rarely,135000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Personal Projects,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,,,,,Very useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,20,0,60,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,Regression/Logistic Regression,"C/C++,Cloudera,Hadoop/Hive/Pig,IBM SPSS Statistics,NoSQL,Python,R,SAS JMP,Spark / MLlib,Statistica (Quest/Dell-formerly Statsoft),TIBCO Spotfire",,,,Most of the time,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,Often,Often,,,Most of the time,,,Most of the time,,,,,"Cross-Validation,Logistic Regression,Simulation,Text Analytics",,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,Often,,,,,30,50,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super 
efficient,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,NTSB;Kaggle;,Big data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,43,"Not employed, but looking for work",,,,,,,,Tableau,Regression,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Github Portfolio,No,Master's degree,,I don't write code to analyze data,Other,Self-taught,10,90,0,0,0,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,SAP BusinessObjects Predictive Analytics,I don't plan on learning a new ML/DS method,C/C++/C#,Google Search,"Blogs,College/University,Conferences,Friends network,Newsletters,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,Very useful,Data Machina Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop 
(Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer,Statistician",Self-taught,50,0,50,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,SQL,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Unnecessary,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Master's degree,No,Master's degree,,Less than a year,Other,University courses,10,20,0,65,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not 
important,Not important,Somewhat important,Not important,Not important +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Google Cloud Compute,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs",Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",45,25,20,0,10,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,500 to 999 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,Often,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Often,,,,"A/B Testing,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes",Sometimes,,,,,,,,,,,Sometimes,,Often,,Often,,Most of the time,,,,,,,,,,,,,,,,5,30,30,30,5,0,Enough to refine and innovate on the algorithm,"Company 
politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Sometimes,,,,Often,,Often,,,Most of the time,,Sometimes,,Often,Often,,,,26-50% of projects,More internal than external,IT Department,,,Key-value store (e.g. Redis/Riak),Commercial Data Platform,,"Bitbucket,Git",Rarely,172000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,"FlowingData Blog,Jack's Import AI Newsletter,KDnuggets Blog",< 1 year,Nice to have,,,,Necessary,Necessary,Necessary,Nice to have,,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Natural Language Processing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced 
analytics,SQL,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",25,50,20,0,5,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,10,0,25,45,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,,,,,,,,,Often,,,,Sometimes,,,26-50% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,,Always,,,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Unix shell / awk,Deep learning,SQL,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,"FlowingData Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",University courses,20,0,40,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk,Other",,Often,,,Sometimes,,,,Sometimes,,,,,Often,,,Often,,,,,,,Rarely,,,,,,Rarely,Often,,Often,,,,,,,,Sometimes,Sometimes,Rarely,,,Rarely,,Sometimes,Most of the 
time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Often,,,Sometimes,Rarely,Sometimes,,Often,Sometimes,,,,,Sometimes,Sometimes,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,"Census;Social Media;Geographic (Shapefiles, geojson, etc.);government collected data;various public apis",joining disparate datasets without common keys in sensible ways.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Most of the time,"140,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,,< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Yes,Master's degree,A social science,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,0,60,0,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Chile,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,QlikView,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Business Analyst,Work,0,30,50,20,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,Regression/Logistic Regression,"QlikView,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,40,0,60,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,50000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,20,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Julia,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Very useful,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,University courses,20,5,25,40,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,,"Random Forests,Other","Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,Most of the time,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction",Often,,,,,Sometimes,,,,,,,,Rarely,,Sometimes,,,Often,,Rarely,,,,,,,,,,,,,0,40,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Rarely,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Never,55000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer",Self-taught,40,20,0,15,0,25,"Computer Vision,Recommendation Engines,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs",A master's degree,Financial,20 to 99 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Rarely,1GB,RNNs,"Amazon Machine Learning,Google Cloud Compute",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,Most of the time,,,Sometimes,,Often,,,,,Rarely,,Less than 10% of projects,More external than internal,IT Department,,,Graph (e.g. 
GraphBase/Neo4j),Commercial Data Platform,,Bitbucket,Sometimes,1000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,Somewhat useful,Very useful,"Data Stories Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,0,10,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Indonesia,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,Computer Vision,Neural Networks - CNNs,Primary/elementary school,Financial,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Other,Basic laptop (Macbook),"Image data,Relational data",Never,100MB,CNNs,"Amazon Web services,NoSQL,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Often,,,,Rarely,,,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,70,0,30,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,dogs vs cats,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,10000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,C/C++,Factor Analysis,R,I collect my own data (e.g. 
web-scraping),Podcasts,,,,,,,,,,,,,Very useful,,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,20,0,20,60,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Python,R,Spark / MLlib,SQL,Tableau",Rarely,Rarely,,Most of the time,Rarely,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Time Series Analysis",Often,Often,,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,Most of the time,,,,40,15,20,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,Sometimes,,,,,Often,,,,,Often,,,,,,Most of the time,,Often,,51-75% of projects,Approximately half internal and half external,Other,Exchange data ,"Specs vary by exchange, features need to be engineered and 
custom configurations must be created.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,150000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,1 to 2 years,"Data Analyst,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,"5,000 to 9,999 employees",Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Sometimes,Sometimes,,,,,,Often,,Most of the time,,,Sometimes,,Often,,Often,,,Most of the time,Sometimes,,Sometimes,Most of the time,,,,45,25,0,10,20,0,"Enough to code it 
again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,Often,,,,,,,Often,,,,,,Often,,,,10-25% of projects,More internal than external,Standalone Team,Gold eBiz,Non-matching variable names; non-matching variable data types; relational databases keep getting remapped too often,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Email,Other",Liquid Files,Generic cloud file sharing software (Dropbox/Box/etc.),Never,"50,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Philippines,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Kaggle competitions,30,30,20,5,15,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,Sometimes,,,,30,10,20,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,,Sometimes,,Often,Most of the time,,Often,,,,,,Sometimes,,,,Sometimes,,51-75% of projects,More internal than external,IT Department,social media,cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,23000,PHP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Tableau,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,50,10,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,70,10,10,10,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,,,,,,30000,CAD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,50,0,0,50,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,Don't know,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Don't know,100GB,,"C/C++,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Neural Networks,Recommender Systems",,Most of the time,Often,,,,Most of the time,Rarely,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,40,10,40,10,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Git,Sometimes,110000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),,Computer Science,1 to 2 years,I haven't started working yet,University courses,20,60,0,20,0,0,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Portugal,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,40,0,35,20,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Text data,Relational data",,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,HMMs,Random Forests,SVMs,Other","Java,Mathematica,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,RapidMiner (commercial version),SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,Most of the time,,,,,Rarely,Rarely,,,,Rarely,,,,,,Sometimes,,Often,Rarely,,,,,Rarely,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Lift Analysis,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,Often,,,Often,,Often,Sometimes,Sometimes,,,Often,Most of the time,Sometimes,,,Sometimes,,,Most of the time,,Often,Sometimes,,Often,Sometimes,Sometimes,,Most of the time,,,,20,30,30,0,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,10-25% of projects,Approximately half internal and half external,Standalone Team,"temporal, high-dimensional. structural online data; biomedical data",NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,,,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. 
boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Programmer",Self-taught,50,20,10,0,20,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Decision Trees,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,Sometimes,Sometimes,,,,30,15,15,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,,,,,,,,Sometimes,,Often,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Rarely,99000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by non-profit or NGO,R,Regression,,Other,"Blogs,College/University,Newsletters,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,Very useful,,,Very useful,,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A social science,Less than a year,Other,Other,20,20,20,0,0,40,,,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Female,Indonesia,22,"Not employed, but looking for work",,,,,,,,,,,,"College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Engineer,Programmer",University courses,0,0,0,100,0,0,"Machine Translation,Survival Analysis",Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business 
Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Support Vector Machines (SVM),SQL,"GitHub,Government website,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer",University courses,5,20,30,40,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,IBM Cognos,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,RapidMiner (free version),SQL,TensorFlow",,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,Often,,,,Sometimes,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Logistic Regression,Recommender Systems,Time Series Analysis",,,,,,Often,,,,,,,,,,Often,,,,,,,,Sometimes,,,,,,Often,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,Often,Often,,,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,1500000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle",,,,,Very useful,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,80,5,0,10,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Microsoft Azure Machine Learning,Deep 
learning,Python,Government website,"Blogs,Official documentation,Online courses,Podcasts,Textbook",,Very useful,,,,,,,,Very useful,Very useful,,Very useful,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Don't know,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Most of the time,,,,,,Often,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Logistic Regression,Random Forests",,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,50,10,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,Most of the 
time,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,Rarely,120000,AUD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +A different identity,India,34,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,R,Other,Blogs,,Very useful,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Other,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Other,Anomaly Detection,SQL,"Government website,University/Non-profit research group websites,Other","Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Other,Other",,Somewhat useful,Very useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Fine arts or 
performing arts,3 to 5 years,"Data Analyst,Data Scientist,Other",Self-taught,50,25,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,Often,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis,Other,Other",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,Most of the time,Sometimes,,10,10,5,10,15,50,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,Often,,,,,Sometimes,,Sometimes,,,,Often,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,"American Community Survey and US census, CDC data sets, USGS water quality data, data.gov, US tigris shape files","frequently we are asked to make inference and share results based on small area analyses with extremely sparse data and suppression issues. Error and confidence are issues at time. 
Also, ownership of the data and public opinion frequently impede progress.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,SQL tables & API's,Other,Rarely,49750,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Chile,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,Very useful,Somewhat useful,Talking Machines Podcast,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,23,"Not employed, 
but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,SQL,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Other,50,0,50,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,Mix of fields,"10,000 or more 
employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100GB,Other,"Java,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,30,0,50,0,20,0,Enough to run the code / standard library,"Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Often,Often,,,,,,,Often,,,None,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Via servers,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,Very useful,,,Not Useful,,Somewhat useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,5,25,0,0,"Natural Language Processing,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,More than 10 years,A tech-specific job board,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Video data,Rarely,100GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,Python,R",,Rarely,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Most of the time,,Sometimes,,,,,Sometimes,,,Often,,Rarely,,,Often,Most of the time,Sometimes,,,,Most of the time,,,Rarely,,,,,,10,80,5,3,2,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools",Rarely,,,,,,,,Rarely,Most of the 
time,,Often,Sometimes,,,,,,,,,,None,Entirely internal,Standalone Team,OpenAI Gym,,Other,I don't typically share data,,Git,Sometimes,"30,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),40+,Github Portfolio,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Other,University courses,30,10,10,40,10,0,Natural Language Processing,Bayesian Techniques,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,,,Amazon Web services,Anomaly Detection,Python,Google Search,"Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 
10 years,,"Online courses (coursera, udemy, edx, etc.)",29,50,15,1,5,0,Time Series,,A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,,,"Amazon Web services,C/C++,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Unix shell / awk",,Often,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,Rarely,,Sometimes,,,,,,,,,Often,,,,,,Often,,,,"A/B Testing,Data Visualization,Simulation",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,5,1,0,2,2,90,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,,,,,,,,,,,Most of the time,Sometimes,,,Often,,,51-75% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",git/GitHub,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,South Africa,38,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,Natural Language Processing,Bayesian Techniques,No education,Internet-based,"5,000 to 9,999 employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,Bayesian Techniques,"Amazon Web services,Jupyter notebooks,NoSQL,Python,R",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,60,20,10,10,0,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Often,Most of the time,,51-75% of projects,More external than internal,Standalone Team,SimilarWeb,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,"1,100,000",,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Russia,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by government,Jupyter notebooks,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,,,Necessary,,Necessary,Necessary,,,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,Time Series,Support Vector Machines (SVMs),A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Female,United States,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A health science,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,35,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,O'Reilly Data Newsletter,1-2 years,,,,,,,Necessary,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Master's degree,A health science,3 to 5 years,Data Analyst,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,,,,,,,,,,,,,,Very Important, +Male,India,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,R,Genetic & Evolutionary Algorithms,Matlab,University/Non-profit research group websites,"College/University,Conferences,Kaggle,Newsletters,Textbook,YouTube Videos",,,Somewhat useful,,Very useful,,Very useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's 
degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,80,0,10,10,0,0,Time Series,Evolutionary Approaches,A bachelor's degree,Academic,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Never,10GB,Evolutionary Approaches,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Evolutionary Approaches,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,10-25% of projects,Do not know,Other,"Healthy Brain Network Data, Child Mind Institute New York",Specifications or Details of the data not completely defined.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Pen Drives,,Sometimes,700000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,2,1,80,15,2,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,500 to 999 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Minitab,Python,Spark / MLlib,SQL",,,,Sometimes,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Rarely,,,Rarely,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Often,,Often,,,Often,,Often,,Most of the time,,,,,Often,,Sometimes,,,,30,30,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,Often,,,,,Sometimes,,,,,,,,,Often,,,,,10-25% of projects,More internal than external,IT Department,Zendesk,Cleaning the data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,140000000,IDR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,,,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,10,60,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",Sometimes,,,,,Most of the time,Most of the time,Rarely,,,,Sometimes,,,,Often,,,Rarely,,Sometimes,,Often,,,Often,Often,,Rarely,,,,,30,20,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,,Sometimes,,,,,,,,,,Often,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,154000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Not Useful,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",50,50,NA,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",,A master's degree,Internet-based,Fewer than 10 employees,Stayed the same,1-2 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,"Bayesian Techniques,Neural Networks","Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Natural Language Processing,Text Analytics",Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,50,15,20,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input",,,,Often,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,105000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"DataTau News Aggregator,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Other",University courses,35,5,25,35,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Orange,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,Often,,,,,Rarely,,Sometimes,,Most of the time,,,,,,,,Rarely,Most of the time,,,,Rarely,,Most of the time,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series 
Analysis",Often,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,Sometimes,Often,,,Sometimes,Sometimes,Sometimes,Sometimes,Most of the time,Sometimes,,Most of the time,,,Often,Often,,,,70,5,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Often,,Often,Most of the time,,,Often,,,,,Sometimes,,,Rarely,,Sometimes,Most of the time,,,76-99% of projects,More internal than external,IT Department,Demographic data;,Collected differently over time so hard to have consistent results,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,90000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Tableau,Support Vector Machines (SVM),R,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,70,20,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,100MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,worldbank,Missing Information,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Bitbucket,Rarely,100000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Data Machina Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Researcher",University courses,20,20,0,50,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Philippines,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,,,,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Researcher,Statistician",University courses,50,0,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Decreased significantly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10MB,Regression/Logistic Regression,"IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Minitab,QlikView,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,Sometimes,Often,Often,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,Most of the time,,,Most of the time,Often,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,28000,HKD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Amazon Web services,Social Network Analysis,Python,GitHub,Blogs,,Very useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,10,20,25,10,10,Natural Language Processing,Evolutionary Approaches,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,,,,,,,,,,, +Male,Canada,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects",,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,10,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A career fair or 
on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Sometimes,1GB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,25,0,15,35,25,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,76-99% of projects,Entirely external,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Bitbucket,Never,72000,CAD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,5,5,5,0,15,70,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Insurance,"10,000 or more employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,DataRobot,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Rarely,,,,Rarely,,Rarely,Sometimes,,,,,,,,Most of the time,,,,,Rarely,Sometimes,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,Rarely,Sometimes,,,Sometimes,Most of the time,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,Rarely,Sometimes,Most of the time,Sometimes,Most of the time,Often,Most of the time,Most of the time,,,Often,,Sometimes,,Most of the time,,Often,Often,Most of the time,Often,,Most of the time,Sometimes,Sometimes,,,Rarely,Most 
of the time,Sometimes,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Most of the time,Most of the time,,Often,Most of the time,,Sometimes,Often,Most of the time,Often,,Sometimes,,Often,Often,Most of the time,Often,,10-25% of projects,More internal than external,IT Department,,quality,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Always,230000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,DataCamp,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,45,0,15,0,0,Natural Language Processing,"Bayesian Techniques,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,22,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Engineer",University courses,20,0,0,80,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,Regression/Logistic Regression,"Minitab,Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,"Data Visualization,Logistic Regression,Time Series Analysis,Other",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Sometimes,,10,30,20,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,Rarely,,,Often,,,,,,,,,,,,Sometimes,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"72,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Self-employed,R,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,,Very useful,The Data Skeptic Podcast,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Neural Networks - CNNs",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Anomaly Detection,Python,,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Predictive Modeler,Other,60,10,10,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter 
notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression",Sometimes,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,35,5,0,35,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,,,,,,,,,Rarely,,,Often,,Sometimes,Most of the time,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,140000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Canada,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,SAP BusinessObjects Predictive Analytics,Regression,SQL,,"Newsletters,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,Very useful,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Programmer,Other",Work,20,0,70,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational 
data,,,"Decision Trees,Regression/Logistic Regression","IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,SQL",,,,,,,,,,Often,,,,,,,,,,,,,Often,,Often,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Segmentation,Time Series Analysis",,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,80,0,0,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,,,Most of the time,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,,disparate and dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,"70,000",CAD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A,Textbook",Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Machine Learning Engineer,Researcher",University courses,15,0,15,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Never,10MB,"CNNs,Neural Networks,SVMs,Other","Jupyter notebooks,MATLAB/Octave,Perl,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,Most of the time,,,,,Sometimes,,,,,,Often,Often,,,,,,,Sometimes,,Often,,,,10,50,0,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Often,,100% of projects,Do not know,Standalone Team,BCI competition data,cleaning/filtering noisy data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"120,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Singapore,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,GitHub,"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,,"Data Machina Newsletter,Jack's Import AI Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",Work,50,10,30,0,10,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Video data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Time Series Analysis",,Sometimes,Sometimes,Sometimes,Sometimes,Often,Most of the time,Often,Often,Often,,,,Often,,Sometimes,,Sometimes,,Often,Often,,Often,Sometimes,,,Most of the time,Rarely,,Often,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Sometimes,Often,,,,,,Often,,,Sometimes,,,26-50% of projects,Entirely internal,Central Insights Team,open street map,consistency of the data and version control,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,80000,SGD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Support Vector Machines (SVM),Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,Somewhat useful,Very useful,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,,Self-taught,75,0,20,5,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,Decision Trees,"Amazon Web services,Java",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Naive Bayes,Segmentation",,,,,,Most of the time,,Most of the time,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,75,15,0,0,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,300000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Statistician,Other",University courses,50,0,25,25,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Most of the 
time,,Sometimes,,Sometimes,,Sometimes,Often,Rarely,Sometimes,Often,Sometimes,,,,Often,,Sometimes,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Sometimes,Often,,,,,,,,,Sometimes,,,Often,,,Sometimes,Most of the time,,51-75% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Rarely,70000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Microsoft SQL Server Data Mining,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Biology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Natural Language Processing,Logistic Regression,High school,Insurance,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Often,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,20,20,0,10,50,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Rarely,Often,,,,,,,Often,,,,Sometimes,Often,,10-25% of projects,Entirely internal,Other,,unclear relationship between data provided and goals,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Git,Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Neural Nets,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,,,Very useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",University courses,0,30,50,20,0,0,"Recommendation Engines,Unsupervised Learning",Other (please specify; separate by semi-colon),A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Recommender Systems,Segmentation",,Often,,,Most of the time,,Sometimes,,,,,,,Often,Sometimes,,,,,,Most of the time,,,Often,,Most of the time,,,,,,,,20,40,20,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,,,,,Sometimes,,Often,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,20000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased significantly,3-5 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,,"Amazon Web services,C/C++,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,Most of the time,Often,,,,Rarely,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,5,5,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Git,Never,175000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,IBM Watson / Waton Analytics,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Online courses,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Scientist,Self-taught,70,0,30,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,,,,,Not very important,Research that advances the state of the art of machine learning,Traditional Workstation,Other,Always,1MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,Markov Logic Networks,Neural Networks","Julia,NoSQL,Python,SQL",,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Neural Networks",,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,5,60,10,15,10,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,10-25% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","I don't typically share data,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Nice to have,,,Nice to have,Nice to have,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,,More than 10 years,I haven't started working yet,University courses,40,0,0,0,60,0,Unsupervised Learning,"Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,India,58,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,Cluster Analysis,R,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,80,10,10,0,0,0,Supervised Machine Learning (Tabular Data),Evolutionary Approaches,A bachelor's degree,Academic,100 to 499 employees,Stayed the same,3-5 years,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,,"Evolutionary Approaches,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Segmentation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,10,80,10,0,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,,I don't typically share data,,,,,,Has increased 20% or 
more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Switzerland,44,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Not Useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Financial,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,Regression/Logistic Regression,"Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,Rarely,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,Sometimes,Most of the 
time,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Often,,Most of the time,Most of the time,Most of the time,,Often,Often,Most of the time,,,,Most of the time,,,,,Most of the time,,Most of the time,,100% of projects,More external than internal,Other,"Financial dataset eg bloomberg, morningstar",lack of consistent data quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,180000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,36,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,TensorFlow,Other,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,,Very useful,,,,Very useful,"Talking Machines Podcast,Other (Separate different answers with semicolon)",3-5 years,,,,,,,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Fine arts or performing arts,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",59,34,0,7,0,0,Unsupervised Learning,"Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,27,"Not employed, but looking for work",,,,,,,,Mathematica,Deep learning,Python,Google Search,"Arxiv,Kaggle,Newsletters,Personal Projects,YouTube Videos",Very useful,,,,,,Very useful,Very useful,,,,Very useful,,,,,,Very useful,,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Other,Kaggle competitions,100,0,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat 
important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",65,30,0,0,5,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,Google 
Search,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,39,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Not Useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Physics,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,50,50,0,0,0,0,,Logistic Regression,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very 
Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,47,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,,,Necessary,,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Machine Translation,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,IBM SPSS Modeler,Jupyter notebooks,Python,SQL,TensorFlow",,Sometimes,,,,,,Sometimes,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,Most of the time,,,,,,,Sometimes,,,,,Most of the time,Sometimes,,,Sometimes,,,,,Rarely,Most of the time,,,,,60,0,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,Often,,51-75% of projects,More external than internal,Central Insights Team,Web text; Social Media; Wikipedia,Cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,"1,500,000",THB,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,India,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,C/C++/C#,Google Search,"College/University,Textbook,YouTube Videos",,,Very useful,,,,,,,,,,,,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,,,,,0 - 1 hour,Master's degree,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Vietnam,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,50,30,0,10,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",Often,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,,,Often,,Most of the time,,,,,Often,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,Often,,Often,,,,,Most of the time,Often,,Sometimes,,Less than 10% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,1700000,INR,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters",,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Programmer,Kaggle competitions,30,0,0,30,40,0,"Computer Vision,Speech Recognition,Survival Analysis",Support Vector Machines (SVMs),High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,23,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer",Self-taught,80,10,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Most of the time,,Often,,,,,,Most of the time,Most of the time,,,,,,,Often,,,,40,10,0,30,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,,Often,,,,,,,,Sometimes,Most of the time,,100% of projects,More internal than external,Standalone Team,NA,Lack of data,"Flat files not in 
a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Data Analyst,Perfectly,Employed by college or university,Other,I don't plan on learning a new ML/DS method,SAS,"Google Search,Government website","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Other,Less than a year,I haven't started working yet,Self-taught,20,40,10,0,0,30,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",No education,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,,Very Important,,Not important,Somewhat important,Somewhat important +Male,India,23,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't 
perform advanced analytics",Hadoop/Hive/Pig,Bayesian Methods,Python,,"Blogs,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,10,80,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Segmentation,Time Series Analysis",Often,,,,,Sometimes,Most of the time,,,,,,,,,Rarely,,Rarely,,,,,,,,Sometimes,,,,Often,,,,20,5,5,10,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,IT Department,Us census,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,120000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,57,"Not employed, but looking for work",,,,,,,,,Association Rules,R,,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,15+ years,,,,,,,,,,,,,,"Coursera,edX,Other",,11 - 39 hours,,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,"Operations Research Practitioner,Other",University courses,50,50,0,0,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,No,Yes,Other,Poorly,Self-employed,Google Cloud Compute,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website",Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,I don't write code to analyze data,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important +Male,Pakistan,36,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,R,GitHub,"Company internal community,Conferences,Kaggle,Textbook",,,,Somewhat useful,Somewhat useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic 
Regression","Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Often,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,SVMs,Text Analytics",,,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,Often,Most of the time,,,,,50,25,25,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,,,,,Sometimes,,,,,,,,,,Often,,Often,,Less than 10% of projects,Approximately half internal and half external,IT Department,Scraping and Parsing Web Resources,Authenticity issues,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Rarely,30000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,South Korea,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),Java,"University/Non-profit research group websites,Other","College/University,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Physics,Less than a year,I haven't started working yet,Self-taught,25,25,10,10,25,5,Unsupervised Learning,Neural Networks - CNNs,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Other,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Other",Work,20,0,80,0,0,0,"Natural Language Processing,Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Time Series Analysis",Often,,Often,,Sometimes,Often,Most of the time,,,,,,,Sometimes,Often,Often,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,Often,,,,20,20,5,20,35,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database,The lack of a clear question 
to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,Sometimes,,Often,,,76-99% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,100000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,Google Search,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Miner,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,5,5,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"A/B Testing,Association 
Rules,Data Visualization,Lift Analysis,Logistic Regression,Random Forests,Segmentation",Most of the time,Often,,,,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,Sometimes,,,Most of the time,,,,,,,,20,5,10,5,60,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,Often,,,,,,,,,,,,,,,,,Often,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Subversion,Rarely,1800000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Colombia,29,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Friends network,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Not Useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,Not Useful,Not Useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",Self-taught,30,30,20,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,I don't know,Stayed the same,Less than one year,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Always,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Segmentation,Text Analytics",,,Sometimes,,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",Sometimes,Often,,,Often,Often,,,Often,,,,Most of the time,,,,,,,,,,26-50% of projects,More internal than external,Business Department,BID,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,4000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Korea,46,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,GitHub,Newsletters,,,,,,,,Somewhat useful,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,Business Analyst,Self-taught,30,0,50,20,0,0,"Adversarial Learning,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Markov Logic Networks,"Minitab,R",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Neural Networks",Sometimes,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,30,30,10,10,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Sometimes,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,google,dirty data gathering,Graph (e.g. GraphBase/Neo4j),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Association Rules,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,20,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,Python,R,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Prescriptive Modeling,Time Series Analysis",Rarely,Rarely,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,50,10,10,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,IT Department,Only proprietary,Unavailability ,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Bitbucket,,2250000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,India,40,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,37,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),,R,GitHub,"Blogs,Podcasts,YouTube Videos",,Very useful,,,,,,,,,,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,30,1,1,1,37,Time Series,Bayesian Techniques,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees","Cloudera,Impala,NoSQL,Python,R,RapidMiner (commercial version),SQL,Tableau",,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Random Forests",,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,40,20,5,10,25,0,Enough to tune the parameters properly,"Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Other,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites,Other","Arxiv,Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Personal Projects",Very useful,Very useful,,Somewhat useful,,,Very useful,,Not Useful,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Researcher,Other",Kaggle competitions,30,0,40,0,30,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Other",Rarely,1GB,"CNNs,GANs,Neural Networks,Random Forests,RNNs,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis,Other",,,,Most of the time,,Often,Most of the time,,,,Often,,,,,,,,,Most of the time,Often,,Sometimes,,Often,Rarely,Sometimes,Rarely,,Sometimes,Sometimes,,,10,35,5,5,20,25,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",,,,,Sometimes,,,,,,,Sometimes,,Often,,,,,,Most of the time,Rarely,Often,51-75% of projects,More internal than external,Other,MIRFlickr; Omniglot; MSCOCO; Cityscapes; Labeled Faces in the Wild; AI Gym environments; NASA exoplanet database,Baseline determination,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Network drives,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,"10,000,000",JPY,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,Hadoop/Hive/Pig,Time Series Analysis,Python,Government website,Personal Projects,,,,,,,,,,,,Not Useful,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,30,20,20,30,0,0,Time Series,Hidden Markov Models HMMs,A bachelor's degree,Academic,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Never,,Neural Networks,"C/C++,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,0,0,0,0,0,100,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,None,,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,540000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Colombia,69,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",12,85,0,0,3,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Other,29,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by college or university,DataRobot,Association Rules,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Non-Kaggle online communities,Textbook",,,Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Statistician,University courses,0,0,0,100,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,NA,I prefer not to say,,,,,,,,,,,,Podcasts,,,,,,,,,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,38,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Java,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's 
degree,No,Professional degree,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Female,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,40,0,0,40,,,A bachelor's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,100MB,,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Often,,,Often,,,,Often,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,None,,IT Department,,building code,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,300000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,I did not complete any formal education past high school,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,India,30,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,No Free Hunch Blog,1-2 years,Necessary,Unnecessary,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,India,26,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,GitHub,"Blogs,College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,Very useful,,,,,Very useful,Somewhat useful,,Very useful,,,,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,Business Analyst,Self-taught,50,20,20,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden 
Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Malaysia,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,Other",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"edX,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,India,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS 
Base,Bayesian Methods,Other,Other,"College/University,YouTube Videos",,,Very useful,,,,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,,,Other,2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,,,,,,,,,,,,,,, +Female,South Korea,29,Employed full-time,,,No,Yes,Researcher,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by government",Python,Genetic & Evolutionary Algorithms,SQL,University/Non-profit research group websites,"College/University,Conferences",,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,20,0,0,80,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important +Female,United States,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Factor Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,3 to 5 years,Business Analyst,Work,20,50,30,0,0,0,,,High school,Internet-based,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees","Amazon Web services,Python,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Lift Analysis,Logistic Regression,Natural Language Processing",Sometimes,,Rarely,,,,Most of the time,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,,,,25,10,10,20,35,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,Often,,,Often,,,Often,,,,,,,,,,,Sometimes,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,85000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,"College/University,Company internal community,Personal Projects,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Scientist,Statistician",Self-taught,60,10,20,10,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Financial,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,HMMs,Markov Logic Networks,Regression/Logistic Regression","C/C++,Microsoft Excel Data Mining,NoSQL,Python,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,HMMs,Logistic Regression",,,Most of the time,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,,60,10,20,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Scaling data science solution up to full 
database",,,,,Most of the time,,,,,Often,,,,,Sometimes,,,Sometimes,,,,,51-75% of projects,,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,,Rarely,200000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Mexico,42,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website",Other,,,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +"Non-binary, genderqueer, or gender non-conforming",United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,Bayesian Methods,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Official documentation,Personal Projects",Very useful,,Very 
useful,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,70,0,30,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,"HMMs,Markov Logic Networks,Neural Networks","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,,,Most of the time,Sometimes,,,,,,Sometimes,Sometimes,,,,,Often,Sometimes,Sometimes,,,,,,,,Often,,,,,10,10,0,5,0,75,Enough to refine and innovate on the algorithm,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,51-75% of projects,More external than internal,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,100000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Programmer,University courses,45,0,0,40,15,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A professional degree,Other,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,RapidMiner (commercial version),Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Naive Bayes,Text Analytics",,Sometimes,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,40,15,10,10,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team 
using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Business Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,"60,000",,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,28,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Social Network Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,,Predictive Modeler,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,,University courses,50,0,40,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Mix of fields,Fewer than 10 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Perl,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Most of the time,Most of the time,Most of the time,Often,,,,,,,Often,,Rarely,,,Sometimes,,Sometimes,,,,,,,,,,,40,30,0,15,15,0,"Enough to code it again from 
scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,Often,,,,76-99% of projects,More internal than external,Standalone Team,US Census; American Community Survey,people who answer surveys are inherently different than those who do not,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,58000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,"Data Analyst,Engineer,Researcher",Other,20,20,0,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,Very important,Other,"Basic laptop (Macbook),Other",Other,Rarely,,Regression/Logistic Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,Natural Language Processing",,,,,,,,,,,,,,Rarely,,Rarely,,,Rarely,,,,,,,,,,,,,,,65,5,0,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Other",,,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,26-50% of projects,Do not know,Other,,Getting data from sensor team/getting permission/training to access historian,,Email,,,Always,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,Google Search,"Arxiv,College/University,Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Not Useful,Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering 
(non-computer focused),3 to 5 years,I haven't started working yet,University courses,15,5,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,Decreased slightly,More than 10 years,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Image data,Relational data",Never,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,Random Forests,Regression/Logistic Regression,Other","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,Most of the time,,,Most of the time,Most of the time,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,,,Often,,Sometimes,,,,Often,Sometimes,,,,,,15,55,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,25000,,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,Very useful,Very useful,"Data Elixir Newsletter,Data Stories Podcast,Partially Derivative Podcast",< 1 year,,,Necessary,,,Necessary,,,,,,,,edX,,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,50,0,30,0,0,20,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Podcasts,Textbook,YouTube Videos",,,,,,,,,,,,,Very useful,,Very useful,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,SAS Base,Tableau,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,Often,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,,,Sometimes,,,,Sometimes,,,,Often,,,,,,,Often,,,,,,,,,,,50,10,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,26-50% of projects,More internal than external,Business Department,Service provider data,Joining and aggregating from multiple source systems - not everything is in the data warehouse or even in a similar format,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,3000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Other,Self-taught,40,5,5,45,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,10TB,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,Often,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,0,20,40,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,Sometimes,,Sometimes,,,,,Often,,Often,,Sometimes,Often,Most of the time,,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,,Data is logged in bizarre formats that are hard to consume. ,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"160,000",USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Kenya,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by non-profit or NGO,SQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Often,Often,,,,Rarely,,,,,Often,,,Sometimes,Often,Sometimes,,,,Often,Often,,,Often,Sometimes,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,Other",,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,100% of projects,Entirely external,Other,Government Open data initiative; Earth Observation data from NASA and ESA; IoT sensor 
data,Lack of affordable computing resources to process large datasets,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,35000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Data Stories Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,10,40,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,1GB,"CNNs,GANs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,Sometimes,,Rarely,,,,Often,Often,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Rarely,,,,Most of the time,,,,,,"Association Rules,CNNs,Data Visualization,GANs,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Text Analytics,Time Series Analysis",,Sometimes,,Often,,,Often,,,,Sometimes,,,,,Often,,,Often,Most of the time,Often,Sometimes,,,Often,,,,Most of the time,Most of the time,,,,20,40,5,10,0,25,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,Often,,,,,Most of the time,,,,,Sometimes,,51-75% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Other",Company File Server,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,65000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +A different identity,Other,11,Retired,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Python,Time Series Analysis,R,University/Non-profit research group websites,"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - RNNs",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Not Useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,35,0,0,50,15,0,Computer Vision,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important +Male,Australia,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Jupyter notebooks,Python,Tableau,TensorFlow,Unix shell / awk",Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,Sometimes,,Sometimes,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,Often,Often,Often,,,,,Sometimes,,Sometimes,,Often,,,,,Often,,Often,Often,,Sometimes,,Sometimes,Often,Often,,,,30,40,0,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,Sometimes,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,AUD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,KDnuggets Blog,< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,"Coursera,edX","Basic laptop (Macbook),Workstation + Cloud service",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,,,,Very Important,,,,,,Very Important,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Podcasts",,,Very useful,,,,Very useful,,,,Very useful,,Very useful,,,,,,,< 1 year,,Nice to have,,,,Necessary,Necessary,,Necessary,Necessary,,,,"DataCamp,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,40,25,0,30,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,24,Employed full-time,,,No,Yes,Other,Fine,Employed by company that makes advanced analytic software,SAP BusinessObjects Predictive Analytics,Deep learning,R,"Google Search,Government website","College/University,Newsletters,Non-Kaggle online communities",,,Very useful,,,,,Very useful,Very useful,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,,Necessary,Necessary,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,20,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,, +Female,United States,40,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Other,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,,,,,,Very useful,,Somewhat useful,,Very useful,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Other",Self-taught,10,70,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,100 to 499 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,,,"Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,5,15,10,60,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,Most of the time,,,,,,,,,,,,Often,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,None,Availability of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,77250,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,57,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,Fine,Employed by college or university,,,,,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,No,Master's degree,A humanities discipline,I don't write code to analyze data,"Researcher,I haven't started working yet",Other,100,0,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,,,SAP BusinessObjects Predictive Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,RapidMiner (free version),Spark / MLlib",,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,Often,Often,,,,Most of the time,Most of the time,,,,Most of the time,,,,Often,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,910000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Sweden,40,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Programmer,Researcher",Self-taught,70,10,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,Python,R,SQL,Unix shell / awk,Other",,,,Rarely,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,Often,,,"Association Rules,CNNs,Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,Sometimes,,Most of the time,,Often,,,,,,,,Often,,Often,Sometimes,Sometimes,,,Often,,,,Sometimes,Often,Often,Often,,,,20,40,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of 
significant domain expert input,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,,,,Sometimes,,,Sometimes,,Often,,Sometimes,,,,Often,,Often,,Sometimes,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,47000,SEK,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,GitHub,"College/University,Conferences,Kaggle,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,I haven't started working yet,Self-taught,50,0,0,45,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,SQL,"Google Search,Government website","College/University,Conferences,Kaggle,Textbook",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Researcher,Statistician",University courses,10,10,50,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","C/C++,IBM Cognos,IBM SPSS Modeler,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SAS Enterprise Miner,SQL,Other",,,,Rarely,,,,,,Often,Sometimes,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,Rarely,,,,,Rarely,Sometimes,,,Often,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,,,,,,Often,,Sometimes,,,,,,,,,,,,,Often,Often,,,,70,5,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Often,Sometimes,,Often,,,,,,,,,,,Most of the time,Often,,,26-50% of projects,More internal than external,Other,FICO,No data governance,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,83000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,College/University,Friends network,Official documentation,Online courses,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,Very useful,,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,edX,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,A social science,,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Operations Research Practitioner,Researcher",University courses,10,10,20,30,20,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,,,,,,,"C/C++,Python,R,SAS Base,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,Often,,Most of the time,,,Sometimes,,,,Most of the time,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to 
coordinate with IT",Most of the time,Often,Often,Sometimes,,,,Most of the time,,,Often,,,,Most of the time,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,,,,,,80000,USD,,5,,,,,,,,,,,,,,,,,, +Male,United States,58,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Japan,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Self-employed,Python,Deep learning,Python,Other,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,10,10,80,0,0,0,Unsupervised Learning,Neural Networks - CNNs,,Technology,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or 
headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Sometimes,10GB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Neural Networks,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,Often,,,,,20,20,35,15,10,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,9000000,JPY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Friends network,Kaggle,Official documentation,Personal Projects",,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist",Self-taught,50,0,30,10,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Rarely,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems",Often,,,,Often,Most of the time,,,Most of the time,,,Often,,,,Sometimes,,,Often,,,,Sometimes,Often,,,,,,,,,,10,5,5,0,5,75,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,Rarely,,,,,,,,,,Rarely,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Other,s3,Git,Rarely,210000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Regression,C/C++/C#,Google Search,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Engineer,,0,0,0,95,0,5,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Perfectly,Employed by college or university,Tableau,Time Series Analysis,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Linear Digressions Podcast,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst,Operations Research Practitioner,Predictive Modeler",University courses,0,0,0,100,0,0,,"Decision Trees - Gradient Boosted Machines,Markov Logic Networks,Neural Networks - CNNs",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,KDnuggets Blog,3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",40+,Experience from work in a company related to ML,Yes,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Work,50,15,25,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Female,United States,37,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,Other,95,0,0,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,I don't know,Stayed the same,1-2 years,A general-purpose job board,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Other","Image data,Text data,Relational data",,1GB,"Regression/Logistic Regression,Other","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,Sometimes,,,,Often,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,,,,Sometimes,Most of the time,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,50,25,0,15,10,0,Enough to tune the parameters properly,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,Sometimes,,100% of projects,Entirely external,Other,API; FOIA; Open gov; MTurk; google forms; web scraping;,"All the datasets are different, so the challenges are 
unique. From database, to survey data. They all have their limitations and idiosyncrasies! I cannot take what I learn in one project and apply it to the next. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Other",Much is also on AWS where it stays.,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Most of the time,88500,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,75,20,0,0,0,Natural Language Processing,,A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,Less than one year,An external recruiter or headhunter,,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data",Rarely,1GB,"Decision Trees,Random Forests","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Recommender Systems",Most of the time,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,40,5,5,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data 
science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,IMDB,It's not enough. The company has started gathering data for about an year.,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,60,20,19,0,1,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,,Internet-based,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Random Forests",,,,,,,Often,Most of the time,,,,Most of the time,,,,,,Often,,,,,Most of the time,,,,,,,,,,,70,10,0,20,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,Most of the time,,,,Sometimes,,,,,,,,,Often,,,,,51-75% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Linear Digressions Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,South Korea,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,49,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a 
company that doesn't perform advanced analytics,Other,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",,,,,,,,,,,Very useful,Very useful,Very useful,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Financial,20 to 99 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10GB,Regression/Logistic Regression,"IBM SPSS Modeler,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Tableau",,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Rarely,,,,,,,,Sometimes,Rarely,,,,,,,,,,Often,,,,Rarely,,,,60,0,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,Often,,,,,,,,Often,,,Often,,,,Less than 10% of projects,More internal than external,Other,Credit bureau data,Quality control,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"130,000",,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,30,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Other","Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Recommender Systems",Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,Sometimes,,,Often,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,Most of the time,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,170000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,Outlier detection (e.g. 
Fraud detection),"Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,Singapore,23,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,,"FastML Blog,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,10,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,Romania,19,"Not employed, but looking for work",,,,,,,,Tableau,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,15,0,80,0,0,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Nigeria,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Other,,"Blogs,College/University,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Business 
Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Don't know,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,,,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Sometimes,,,Often,Often,,,,Sometimes,Often,,,,,Most of the time,Often,,100% of projects,More external than internal,Business Department,"Market Intelligence Data, Price monitoring",Mostly dirty,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Bitbucket,Git",Always,"6,000,000",NGN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,Textbook",,,Very useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,30,30,40,0,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,South Korea,46,Employed full-time,,,Yes,,Researcher,Fine,,Hadoop/Hive/Pig,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,15,0,15,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Financial,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Never,10GB,CNNs,"MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,Less than 10% of projects,Entirely external,Business Department,hospitals;,hard to get annotated data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,60000000,KRW,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,66,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,RapidMiner (commercial version),Social Network Analysis,R,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,More than 10 years,Researcher,University courses,40,0,0,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service,Other",Relational data,Most of the time,100MB,"Markov Logic Networks,Random Forests,Regression/Logistic Regression","Mathematica,Microsoft Excel Data Mining,SAS JMP,SQL,Tableau,TIBCO Spotfire,Other,Other,Other",,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,"Cross-Validation,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,30,20,5,5,40,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,South Korea,31,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Laptop or Workstation and local IT supported servers,,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,1 to 2 years,Data Analyst,Self-taught,50,5,40,1,3,1,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,, +Male,Poland,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by professional services/consulting firm,Weka,,,University/Non-profit research group 
websites,Tutoring/mentoring,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,20,0,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Hidden Markov Models HMMs,,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,,,"Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,HMMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,20,40,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,,,,,,,,,,,,,5,,,,,,,,,,,,,,,,,, +Female,Singapore,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Text Mining,Python,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Often,,Often,,,,Rarely,,,,,,Rarely,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",Often,,,,Often,,Often,Often,,,,,,,,,,,Often,,,,Often,Often,,,Often,,Often,Often,,,,50,30,0,20,0,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Often,,,,,,,,Often,,,,,,,Often,,,Often,,51-75% of projects,More external than internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,-,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Australia,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,27,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects",,Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",Self-taught,25,10,35,20,10,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",,Telecommunications,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression",,,,,,Often,,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,50,15,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",Sometimes,,,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,30000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Regression,Python,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Business Analyst,Work,10,30,60,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Often,Often,,,,,Most of the time,,,,,,,Sometimes,Often,Sometimes,,,,,Rarely,,,,,Often,,,,Sometimes,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Often,,Often,,Often,Often,,,,Sometimes,Often,,,,Sometimes,,Sometimes,Sometimes,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,1420000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Scientist,University courses,25,10,10,35,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,,Often,Often,,,,Often,,Often,,Often,,Often,Most of the time,,Often,,Often,,,,,,Most of the time,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Rarely,,,,,,,,51-75% of projects,Do not know,Business Department,,Converting data into products,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,1000000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,36,Employed full-time,,,Yes,,Data Analyst,,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,0,0,50,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,Often,,,,,,,,,Often,,,,,,Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Often,,,,Often,Often,Often,Often,Often,,,Often,,Often,,Often,,,Often,Often,Often,,Often,Often,,,,,Often,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,26-50% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Rarely,720000,ZAR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,MATLAB/Octave,Deep learning,R,Google Search,"College/University,Online courses,YouTube Videos,Other",,,Very useful,,,,,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","KNIME (free version),MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Most of the time,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Often,,,,Often,Most of the time,Sometimes,Often,,,,,Often,,Sometimes,,,,,Sometimes,Often,Often,,,,Most of the time,,Sometimes,Most of the time,,,,50,30,0,15,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Unavailability of/difficult access to data",Most of the time,,,Sometimes,Often,Often,,Rarely,Sometimes,Often,Sometimes,,,,,,,,,,Most of the time,,100% of projects,More internal than external,Business Department,"IHS Global Insights, CEB",,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,3500000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,C/C++,Neural Nets,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Very useful,"Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Researcher",University courses,35,0,5,60,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Text data,Rarely,,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,SVMs,Text Analytics",Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,Sometimes,,,,,,,,Sometimes,Often,,,,,30,5,0,25,40,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,Sometimes,,,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,28000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Python,Time Series Analysis,R,Google Search,"Blogs,Company internal community,Conferences,Friends network,Newsletters,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Not Useful,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,70,20,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,500 to 999 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,Most of the time,Often,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Rarely,,Rarely,,,,,,,,,,,0,20,20,60,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Often,Sometimes,Most of the time,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,None,Variable reduction,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,770000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,40,0,30,0,30,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,IBM Watson / Waton Analytics,Proprietary Algorithms,Java,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Data Analyst,Self-taught,40,0,30,30,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Ensemble Methods",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10PB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,NoSQL,SQL,Tableau,Unix shell / awk",,,,Rarely,Often,,Often,,Most of the time,,,,Often,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"Association Rules,Data Visualization,Lift Analysis,Natural Language Processing,Text Analytics,Time Series Analysis",,Often,,,,,Most of the time,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,Sometimes,,,,30,10,20,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,363000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Very useful,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Data Scientist,Work,40,25,25,5,5,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Other",Always,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Python,SQL",,Sometimes,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs",,Sometimes,Most of the time,,,Most of the time,Often,Often,,,,,,Most of the time,,Often,,Most of the time,Most of the time,Often,Sometimes,,,Rarely,,,,Most of the time,,,,,,5,60,10,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,Sometimes,,,Often,,,,Sometimes,,,,,,,,,Most of the 
time,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,75000,INR,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",Self-taught,70,0,30,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,22,Employed part-time,,,No,Yes,Researcher,Poorly,Employed by college or university,Spark / MLlib,Neural Nets,Python,Google Search,"College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,,,,,,,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't 
started working yet,University courses,30,0,0,70,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Russia,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Data Stories Podcast,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,Less than a year,Programmer,Self-taught,60,20,0,10,10,0,"Natural Language Processing,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Taiwan,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,15,0,80,5,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist",University courses,40,10,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,,Often,Most of the time,Sometimes,Sometimes,,,,,Often,Often,Often,,Sometimes,Rarely,Sometimes,,Often,Sometimes,Sometimes,,Most of the time,Often,Sometimes,Sometimes,Often,,,,30,10,10,10,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science 
projects,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Sometimes,Often,,Most of the time,Often,,,,,,,Sometimes,Often,Often,,,,,,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,Appropriate instrumentation for the exact phenomenon we're trying to measure.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,215000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Not Useful,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",Work,20,30,40,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private 
datacenters","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,Often,Most of the time,Most of the time,Sometimes,,,,,,Most of the time,,,,Often,Most of the time,Often,Often,Most of the time,Sometimes,Often,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,50,20,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,Often,,,Most of the time,,,Often,,Sometimes,,Most of the time,,Often,,Most of the time,,,26-50% of projects,More internal than external,Standalone Team,"IMDB review, Twitter, Bestbuy review data, Amazon review data, Credit Card Fraud Data, USPS, MNIST, GUN Point, Physionet, Stock Market (S&P), UBIRIS, CASIA, Prof. Wang's Image Data, Chicago crime data, Sanfrancisco crime data, Titanic Dataset",,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,900000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,"Engineer,Other",Other,80,5,15,0,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Always,100TB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Often,,Often,,,,,,,,,,,0,70,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Often,100% of projects,Entirely internal,Standalone Team,"Cifar, kaggle ",Expense to label images for supervised learning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,75000,USD,Other,8,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,,,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,42,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Regression,,,"Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,100,0,0,0,0,0,Computer Vision,,,Other,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told 
me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation","Image data,Text data",Most of the time,10MB,,"Jupyter notebooks,Minitab,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Segmentation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,20,20,0,50,10,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Sometimes,,Often,Sometimes,,,,,,,,,,,,Most of the time,,,Often,,76-99% of projects,Entirely internal,Other,none,cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,130000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Other,I haven't started working yet",Kaggle competitions,30,25,0,15,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,I don't know,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,Other,Always,100MB,"Random Forests,Regression/Logistic Regression","R,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,Sometimes,,,"Cross-Validation,Random Forests,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,Most of the time,,,,15,35,15,35,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,,,100% of projects,Approximately half internal and half external,Other,NASA Kepler mission data,Computing time and cluster deployment,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,28000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,"Google Search,Government website","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,,< 1 year,,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Textbook",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Computer Vision,"Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Always,100GB,"CNNs,RNNs","Amazon Web services,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks,Random Forests,RNNs",,,,Often,,,,,,,,,,,,,,,,Most of the time,,,Often,,Often,,,,,,,,,40,30,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,,Training data preparation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,200000,USD,,9,,,,,,,,,,,,,,,,,, +Male,India,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SAS Base,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Data Analyst,Self-taught,50,25,0,0,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Don't know,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Often,,,,,,,,Often,,,Sometimes,,,Often,Often,,,Often,,,Often,Sometimes,,,,50,10,0,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Often,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,600000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,41,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses",,Very useful,,,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Kaggle competitions,40,10,10,0,40,0,"Recommendation Engines,Other (please specify; separate by semi-colon)",,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Other,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Udacity,Other",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Brazil,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,35,0,0,50,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Other,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,,100MB,,"Amazon Web services,Python,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Rarely,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,0,20,50,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Always,15000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,India,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Text Mining,R,I collect my own data (e.g. 
web-scraping),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,I don't write code to analyze data,Other,University courses,30,20,10,40,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,1MB,Decision Trees,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,30,20,20,20,0,10,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Other,Sometimes,"75,000",RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +A different identity,South Korea,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Tableau,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by college or university,,,,,"Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Unsupervised Learning,Neural Networks - CNNs,A doctoral 
degree,Mix of fields,100 to 499 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Never,1MB,"Neural Networks,RNNs","Amazon Web services,C/C++,Java,Python,R,TensorFlow",,Rarely,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,20,10,10,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,None,Entirely internal,Standalone Team,,,,,,,,1400000,INR,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Researcher,Self-taught,75,0,10,10,5,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,A bachelor's degree,Academic,20 to 99 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SQL",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Naive Bayes,Neural Networks",Sometimes,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,65,10,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,Often,Often,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,70,0,0,30,0,0,"Computer Vision,Time Series",Neural Networks - RNNs,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,No,Yes,Statistician,Fine,Employed by college or university,Orange,Time Series Analysis,R,University/Non-profit research group websites,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Other,Sort of (Explain more),Doctoral degree,,6 to 10 years,Other,Self-taught,60,40,0,0,0,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important +Female,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Other,"Arxiv,College/University,Conferences,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,Yes,Master's degree,A social science,1 to 2 years,"Business 
Analyst,Programmer",University courses,5,35,0,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Japan,26,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Kaggle,Personal Projects,Textbook,Tutoring/mentoring",,Very useful,Somewhat useful,,Very useful,,Very useful,,,,,Somewhat useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Business Analyst,University courses,40,5,15,40,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Mix of fields,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,IBM Watson / Waton 
Analytics,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,Sometimes,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,Sometimes,Often,Rarely,,Most of the time,Most of the time,Sometimes,,,,,,,,Often,,Sometimes,Most of the time,Sometimes,Most of the time,,Sometimes,,Most of the time,,,,,,,,,30,20,10,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Turkey,34,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Text Mining,Python,University/Non-profit research group websites,"Blogs,College/University,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter",3-5 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software 
Engineer",University courses,20,20,30,26,4,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,United States,43,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,"Government website,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,Very useful,,Somewhat useful,,,,Very useful,,Very useful,Very useful,,Very useful,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",70,25,0,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Retail,"10,000 or more employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or 
business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,R,SQL",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Rarely,,,Often,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Sometimes,,Sometimes,,,,Sometimes,Often,,,Often,,Rarely,,Most of the time,,,,60,30,5,1,4,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,Most of the time,,,Most of the time,Sometimes,,Often,Most of the time,,,,,Sometimes,Most of the time,,,Most of the time,,,,,Less than 10% of projects,More internal than external,Other,Nielsen; AggData,Inability to match observations across data sets,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,95000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Kaggle,Newsletters,Online courses,Podcasts",Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,,,,,,"Emergent/Future Newsletter (Algorithmia),Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,10GB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Natural Language Processing,Recommender Systems,Text Analytics,Time Series Analysis",,,,Sometimes,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,Most of the time,Sometimes,,,,15,25,35,10,15,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Often,,,,Most of the time,,,10-25% of projects,More 
external than internal,Business Department,Imagenet,Size,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,7500000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Non-Kaggle online communities,Personal Projects",,,,,,,Very useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,30,0,30,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,Rarely,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,,,Sometimes,Often,,Often,,,Often,,,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Often,,,,Often,,,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,170000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,20,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Conferences,Newsletters,Online courses,YouTube Videos",,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,40,40,0,20,0,0,,Logistic Regression,High school,Mix of fields,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Traditional Workstation,Text data,Don't know,<1MB,Regression/Logistic Regression,"IBM SPSS Statistics,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Logistic Regression,Naive Bayes,SVMs",,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Sometimes,,,,,,40,30,0,10,20,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Programmer,Researcher",University courses,0,20,0,80,0,0,Computer Vision,"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,Decision Trees,"Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,50,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,Rarely,,,Often,,,,Most of the time,,,,,,Most of the time,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,TRY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Sweden,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Researcher,Self-taught,50,30,20,0,0,0,,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,500 to 999 employees,,1-2 years,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Sometimes,100GB,"CNNs,Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,NoSQL,Python,Unix shell / awk,Other",,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy 
issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,,Dirt and changing formats. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,50000,SEK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Vietnam,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important +Male,India,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Not Useful,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"Data Machina Newsletter,FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,50,25,0,0,25,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Not Useful,Somewhat useful,,,Not Useful,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,Partially Derivative Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,7,3,30,0,"Computer Vision,Recommendation Engines","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,36,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Master's degree,Other,1 to 2 years,"Business Analyst,Engineer,Statistician,Other",University courses,0,60,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Singapore,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist",Self-taught,20,0,0,0,80,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,CRM/Marketing,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,KNIME (free version),Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,Sometimes,,Rarely,,,,,,,Most of the time,,,Rarely,Often,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,Sometimes,Most of the time,Often,,,,,,,,,Most of the time,,,,,40,20,20,20,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,More external than internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,Bitbucket,Never,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,United States,37,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other",University courses,20,10,40,25,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important +Female,United States,27,"Not employed, but looking for work",,,,,,,,Java,Decision Trees,C/C++/C#,Google Search,"College/University,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very 
useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,,,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Other,I don't write code to analyze data,I haven't started working yet,Self-taught,60,0,0,40,0,0,Reinforcement learning,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Somewhat important,,Very Important,Very Important,,Very Important,Somewhat important,Very Important,,Not important,Somewhat important,,Very Important,Very Important +Male,India,29,"Not employed, but looking for work",,,,,,,,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer",Self-taught,65,25,0,5,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Other,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Matlab,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,,Necessary,Necessary,Nice to have,,,Necessary,,,,edX,Basic laptop (Macbook),,Other,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,50,0,50,0,0,Natural Language Processing,Support Vector Machines (SVMs),High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,28,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SQL,Bayesian Methods,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Statistician,I haven't started working yet",Self-taught,60,10,10,10,0,10,Time Series,Logistic Regression,No education,Government,20 to 99 employees,Decreased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,Bayesian Techniques,Oracle Data Mining/ Oracle R Enterprise,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,65,5,5,10,10,5,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,0,INR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by non-profit or NGO,Flume,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Statistician",Self-taught,40,25,25,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,CRM/Marketing,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Perl,Python,R,SAS Enterprise Miner",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,,,Rarely,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,Often,,,,,Often,Most of the time,,,,,30,20,10,20,15,5,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning",,Sometimes,,,Often,Sometimes,,,,,,Most of the time,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,amazon product reviews;IMDB movie reviews;UCI ML repo,Data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,500000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Perl,Link Analysis,R,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,University courses,30,10,20,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Random Forests,SVMs",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,65,10,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack 
of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,Often,Most of the time,,Often,,Rarely,,51-75% of projects,More internal than external,Standalone Team,Too much to describe,Opaque and changing data standards,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Secure access to local Unix based file systems,"Bitbucket,Git",Never,60000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,,,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,40,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Employed by college or university,R,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Not Useful,,,,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Traditional Workstation,0 - 1 hour,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Not important +Male,Taiwan,35,"Not employed, but looking for work",,,,,,,,DataRobot,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,A health science,1 to 2 years,"Computer Scientist,Data Scientist,Operations Research Practitioner,Programmer,Statistician",Self-taught,60,20,0,0,20,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Survival Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,75,25,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Bayesian Techniques,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70,25,5,0,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,1700000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Other,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Other",Self-taught,70,0,30,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Rarely,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and 
Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,Often,Often,,Often,Often,,,Often,,Often,,Often,,Often,,Often,Often,Often,Often,Often,,Often,Often,Often,Often,Often,,,,55,20,5,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Often,,Most of the time,Often,,,,Sometimes,Often,,Sometimes,Often,Often,,Sometimes,,,Often,Often,,10-25% of projects,,IT Department,,Access,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,131000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",15,30,20,25,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Naive Bayes,Neural Networks",,Most of the time,,,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,20,35,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools",Most of the time,,,,,,,,Often,Most of the time,,,Most of the time,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Official documentation,Online courses,Personal Projects",Very useful,,Very useful,,Very useful,,,,,Very useful,Very useful,Very useful,,,,,,,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,49,0,0,51,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",,10GB,Other,"C/C++,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,Often,,,,,,,,Often,,,,,Often,,,,,,,None,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,80000,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search","Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"Data Stories Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Programmer,Self-taught,100,0,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,23,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Genetic & Evolutionary Algorithms,R,Google Search,"Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",,11 - 39 
hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",Self-taught,30,20,0,30,NA,20,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Pakistan,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Master's degree,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",University courses,10,50,10,25,5,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,High school,Mix of fields,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,Ensemble Methods,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests",,,Often,,,,,Most of the time,Often,,,Often,Sometimes,,,Often,,Most of the time,Often,,,,Often,,,,,,,,,,,40,20,20,10,10,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,,More external than internal,Standalone Team,kaggle;NCUS;KDD;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,480000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,65,5,5,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Bayesian Methods,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Very useful,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, 
or system administration",Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100MB,"Regression/Logistic Regression,SVMs","C/C++,Cloudera,Java,Microsoft Excel Data Mining,R,RapidMiner (free version),SQL,Tableau",,,,Often,Sometimes,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,SVMs,Time Series Analysis",,,,,,Often,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,Often,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,,,Most of the time,,,,Sometimes,,,,Often,,,,10-25% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,750000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,Google Search,"Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,Engineer,University courses,65,0,5,30,0,0,,"Bayesian Techniques,Evolutionary Approaches",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Text data,Most of the time,10MB,Evolutionary Approaches,"C/C++,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Sometimes,,,,,,Rarely,,,,"Data Visualization,Evolutionary Approaches,Neural Networks,Simulation",,,,,,,Sometimes,,,Sometimes,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,10,40,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,Most of the time,Often,,Often,Sometimes,Sometimes,,26-50% of projects,Approximately half 
internal and half external,Other,FARS,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,AUD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important 
+Male,Brazil,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Self-employed,IBM Watson / Waton Analytics,Support Vector Machines (SVM),R,Google Search,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Other,Sort of (Explain more),Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Data Analyst,Engineer",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Time Series","Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Machine Learning Engineer,Operations Research Practitioner",University courses,5,5,60,15,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",,,"Bayesian Techniques,Decision Trees,HMMs,Markov Logic Networks","C/C++,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"HMMs,Markov Logic Networks,Simulation",,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,0,50,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,Often,,,Most of the time,,,,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"170,000",CNY,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Rarely,,,"C/C++,NoSQL,Perl,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,80,0,0,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,Most of the time,,,Often,Most of the time,,,,,,Most 
of the time,Most of the time,,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other","Commercial Data Platform,Share Drive/SharePoint",,"Git,Other",Sometimes,95000,USD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Text Mining,C/C++/C#,University/Non-profit research group websites,"Blogs,College/University,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,Somewhat useful,,,Very useful,,,,,,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,Researcher,University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Financial,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Often,,,,,,,,,,,Often,,Rarely,,,,Rarely,,,,Most of the time,,,,,,Most of the time,Rarely,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Rarely,,Rarely,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,Most of the time,Sometimes,Most of the time,Most of the time,,,,40,20,20,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,,,,26-50% of projects,More external than internal,Standalone Team,All data we have is private.,N/A,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,,105000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,SAS Enterprise Miner,Anomaly Detection,SAS,Government website,"Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,30,30,20,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",I don't know/not sure,Other,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees","Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Time Series Analysis",,,Sometimes,,,,Often,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,50,10,5,10,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,Often,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,76-99% of projects,Entirely internal,Other,None,Years of bad business practices created systems with dirty data.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",High school,Technology,100 to 499 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,Python",,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Text Analytics",,,,,,Often,,Often,,,,Often,,,,,,,,,,,,,,,,,Often,,,,,10,30,30,20,10,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,Sometimes,,,,,,Rarely,Sometimes,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Text Mining,R,,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,,,,,,Necessary,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,Business Analyst,Other,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL",,,,,Rarely,,,,Rarely,,,,,,Most of the time,,Often,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Most of the time,,Rarely,,,,,,Rarely,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series 
Analysis",Sometimes,,Sometimes,,,Sometimes,Most of the time,Sometimes,Often,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Rarely,,Sometimes,Sometimes,,Often,,,,Often,Sometimes,Sometimes,Most of the time,,,,40,10,0,15,15,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Sometimes,,Sometimes,,Most of the time,Often,,,Often,,Most of the time,,,Most of the time,Most of the time,,,Sometimes,Often,Often,Most of the time,Most of the time,76-99% of projects,More internal than external,Other,,Large volumes of data; Lack of appropriate tools to decode binary data source files,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Network drives; External HDDs,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion,Other",Never,109000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Online courses",Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Engineer,Researcher",University courses,20,10,30,40,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased significantly,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Sometimes,10GB,"CNNs,GANs,Markov Logic Networks,Neural Networks,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Often,,Often,Often,,,,Often,,,Often,,,,Sometimes,,Sometimes,Often,,,,,,,Often,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,,Often,Often,,,,,,,,,10-25% of projects,More internal than external,Other,,,Other,I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,0,0,60,30,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,10GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,Rarely,Sometimes,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Most of the time,,,Rarely,Most of the time,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,,,Most of the time,Sometimes,,Often,,Often,,,,,Most of the time,,,,5,40,15,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear 
question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,Most of the time,Often,,,,,,,,Sometimes,,,,,Sometimes,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Sometimes,"150,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,Australia,55,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,SQL,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Kaggle,,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),3-5 years,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Psychology,More than 10 years,Data Analyst,University courses,0,0,0,100,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important +Male,Spain,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,,,,,"College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,Very useful,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, 
etc.)",30,30,10,10,10,10,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",High school,Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","C/C++,IBM SPSS Statistics,Python,R,SAS JMP",,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Markov Logic Networks,Neural Networks,Simulation,Time Series Analysis",,,Sometimes,,,,,Often,,,,,,,,Often,Often,,,Sometimes,,,,,,,Often,,,Often,,,,10,30,10,10,30,10,Enough to refine and innovate on the algorithm,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,some company and universities,Clean data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,41,"Not employed, but looking for work",,,,,,,,R,Text Mining,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,No Free Hunch Blog,5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,"Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Very Important +Male,United States,43,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Other,More than 10 years,Researcher,Self-taught,70,20,5,5,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Australia,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,No,Bachelor's degree,Management information systems,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Israel,23,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,,,,Not at all important,,Laptop or Workstation and private datacenters,Relational data,,,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,,,,,,,,Sometimes,,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,22,Employed part-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,"Engineer,Programmer",University courses,10,0,0,85,5,0,"Adversarial Learning,Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs",,Academic,20 to 99 employees,Stayed the same,More than 10 years,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Rarely,100GB,"Bayesian Techniques,CNNs,GANs","C/C++,Jupyter notebooks,Mathematica,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Rarely,,,,Sometimes,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Sometimes,Often,,,,Often,,,Often,,,,,Often,Often,Often,,,,,Rarely,,,Sometimes,Sometimes,,,,20,45,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Sometimes,Sometimes,Rarely,,Sometimes,,,10-25% of projects,More internal than external,Other,,"analyzing honeypot(IoTPOT) data(e.g., online clustering)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,0,JPY,I am not currently employed,7,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Python,Text Mining,Python,"GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,30,10,10,50,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A professional degree,Academic,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"CNNs,Decision Trees,Regression/Logistic Regression","IBM Cognos,IBM SPSS Statistics,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,Statistica (Quest/Dell-formerly Statsoft),Tableau",,,,,,,,,,Rarely,,Often,Rarely,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Rarely,Sometimes,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,,,Rarely,,Often,Often,Often,,,,,,Often,,Often,,,,Sometimes,Often,Most of the time,,Sometimes,,Sometimes,Rarely,,,Rarely,,,,25,40,0,10,15,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not 
used by business decision makers,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,,,,,,,,Most of the time,,,,,,,,Most of the time,Often,,10-25% of projects,Entirely internal,IT Department,Academic Resources,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1500000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Social Network Analysis,R,Google Search,"Friends network,YouTube Videos",,,,,,Very useful,,,,,,,,,,,,Very useful,"Data Machina Newsletter,FastML Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Female,Singapore,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Female,South Korea,28,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,SAS Enterprise Miner,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Newsletters,Textbook",,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,,,,Very useful,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Researcher,University courses,20,30,0,50,0,0,"Reinforcement learning,Time Series","Decision Trees - Random Forests,Logistic Regression",,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Online courses,YouTube Videos",Very useful,,,,,,,,,,Very useful,,,,,,,Very useful,,1-2 years,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,PhD,No,Master's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,India,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher,Statistician",University courses,0,30,30,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Relational data,Other",Always,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TIBCO Spotfire",,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,Sometimes,,,Most of the time,,,Sometimes,,Sometimes,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",Often,Sometimes,,,,Often,Most of the time,Sometimes,,,,,,,Often,Most of the time,,Most of the time,,,Sometimes,,Sometimes,,,Often,Often,,,,,,,40,15,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,,,,,Most of the time,,,,,Sometimes,,,10-25% of projects,Entirely internal,Standalone Team,,,,,,,,2000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Software Developer/Software Engineer,Statistician",University courses,30,20,10,30,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Video data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Impala,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Stan,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Often,,,,,,Rarely,,,Most of the time,,,,,,,Most of the time,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data 
Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,Often,Often,Most of the time,Most of the time,Often,Often,,Rarely,Often,,Often,Often,Often,,,Often,Often,Often,Often,Often,Often,Often,Often,,Often,Often,Often,,,,40,30,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Most of the time,,Most of the time,,,Sometimes,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,,Sometimes,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Most of the time,5000000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Hong Kong,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Java,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist",Self-taught,60,0,0,25,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,,Most of the time,,,,Often,,,,,Sometimes,Often,,,,,25,30,5,15,25,0,Enough to explain the algorithm to 
someone non-technical,Difficulties in deployment/scoring,,,,Often,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,47000,HKD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Python,Text Mining,R,I collect my own data (e.g. web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,1-2 years,,,,,Necessary,Necessary,,,,,,,,,"Basic laptop (Macbook),Other",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Other",Self-taught,50,0,50,0,0,0,Natural Language Processing,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Germany,29,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,"Data Stories Podcast,Linear Digressions Podcast,Partially Derivative Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,India,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,55,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,KNIME (commercial version),MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,Most of the time,,,,,,,,,,,,,,Rarely,,,Sometimes,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,,,,Sometimes,Most of the time,Often,Sometimes,,,,,Sometimes,,Often,,,,,Sometimes,,Rarely,,,Sometimes,Most of the time,Sometimes,,Often,,,,10,20,20,40,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Sometimes,,,,Sometimes,Sometimes,,,,,Often,,,Rarely,,Sometimes,Often,Sometimes,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,Other",Other,11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,0,0,0,40,"Computer Vision,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,South Korea,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,Very useful,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,50,5,5,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Decision Trees,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes",,,Rarely,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,20,5,25,40,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Sometimes,,,,,,,,,,Often,,,,,,,,,,,,76-99% of projects,More external than internal,,lanyon;IMS,Data access and network speed,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,700000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",< 1 year,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,I haven't started working yet,Self-taught,80,5,0,0,15,0,"Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,,,,,,,, +Male,Pakistan,29,Employed full-time,,,No,Yes,Data Scientist,,,Amazon Machine Learning,Cluster Analysis,SAS,Google Search,"College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,< 1 year,,,,,,,,,,,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",,Master's degree,,Master's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Engineer,Researcher,Other",University courses,0,0,0,100,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis",,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,"Not employed, but looking for work",,,,,,,,Amazon Web services,Decision Trees,R,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",11 - 39 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,25,25,50,0,0,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by government,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Podcasts,Textbook",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Not Useful,,,,"Data Stories Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,10,30,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A professional degree,Military/Security,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Never,10GB,,"Minitab,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Sometimes,60000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",SAS Base,Deep learning,R,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Mix of fields,100 to 499 employees,Decreased significantly,6-10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100TB,,"Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Lift Analysis,Segmentation,Time Series Analysis",Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,Often,,,,60,5,0,10,25,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,Often,Most of the time,,,,Most of the time,,Most of the time,,,,,,Often,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,508000,INR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,<1MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Python,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Logistic Regression,SVMs",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,50,25,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Sometimes,,,,,,,,Often,,,Often,,,,Often,,Often,,,,,76-99% of projects,More external than internal,Standalone Team,mysql;,processing; enrichment; noise,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Never,300000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,70,0,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Internet-based,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,10TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,Often,,,,Often,,,,,,Often,,Most of the time,,,,Sometimes,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,15,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,Git,Sometimes,1500000,RUB,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,Python,,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses",,,Very useful,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,50,0,50,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,France,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,GitHub,"Kaggle,Newsletters",,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,3 to 5 years,"Computer Scientist,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",60,35,0,0,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Insurance,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,,Often,Often,,,Sometimes,,,Often,,Often,Often,,,,Often,Often,Often,,Often,Often,Often,Often,,Often,Often,Often,,,,40,50,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,Often,,Often,,Often,,,,,,,,Often,,51-75% of projects,Approximately half internal and half external,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Google Cloud Compute,Time Series Analysis,SQL,"Government website,Other","Blogs,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information systems,More than 10 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Never,100MB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,QlikView,R,SAS Base,SQL,Tableau",,Often,,,Often,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,Often,Often,,,,,Often,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression",,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings 
into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,Often,,,,,,Most of the time,,,Sometimes,,Often,Most of the time,,,76-99% of projects,More internal than external,Central Insights Team,Financial; cue; dvla; experian ,Consistency ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,90000,GBP,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,42,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,I don't plan on learning a new ML/DS method,Python,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,,,"Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,,,"NoSQL,Python,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,Most of the time,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,10,30,10,0,Enough to run the code / standard library,"Limitations of tools,Privacy issues,Scaling data science solution up to full database,Other",,,,,,,,,,,,,Sometimes,,,,Most of the time,Often,,,,Often,10-25% of projects,Entirely internal,Other,,Size and amount of records,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Sometimes,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,10,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data",Sometimes,10GB,"CNNs,Neural Networks,RNNs,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,Most of the time,,,Often,,,,,,,,,,,,,Most of the time,Often,,,,Most of the time,,,,,,,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Always,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,Neural Nets,R,University/Non-profit research group websites,"Personal Projects,Textbook",,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,0,0,50,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Never,1GB,,"Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,40,0,0,0,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making",Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,Git,Rarely,150000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,36,"Not employed, but looking for work",,,,,,,,Stan,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Not Useful,,Very useful,Not Useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,PhD,Yes,Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,"Computer Vision,Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Other,26,Employed full-time,,,No,Yes,Software 
Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Python,Decision Trees,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop or Workstation and local IT supported servers,Other",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,40,0,10,10,0,40,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Taiwan,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Association Rules,Other,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,Data Stories Podcast,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Stack Overflow Q&A,Other",,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,Researcher,Work,0,0,80,0,0,20,"Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,Other,"C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Rarely,Often,,,,Often,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"Association Rules,Data Visualization,HMMs,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Segmentation",,Sometimes,,,,,Most of the time,,,,,,Sometimes,Most of the time,Sometimes,,,,,,Often,,,,,Often,,,,,,,,40,10,5,40,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Privacy issues",Most of the time,,,,,,,,,,Most of the time,,,,,,Often,,,,,,100% of projects,Approximately half internal and half external,Other,no,Data collection,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Most of the time,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,40,10,20,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,TensorFlow",,,,,Most of the time,,,,Most of 
the time,,,,,,,,Most of the time,,,,,,,Often,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,Often,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Natural Language Processing,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Most of the time,,,Often,Most of the time,,Often,Most of the time,,,Often,Most of the time,,,Most of the time,,,Most of the time,,,,Often,,Often,Most of the time,,Most of the time,Most of the time,Often,,,,50,20,20,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,Often,Sometimes,Most of the time,,,Most of the time,,Sometimes,,,Sometimes,,,,Most of the time,,,,Often,,10-25% of projects,More internal than external,IT Department,"Open data set, Kaggle open data set, Open street map, Wikipedia","Dirty data need to arrange multiple algorithms to clean, connect records.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,"900,000",TWD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,Very useful,Very useful,,,,Somewhat useful,"Data Stories Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Business Analyst,University courses,33,1,33,0,33,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Never,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,Sometimes,,,,,,Sometimes,,,,,80,1,0,9,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack 
of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,,Most of the time,,100% of projects,Approximately half internal and half external,Other,Lytica,Computational resources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,350000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Singapore,45,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,GitHub,"Personal Projects,YouTube Videos",,,,,,,,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,60,5,25,5,0,5,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Relational data",Sometimes,1GB,"Bayesian 
Techniques,Regression/Logistic Regression","MATLAB/Octave,Microsoft Excel Data Mining",,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Often,,Most of the time,,Often,,,Most of the time,Most of the time,Often,,,Most of the time,Most of the time,,,Most of the time,,,,35,25,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,120000,SGD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Julia,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,,,,Very useful,"DataTau News Aggregator,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,5,20,5,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,United States,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,The Data Skeptic Podcast,< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle",Very useful,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Engineer,Software Developer/Software Engineer",University courses,10,30,20,30,5,5,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I prefer not to answer,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data",Don't know,1GB,"Ensemble Methods,Neural Networks","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics",,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,40,30,15,15,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,Often,,,,,,,Sometimes,,,Less than 10% of projects,Do not know,IT Department,,,,,,Git,,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,12.5,0,50,12.5,0,Outlier detection (e.g. Fraud detection),Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +"Non-binary, genderqueer, or gender non-conforming",Philippines,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,SQL,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Researcher",Self-taught,70,20,10,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series",,A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Video data,Text data,Relational data",,100TB,Regression/Logistic Regression,"Amazon Web services,Impala,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,Other",,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,Most of the time,"A/B Testing,Data Visualization,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,30,0,25,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,Most of the time,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Always,"1,200,000",PHP,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,SAS Base,Deep learning,R,I collect my own data (e.g. web-scraping),"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Statistician,Other",Self-taught,60,30,10,0,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Bayesian Techniques,Logistic Regression",Primary/elementary school,Non-profit,"5,000 to 9,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,Other",,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,,,,Most of the time,Often,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Often,Often,Most of the 
time,,Sometimes,,,,,,,Often,,,26-50% of projects,Do not know,Standalone Team,,,Graph (e.g. GraphBase/Neo4j),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,360000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Programmer,Fine,Employed by college or university,Oracle Data Mining/ Oracle R Enterprise,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Master's degree,Physics,1 to 2 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,South Korea,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Personal Projects",Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,"Data Machina Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Researcher,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,25,30,0,45,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Australia,NA,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Government website,"Blogs,Official documentation",,Somewhat useful,,,,,,,,Very useful,,,,,,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Software Developer/Software Engineer,Other",Self-taught,90,10,0,0,0,0,Time Series,Logistic Regression,,Non-profit,,,Less than one year,"A friend, family member, or former 
colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,,"NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,Often,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,10,10,25,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,Often,,,,,Most of the time,Sometimes,,,Most of the time,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Conferences,Friends network,Online courses,Textbook,YouTube Videos,Other",,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,70,10,10,10,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Other,Relational data,Sometimes,1PB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,Tableau,Unix shell / awk,Other,Other",Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,,Most of the time,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,Often,Sometimes,Sometimes,,"Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,,,Most of the time,Most of the time,,Sometimes,,,,,,Most of the time,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,Most of the time,,,,,Most of the 
time,Often,,,,60,5,10,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,Often,Often,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,,Less than 10% of projects,More internal than external,Business Department,Social media,IT dropping it,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,205,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,15,20,20,20,15,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,100MB,"Ensemble Methods,Random Forests","Python,R,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,,Sometimes,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,Often,Most of the time,,,,,,,Often,,Often,,Rarely,,Rarely,,,Often,,,,,Often,Often,,,,,25,20,5,25,10,15,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,,Often,,76-99% of projects,More external than internal,Standalone Team,"Twitter, GA",Access to data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,660000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,High school,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Somewhat important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,,,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Mathematica,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Simulation",,,,,,,Often,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,25,25,0,25,25,0,Enough to tune the parameters properly,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,Other,Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,30000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Very useful,Very useful,,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Outlier detection (e.g. Fraud detection),,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Very useful,"FastML Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Other,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Iran,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,NoSQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,Programmer",University courses,60,10,0,20,10,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning","Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Evolutionary Approaches,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,Rarely,,,,,,,,Often,,Rarely,,Often,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Natural Language Processing",Often,,,,Sometimes,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,30,40,20,0,0,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,54000000,IRR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Hong Kong,20,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Adversarial Learning,Support Vector Machines (SVMs),High school,Retail,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,100MB,Regression/Logistic Regression,"Amazon Machine Learning,Python",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing",,,,,,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,10,20,30,10,30,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Bitbucket,Rarely,1,AED,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Julia,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Not Useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Business Analyst,Operations Research Practitioner,Researcher,Other",University courses,50,30,10,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Non-profit,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Text Analytics,Time Series Analysis",Often,,Rarely,,,Most of the time,Most of the time,Sometimes,Often,Sometimes,,,,Often,,Most of the 
time,Sometimes,,,,Often,Often,,,,,Often,,Often,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,Rarely,,Often,,,,Often,,Often,,,,,Most of the time,,,,Most of the time,Most of the time,,76-99% of projects,More external than internal,Business Department,data.gov,Obtaining relevant and timely data from government agencies,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Nigeria,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,R,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Telecommunications,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,Regression/Logistic 
Regression,"IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,QlikView,R,Tableau",,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,Most of the time,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,Often,,,,,,,"Association Rules,Data Visualization,Decision Trees,Prescriptive Modeling,Segmentation,Text Analytics",,Sometimes,,,,,Most of the time,Rarely,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,10,20,40,10,20,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Rarely,,,,,,26-50% of projects,Entirely internal,Other,None,None ,Other,Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,4000000,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",Very useful,,,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,"FastML Blog,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Business Analyst,University courses,70,10,10,5,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Often,Often,,,,Rarely,,,,Most of the time,,,,,,Most of the time,Most of the time,,,Most of the 
time,,,,Often,,,,20,40,20,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,Often,,,,,,,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,770000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler",University courses,50,20,5,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",I prefer not to answer,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,,,Sometimes,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Prescriptive Modeling,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,,Most of the time,Often,Often,Sometimes,,Sometimes,,Sometimes,,,,,,Often,,Often,,Sometimes,,,Often,,Often,Most of the time,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear 
direction to go in with the available data",Sometimes,Sometimes,,,Often,Sometimes,,Sometimes,Often,,,Most of the time,,,,,,Most of the time,,Often,,,100% of projects,Entirely internal,IT Department,none,real-time data not available,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Never,,INR,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses",,,,,,,,,Very useful,,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"edX,Other",Other,0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Norway,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler",Work,20,20,60,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,C/C++,Python,R,SQL,TensorFlow",,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,Often,Often,,,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,750000,NOK,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Angoss,Other,Matlab,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,Computer Scientist,Work,20,20,10,15,20,15,Survival Analysis,Logistic Regression,A bachelor's degree,Technology,I prefer not to answer,Increased significantly,6-10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Always,10GB,Random Forests,"R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Naive Bayes,Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,,IT Department,,,Graph (e.g. GraphBase/Neo4j),Email,,Bitbucket,,140000,INR,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Minitab,Factor Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Engineer,Researcher",Self-taught,40,50,0,10,0,0,Other (please specify; separate by semi-colon),"Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks","IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,Tableau",,,,,,,,,,,,Sometimes,,,,,Often,,,,Sometimes,,Most of the time,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,,,Most of the time,,,,30,10,10,30,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others",,Sometimes,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Never,75000,AED,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,C/C++,Social Network Analysis,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Conferences,Kaggle,Tutoring/mentoring,YouTube Videos",,Very useful,,Somewhat useful,Not Useful,,Somewhat useful,,,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Operations Research Practitioner,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,1GB,"Evolutionary Approaches,Neural Networks","Java,Minitab,Python",,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Evolutionary Approaches,Neural Networks,Simulation,Time Series Analysis",Sometimes,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,Often,,,Rarely,,,,70,10,10,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Often,,,,,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,Customer Ratings;Rating Agencies; Journals,Accuracy;Usability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,Git,Never,60000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,PhD,Sort of (Explain more),Bachelor's degree,Computer Science,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by college or university,NoSQL,Monte Carlo Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Official documentation,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,Very useful,,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Machine Learning Engineer,Programmer,Researcher",Self-taught,30,15,0,55,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble 
Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Relational data",Never,1GB,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Java,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs",Often,,Often,Sometimes,,Most of the time,Most of the time,,Sometimes,,,,Sometimes,Most of the time,,Most of the time,,Most of the time,,Often,Often,,,,,,,Most of the time,,,,,,50,10,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Entirely external,Standalone Team,UCI repository,Organization,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,github,"Git,Other",Sometimes,"100,000",USD,Other,8,,,,,,,,,,,,,,,,,, +Male,Republic of China,26,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,Self-taught,60,0,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data",Most of the time,1TB,"CNNs,Gradient Boosted Machines,Neural Networks","C/C++,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Gradient Boosted Machines,Logistic Regression,Neural Networks,RNNs",,,,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,,Often,,,,,Often,,,,,,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,"Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Statistician",Work,0,15,40,40,5,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Video data,Text data",Rarely,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,60,0,0,40,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,420000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,19,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,27,Employed full-time,,,No,Yes,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Time Series Analysis,R,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,10,10,10,0,60,"Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,21,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",5,65,15,15,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Belarus,24,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,C/C++,Deep learning,Python,Google Search,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,1-2 years,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,University courses,25,25,0,30,20,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Stayed the same,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,Regression/Logistic Regression,"MATLAB/Octave,QlikView,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,45,20,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT",Often,,,,Often,,,,,,,,,,Often,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,140000,USD,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Text Mining,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Researcher,University courses,70,0,0,30,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Academic,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Most of the time,10GB,Other,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,Most of the time,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,10,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Often,,,,,,,Often,,,Sometimes,Often,,100% of projects,Approximately half internal and half external,Other,"Firm financials, stock prices",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,150000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Official documentation,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,Data Scientist,Self-taught,30,40,0,0,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A bachelor's degree,Academic,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Always,100GB,"Regression/Logistic Regression,Other","IBM SPSS Statistics,R,SAS Base,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Often,,,,,,,,,,"Logistic Regression,Other",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,60,10,0,10,20,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,Most of the time,,,Often,,,,,,,100% of projects,More internal than external,Standalone Team,National health insurance database,200GB,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,17000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,Very useful,,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),40+,Master's degree,No,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,40,40,0,20,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group 
websites,"Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,15,30,20,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Increased significantly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs",,Rarely,Sometimes,Most of the time,,Sometimes,Most of the 
time,Rarely,,,,,Rarely,Sometimes,,Rarely,Rarely,Rarely,Often,Often,Sometimes,,Rarely,,Often,,Rarely,Sometimes,,,,,,30,50,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,The Cancer Genome Atlas; 1000 genomes database,Integration,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,Researcher,Kaggle competitions,10,10,0,0,80,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Other,Anomaly Detection,R,Government website,"Conferences,Friends network,Online courses,Stack Overflow Q&A,Textbook",,,,,Very useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,35,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision 
Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Government,500 to 999 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Other","R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,Often,Often,Often,,,Often,,Often,,Sometimes,,,Sometimes,Often,Often,Often,Often,Sometimes,,Often,,Sometimes,Sometimes,Often,,,,40,15,0,15,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,,51-75% of projects,Entirely internal,Other,Experian purchased data,Unknown level of noise in the Experian data set,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Other",Sometimes,"117,000",USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Factor Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Scientist",Work,30,10,60,0,0,0,Time Series,Decision Trees - Random Forests,A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,10GB,"Decision Trees,Random Forests","IBM Cognos,Microsoft Excel Data Mining",,,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,25,20,10,5,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data",,,,Often,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,Cleaning of data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Bitbucket,Most of the time,550000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,India,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,IBM SPSS Statistics,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Academic,100 to 499 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,Random Forests,"Amazon Web services,Microsoft Azure Machine Learning,Spark / MLlib",,Often,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,"CNNs,Logistic Regression,Random Forests",,,,Rarely,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,50,20,10,NA,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,,,,,,Most of the time,,26-50% of projects,More internal than external,Business Department,Questionnaires,Data Cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,42000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Microsoft R Server (Formerly Revolution Analytics),Social Network Analysis,Python,Other,"Blogs,Kaggle,Official documentation,Tutoring/mentoring",,Very useful,,,,,Very useful,,,Somewhat useful,,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,Other,30,15,35,0,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Often,,,,,,,,,,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Random Forests",,,,,,Rarely,Often,Rarely,,,,,,,,Often,Sometimes,Sometimes,,Sometimes,,,Often,,,,,,,,,,,50,25,10,10,5,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Privacy issues,Scaling data science solution up to full 
database",,,,,Most of the time,,,,,Sometimes,Most of the time,,Sometimes,,,,Most of the time,Rarely,,,,,100% of projects,Approximately half internal and half external,IT Department,"call center data, play store reviews",dirty data,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Other,Never,840000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Text Mining,R,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Engineer,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,Adversarial Learning,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Association Rules,Python,GitHub,Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher",Self-taught,70,0,10,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Decision Trees - Random Forests,A doctoral degree,Academic,20 to 99 
employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,Ensemble Methods,"Java,Python,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,5,10,5,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,Often,Most of the time,,,,Often,,,,Often,,,,,,,,,,51-75% of projects,More internal than external,IT Department,TPC-H,Data cleansing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,4000000,JPY,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,IBM SPSS Statistics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring",,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,Very useful,,"Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",5-10 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,"Coursera,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Computer Science,More than 10 years,Other,Self-taught,85,0,5,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,Very Important,,,Very Important,,,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,India,23,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Technology,"10,000 or more employees",Decreased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Other",Relational data,Sometimes,1MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Most of the time,,,,,,,,Often,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,Data needs to be in a form that is ready for analysis tidying data is usually the biggest challenge ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Never,300000,,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Social Network Analysis,Python,University/Non-profit research group websites,"Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,FlowingData Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,20,20,30,0,0,"Machine Translation,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,South Africa,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,50,25,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,Often,,,,,,Often,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation",Often,Often,,,,,Often,Most of the time,,,,,,Often,,Often,,,,,,,,,,Often,,,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,Often,,,,Most of the time,,,Most of the time,Often,,26-50% of projects,More internal than external,Central Insights Team,Na,Dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,0,ZAR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing",Logistic Regression,"Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Most of the time,,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of 
significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Israel,30,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Miner,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,0,0,30,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,Not Useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,University courses,30,30,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +A different identity,Belarus,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Google Search,"Conferences,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",5,80,15,0,0,0,,"Decision Trees - Random Forests,Logistic Regression",,Retail,"1,000 to 4,999 
employees",,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,100MB,"Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Random Forests,Segmentation",Rarely,,,,,,Often,,,,,,,,,,,,,,,,Rarely,,,Often,,,,,,,,20,20,5,25,30,0,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Often,Often,,,,,,Sometimes,,,,,,Most of the time,,,,,,Most of the time,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,,Rarely,12000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,42,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,40+,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Anomaly Detection,Python,GitHub,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,< 1 year,,Necessary,,,Necessary,,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Other,70,30,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,,Very Important,,Very Important,Very Important,Very Important,,,,,Very Important,,,, 
+Male,United States,25,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,15,15,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Retail,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,100TB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,Segmentation",Often,,,Rarely,Sometimes,,,Often,,,,,,Often,,Often,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Often,,,,,,,,50,15,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,26-50% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,120000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Java,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,Very useful,,,Very useful,,,Very useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog",< 1 year,Unnecessary,Nice to have,Necessary,,,Nice to have,,Nice to have,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Doctoral degree,Biology,I don't write code to analyze data,Researcher,Self-taught,70,0,0,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,Japan,26,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Data Analyst,Programmer",Self-taught,50,10,10,20,10,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,,Decision Trees,"IBM SPSS Modeler,Python,R",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,SVMs",,,Sometimes,,,Often,Often,Often,,,,,,,,Often,,,,Sometimes,,,Sometimes,,,,,Often,,,,,,50,10,10,10,10,10,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,3000000,JPY,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,44,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Computer Scientist,Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,"Decision Trees,Regression/Logistic Regression","Cloudera,Java,Python,QlikView,RapidMiner (free version),Spark / MLlib",,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,Often,,,Sometimes,,,,,,Often,,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,10,10,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,Often,,,,Often,,,,,,Sometimes,Sometimes,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Never,100000,SGD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,University courses,70,20,0,10,0,0,"Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,,100GB,,"Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other",,Often,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Often,,,,,,Often,Most of the time,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,,,10,0,60,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,Different teams use different terms to 
define the same thing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Never,132000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Hong Kong,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,,Less than a year,Researcher,University courses,10,20,0,70,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Other (please specify; separate by semi-colon),High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Kenya,46,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,C/C++,Anomaly Detection,C/C++/C#,Google Search,"Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,"Computer Scientist,Programmer",Self-taught,60,20,20,0,0,0,"Computer Vision,Time Series,Unsupervised Learning",Neural Networks - GANs,Primary/elementary school,Academic,500 to 999 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Image data,Never,1GB,Neural Networks,"C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Neural Networks,Simulation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Sometimes,,,,20,30,5,10,35,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,,Most of the time,,,Most of the time,,,Often,,Often,,Often,Most of the time,,51-75% of projects,Entirely internal,IT Department,NASA images; TCIA,finding the data I require within the data source,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,"90,000",KES,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Very useful,,,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Operations Research Practitioner,Work,20,30,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Relational data,Other",Sometimes,1GB,"Ensemble Methods,Random Forests,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs",Often,Sometimes,,,Sometimes,,Most of the time,Often,,,,,,,,Often,,Often,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input",Most of the time,Often,Often,,,,,,,,Sometimes,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Sometimes,2400000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,Australia,54,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Time Series Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Conferences",Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,I never declared a major,6 to 10 years,"Business Analyst,Researcher",University courses,50,0,0,50,0,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Other",,10GB,"Bayesian Techniques,Neural Networks,Other","C/C++,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,Sometimes,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,,Sometimes,,,,Most of the time,,,,10,50,0,10,30,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Hong Kong,35,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Other",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,20,10,10,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Traditional Workstation,Other,Most of the time,100MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,10,20,20,30,20,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database",,,,,Often,,,Often,,,,,,,,,,Often,,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,,,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Australia,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A",Very useful,Somewhat useful,,,Not Useful,Very useful,,,,,,Very useful,Somewhat useful,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Julia,Jupyter notebooks,KNIME (free version),NoSQL,Python,QlikView,R,SQL,Tableau,Unix shell / awk",Rarely,Often,,,,,,Rarely,Rarely,,,,,,,Rarely,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,Often,Often,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Rarely,Sometimes,,,,,,,,,Rarely,Sometimes,Rarely,,Often,Rarely,,,,Often,Sometimes,Sometimes,Often,,,,50,5,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Often,,Most of the time,,,,Often,,,,,Most of the time,,,,,Sometimes,Sometimes,,,51-75% of projects,More internal than external,Business Department,ABS; ATO; Equifax; D&B,Finding out data definitions,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Other",Confluence,"Bitbucket,Git",Sometimes,137000,AUD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Personal Projects,Textbook",,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,,,Somewhat useful,,,,"DataTau News Aggregator,FlowingData Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Engineer",Self-taught,30,10,40,20,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Rarely,10GB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Ensemble Methods,kNN and Other Clustering,Random Forests,Time Series Analysis",,Rarely,Sometimes,,,,Most of the time,,Often,,,,,Often,,,,,,,,,Often,,,,,,,Most of the time,,,,30,20,10,30,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Often,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,Often,Most of the time,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Other",Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,120000,SGD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Switzerland,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Government website,"Blogs,College/University,Company internal community,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Scientist,Other",Self-taught,30,10,20,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Rarely,,,,,,,,Often,,Sometimes,,,Rarely,,,,,,,,,Often,,Most of the time,,,,,Rarely,,,Rarely,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Text Analytics",Sometimes,,Sometimes,,,Often,Most of the time,Rarely,,,,,,,,,,Rarely,Sometimes,,,,Rarely,Sometimes,,Sometimes,,,Often,,,,,30,10,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Often,,Often,,,,,,,,,Sometimes,,Most of the time,,,,Often,Often,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Iran,32,Employed part-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,3 to 5 years,Machine Learning Engineer,Kaggle competitions,40,0,10,0,50,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"5,000 to 9,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Sometimes,,,Sometimes,,,,Sometimes,,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,,,30,30,30,10,0,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Social Network Analysis,R,Google Search,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",University courses,60,15,20,0,5,0,"Recommendation Engines,Time Series","Bayesian Techniques,Ensemble Methods",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,Often,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,Time Series 
Analysis",,Sometimes,Often,,Sometimes,Sometimes,,Sometimes,Often,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Often,,Often,Most of the time,,,Most of the time,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,Most of the time,,,,,,Most of the time,,,,,,Often,,,,,,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,,,,,,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Denmark,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",DataRobot,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Textbook",,,,,Very useful,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Other,Work,50,0,50,0,0,0,Reinforcement learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",A bachelor's degree,Manufacturing,"5,000 to 9,999 employees",Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1MB,Regression/Logistic Regression,"R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,Simulation,Other",,,,,,Often,Most of the time,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,Biological sequnce databases,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Other",Rarely,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by professional services/consulting firm,Self-employed",Google Cloud Compute,Deep learning,Python,GitHub,"Blogs,Kaggle,Online courses",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Machine Translation,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Taiwan,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Friends network,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,Very useful,,,Very useful,"Data Stories Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer",University courses,5,40,10,40,5,0,Computer Vision,"Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Most of the time,10GB,"Bayesian Techniques,CNNs,Neural Networks","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction",,,Rarely,Most of the time,,,,,,Sometimes,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,10,50,10,30,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",,,,Most of the time,Often,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,Imagenet,"Labeling, define.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Mercurial",Most of the time,30000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,,3 to 5 years,Data Analyst,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",,,,,,Sometimes,Most of the time,Often,Most of the time,,,Often,,Rarely,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,,,,,70,10,10,0,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,Most of the time,,,,Sometimes,,,,,,,,,Most of the time,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,20000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Official documentation,Online courses,Personal Projects",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,32,"Not employed, but looking for work",,,,,,,,IBM SPSS Statistics,"Ensemble Methods (e.g. 
boosting, bagging)",R,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",10,60,10,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +,,NA,Employed full-time,,,No,Yes,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,,I don't write code to analyze data,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,Turkey,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Other,University courses,0,20,50,30,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased significantly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,Regression/Logistic Regression,"IBM SPSS Modeler,R,SQL",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,70,10,10,0,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Sometimes,Often,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Always,45000,TRY,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Stan,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos,Other,Other",Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",31,27,20,19,3,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),Python,R,SQL,Stan,Tableau",,,,,Often,,,,Often,,,,,Often,,,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis,Other,Other",,Sometimes,Often,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Often,,Often,,Most of the time,,,Most of the time,Often,Often,Sometimes,,Sometimes,Sometimes,,,Sometimes,Most of the time,Often,,15,13,7,20,25,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Sometimes,Sometimes,Sometimes,Often,Often,,Often,Most of the 
time,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,100% of projects,More internal than external,Business Department,Data shared from partners ,Uncertainty in data; quality; volume,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,85000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Not Useful,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Hong Kong,49,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Textbook",Very useful,,,,,,,,,,,,,,Somewhat useful,,,,"DataTau News Aggregator,FastML Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,United States,48,Employed full-time,,,Yes,,Computer Scientist,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,25,25,25,25,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1TB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs,Other","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Rarely,Sometimes,,,,,,,,Most of the time,,Rarely,,Sometimes,Sometimes,Most of the time,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian 
Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",Sometimes,,Often,Rarely,,Often,Most of the time,Often,,,,Often,,Most of the time,,Most of the time,,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,Often,Sometimes,,Sometimes,,Most of the time,,,,20,50,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,Often,Most of the time,Often,,,Most of the time,,Most of the time,Sometimes,Most of the time,,,,,,Most of the time,Most of the time,,,100% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Rarely,350000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Israel,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Other,Time Series Analysis,Python,GitHub,"Arxiv,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,5,50,30,5,0,Natural Language Processing,"Ensemble Methods,Neural Networks - RNNs",A master's degree,Academic,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",Text data,Sometimes,100GB,"Neural Networks,RNNs","Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Rarely,,,,"Cross-Validation,Ensemble Methods,Natural Language Processing,Neural Networks,RNNs",,,,,,Often,,,Often,,,,,,,,,,Most of the time,Often,,,,,Most of the time,,,,,,,,,20,60,0,0,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",Often,,,,Often,,,,,Sometimes,,,,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint,Other",google drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,50000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,100,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,Neural Nets,C/C++/C#,"Government website,University/Non-profit research group websites","Friends network,Online courses,Podcasts",,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,35,30,30,5,0,0,Recommendation Engines,,A bachelor's degree,Technology,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Video data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,C/C++,Java,SQL,Tableau,Unix shell / awk,Other",,Often,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,Often,,,"A/B Testing,Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Recommender Systems,Time Series Analysis",Sometimes,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,Often,,,,,,Sometimes,,,,10,50,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in 
the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,Often,Sometimes,,Most of the time,,,Sometimes,Often,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,SQL,Google Search,"Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Machine Learning Engineer,Software Developer/Software Engineer",University courses,10,30,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data",Most of the time,10GB,"Bayesian 
Techniques,SVMs","Java,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Naive Bayes,Natural Language Processing,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,Often,,,,,10,50,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,I prefer not to say",,,,,,Often,Often,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,700000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,52,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Other,Other",,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,,,,Necessary,,,,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer,Other",Work,0,50,50,0,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,Very Important,Very Important,Not important,,Very Important,Not important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important +Male,South Africa,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Neural Nets,Python,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle",,Very useful,,,Very useful,,Very useful,,,,,,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,,,,,"Traditional Workstation,Workstation + Cloud service",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Management information systems,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Online courses,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,Very useful,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,30,10,30,30,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Java,Python",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,Often,,,Most of the time,,,Sometimes,,,,,Often,,Most of the time,,Most of the time,Most of the time,Often,Often,,,,Sometimes,,,Often,Most of the time,,,,,25,45,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,Sometimes,,Most of the time,Often,,,,,,Often,,,,,,,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1200000,INR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Hong Kong,29,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,Microsoft Excel Data Mining,Association Rules,R,"GitHub,University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,Less than a year,,University courses,30,30,0,40,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Government,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,10MB,Decision Trees,"IBM Cognos,R",,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering",,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,80,10,0,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data",Often,,Often,,Most of the time,,,,,,,,,,,,,,,,,,None,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"50,000",,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Matlab,Other,"Arxiv,Blogs,College/University,Conferences,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Jack's Import AI Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,edX",Basic laptop (Macbook),40+,PhD,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Text Mining,Python,University/Non-profit research group websites,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Stories Podcast,FastML Blog,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle 
Competitions,No,Bachelor's degree,Computer Science,,Data Scientist,Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Personal Projects",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,10,10,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +Female,United States,25,"Not employed, but looking for work",,,,,,,,DataRobot,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,10,20,10,60,0,0,,"Hidden Markov Models HMMs,Markov Logic Networks",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +"Non-binary, genderqueer, or gender non-conforming",United States,25,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,,Python,University/Non-profit research group websites,Other,,,,,,,,,,,,,,,,,,,"Data Stories Podcast,FastML Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Computer Science,,"Data Miner,Machine Learning Engineer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,40,0,40,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian 
Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Flume,Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau,Unix shell / awk",,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Often,Most of the time,,Most of the time,,Most of the time,Often,Often,Most of the time,,Most of the time,,,Most of the time,,Most of the time,Often,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,51-75% of projects,Entirely internal,Central Insights Team,"kaggle,NLTK corpus",not enough hardware to store,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,2900000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,10,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,RapidMiner (free version),SQL,Tableau,TensorFlow",Rarely,Sometimes,,,Rarely,,,,Rarely,,,,Rarely,,,,,,,,,Sometimes,Often,Most of the time,,,Rarely,,,,Rarely,Often,Most of the time,,Rarely,,,,,,,Most of the time,,,Most of the time,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,Sometimes,Often,,Often,Often,,Often,,,Most of the 
time,Most of the time,,,,35,30,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,Sometimes,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,585000,ARS,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,80,10,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",,Financial,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Other,Sometimes,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Java,NoSQL,Python,SQL,TensorFlow,Other,Other",,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,"Decision Trees,kNN and Other Clustering,Neural Networks,RNNs,SVMs,Time Series Analysis",,,,,,,,Most of the time,,,,,,Most of the time,,,,,,Often,,,,,Often,,,Most of the time,,Most of the time,,,,40,20,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to 
go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,,,,,Often,,Rarely,Sometimes,,,,Most of the time,,Most of the time,Most of the time,Sometimes,Often,,Less than 10% of projects,More internal than external,Other,Stock Exchange data,Refining and modifying it to suit the model,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"360,000",INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,South Africa,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Python,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Textbook",Very useful,,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst",University courses,5,5,0,90,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Most of the time,10MB,"Random Forests,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,R,Other",,Sometimes,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,Rarely,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",Sometimes,,,,,Often,Most of the time,,Often,,,,,,,,,,,,Often,,Most of the time,,,,,Most of the time,Often,,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Bitbucket,,,ZAR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important +Female,People 's Republic of China,38,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer",Work,50,30,20,0,0,0,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Mix of fields,"10,000 or more employees",Decreased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Random Forests","IBM SPSS Modeler,IBM SPSS Statistics,Python,R",,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Sometimes,,,,,Often,,,,,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Business Department,UCI,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,20000,CNY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,SQL,Bayesian Methods,Python,University/Non-profit research group websites,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,,Not Useful,,Very useful,,,Somewhat useful,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,30,0,20,0,20,30,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,Bayesian Techniques,"Cloudera,Flume,IBM Cognos,IBM Watson / Waton Analytics,Impala,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Often,,Sometimes,,,Rarely,,,Often,Sometimes,Most of the time,,,,,,,,,,,,Often,Sometimes,,,Often,,,,,,,,,,Often,Most of the time,,,Sometimes,,,Most of the time,,,,"Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks",,,,,,,,Most of the time,,,,,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,,50,0,0,50,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Often,,,,,,,,,,,,,,26-50% of 
projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,60000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Cloudera,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Company internal community,Online courses,Textbook,Other",,Very useful,Somewhat useful,Not Useful,,,,,,,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,50,0,0,20,0,30,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Academic,I prefer not to answer,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Traditional Workstation,Other","Text data,Relational data",Most of the time,,"Bayesian Techniques,Gradient Boosted Machines,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,Spark / MLlib,SQL",,,,Rarely,,,,,Often,,,,,,Most of the time,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,Often,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,Often,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,Often,,Most of the time,,Sometimes,Often,Sometimes,Most of the time,Sometimes,,,Rarely,,,,,Rarely,Most of the time,,,,,50,30,10,10,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,Often,Sometimes,,,,,,,,,Sometimes,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Deep learning,Python,University/Non-profit research group websites,"College/University,Online courses,Stack Overflow Q&A",,,Very useful,,,,,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Rarely,1MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,Python,R,SAS Base,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,Rarely,Sometimes,,,,Often,,,,,,Often,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,Often,,,,,Rarely,,,,Sometimes,,Sometimes,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive 
Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs",,Rarely,Sometimes,Sometimes,Rarely,Sometimes,,Sometimes,,Sometimes,,,,Often,,Often,,Often,,Often,Often,Sometimes,Sometimes,Sometimes,Sometimes,Often,Sometimes,Often,,,,,,10,20,0,0,0,70,Enough to code it from scratch and it will run blazingly fast and be super efficient,Other,,,,,,,,,,,,,,,,,,,,,,Often,10-25% of projects,Do not know,Other,ADULT data set; bitcoin; cervical cancer data; ,Collection of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,no sharing of data,Other,Never,1200000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed part-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by government",Spark / MLlib,Text Mining,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,YouTube Videos",,Very useful,,,Very useful,,Somewhat useful,,,,,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,Engineer,Self-taught,60,0,20,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased slightly,Don't know,A general-purpose job board,Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Sometimes,1TB,"Evolutionary Approaches,Neural Networks,SVMs","Amazon Web services,C/C++,Mathematica,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,Often,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Often,,Most of the time,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Most of the time,Most of the time,Sometimes,,,,,,20,20,0,25,25,10,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,,Rarely,,Sometimes,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,50000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,21,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Poorly,Self-employed,Mathematica,Neural Nets,Python,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,"Jack's Import AI Newsletter,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,40,50,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,I prefer not to answer,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important +Male,India,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Time Series Analysis,Python,GitHub,"Blogs,Conferences,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,Very useful,,,,,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,20,10,10,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",No education,Academic,100 to 499 employees,Stayed the 
same,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,1GB,"CNNs,HMMs,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,RNNs,Simulation",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,10,40,30,10,10,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,None,Do not know,Standalone Team,CIFAR; FLICKR,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,"360,000",INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Singapore,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by college or university,Spark / MLlib,Neural Nets,Python,Google Search,"Arxiv,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),"Text data,Relational data",,100GB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most 
of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"kNN and Other Clustering,Simulation",,,,,,,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,40,40,0,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,,,,,Often,,Often,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,Twitter;Flickr;taxi;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,50000,SGD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Iran,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,15,0,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Internet-based,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,Rarely,,,,,,Sometimes,Sometimes,,,,Most of the time,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Sometimes,,,,Often,Most of the time,Most of the time,,,,,,,,,Often,,,,Sometimes,Sometimes,,Sometimes,Often,,,,,Often,,,,,30,40,10,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,Most of the time,,,Often,,,,,,,Sometimes,,,,Often,,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",Work,60,20,20,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100MB,"Evolutionary Approaches,Neural Networks,Random Forests,SVMs","Julia,Mathematica,MATLAB/Octave,Perl,Python,R",,,,,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,,,,,,,,,Most of the time,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests",,,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,Often,,,Most of the time,,,,,,,,,,,20,20,10,10,40,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,,,,Most of the time,Often,,,,,,Often,,,,,,,,10-25% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,107000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,37,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"GitHub,Government website,University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,,Not Useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,Yes,Doctoral degree,Psychology,6 to 10 years,"Researcher,Other",Self-taught,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,,,Very useful,Very useful,"FastML Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Blogs,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 
to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,Often,,Most of the time,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Rarely,,,Rarely,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Sometimes,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,70,10,8,2,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Most of the time,,,,,,,,,,,,,Most of the time,,Sometimes,,,10-25% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Sometimes,120000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Sweden,43,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,SAS Base,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,High school,Telecommunications,20 to 99 employees,Increased significantly,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,,,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Segmentation",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,50,0,0,20,20,10,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Often,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,Google Trends,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Sometimes,40000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Unix shell / awk,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",45,15,25,0,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",Rarely,,,,,Most of the time,Most of the time,Rarely,,,,Sometimes,,Often,,Often,,,Often,Sometimes,Sometimes,,Often,,,,,Often,Often,,,,,50,35,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Often,,51-75% of projects,Do not 
know,Other,"kaggle, google search",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Romania,40,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,Other,Other,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing 
page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Czech Republic,24,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,10,20,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,100TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Java,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Most of the 
time,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,,,,,Most of the time,Most of the time,,Often,,,,,,,,,,,50,20,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,Sometimes,Often,,,Most of the time,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,600000,CZK,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,PhD,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Social Network Analysis,Java,"GitHub,Google Search","College/University,Friends network,YouTube Videos",,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,Very useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,30,0,40,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Very Important,Very Important +"Non-binary, genderqueer, or gender non-conforming",United Kingdom,17,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Rule Induction,Matlab,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,25,0,0,5,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Other","Image data,Video data,Other",Most of the time,10GB,"CNNs,Ensemble Methods,Evolutionary Approaches,GANs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Most of the 
time,,Often,,,,Often,,,,,Rarely,,Sometimes,,Sometimes,,,,Often,Rarely,,,,,Sometimes,,,,Often,,,,,,,,,,Rarely,Sometimes,,,Rarely,Often,,,,,,"A/B Testing,CNNs,Data Visualization,Ensemble Methods,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs",Rarely,,,Often,,,Often,,Often,Often,Rarely,,,Rarely,,Rarely,,,,Often,Often,,,Rarely,,,,Rarely,,,,,,40,45,10,2,3,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,Rarely,Sometimes,,,,,,,,,,,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,Secret,Secret,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,50000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Poland,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by college or university,Employed by government",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,80,10,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A doctoral degree,Academic,10 to 19 employees,Stayed the same,Don't know,A tech-specific job board,Very important,Other,Laptop or Workstation and local IT supported servers,Image data,,1GB,"Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Often,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,Often,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,60000,PLN,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Other,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Python,Neural Nets,R,,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Natural Language Processing,Decision Trees - Random Forests,No education,Telecommunications,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Don't know,,Decision Trees,"C/C++,Java,MATLAB/Octave,Python,R,SQL",,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests,Text Analytics",,,,,,,Often,Often,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,50,20,15,5,10,0,Enough to refine and innovate on the algorithm,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,,,,,6,,,,,,,,,,,,,,,,,, +Female,Russia,44,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,,,Jupyter notebooks,Cluster Analysis,Python,Google Search,"Conferences,Friends network,Kaggle,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast",1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Professional degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,"Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),Other","Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,10,60,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),A bachelor's degree,CRM/Marketing,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,Other,"Java,Other,Other,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,"Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,15,40,20,15,10,0,Enough to tune the parameters properly,Inability to integrate findings into organization's decision-making process,,,,,,,,Rarely,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,"NER Training Data (for date time, person name, organisation and location);",Identifying useful information from unstructured data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,710000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SAS,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,0 - 1 hour,PhD,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,10,10,10,10,Outlier detection (e.g. Fraud detection),Support Vector Machines (SVMs),I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,India,27,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Other,R,Google Search,Blogs,,Not Useful,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,Very useful,,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Predictive Modeler,Statistician",University courses,0,0,25,75,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM Cognos,IBM Watson / Waton Analytics,Julia,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Other",,,,,,,,,Sometimes,Rarely,,,Rarely,,,Sometimes,,,,,,,,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,Often,Sometimes,Rarely,Often,Most of the time,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Often,,Often,Often,Often,Most of the time,,Often,,,Often,Most of the time,Often,Most of the time,Most of the time,,,,25,15,10,15,25,10,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,,,,,Sometimes,,Rarely,,,Often,Most of the time,,Most of the time,,,,Sometimes,,51-75% of projects,Entirely internal,Central Insights Team,Census,Data is ingested at too wide of intervals and has random errors 
introduced by IT team overseeing data ingestion ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,,Sometimes,135000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,No Free Hunch Blog,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Electrical Engineering,3 to 5 years,Programmer,Self-taught,80,0,0,15,5,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,30,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,Somewhat useful,,"Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,No,Master's degree,"Information technology, networking, or system administration",,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Male,South Korea,35,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Google Search,"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to 
ML,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,100,0,0,0,0,0,Unsupervised Learning,Neural Networks - RNNs,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,33,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Not Useful,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,45,5,5,25,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat 
important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Not Useful,Very useful,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,10,15,60,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Sometimes,1TB,"Ensemble Methods,Neural Networks,SVMs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Rarely,Most of the time,,,,,,Rarely,,Sometimes,,,,Rarely,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",Often,,,Sometimes,Rarely,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,Often,,,,,,,Most of the time,,,,,,60,20,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,Incorrect label;missing label;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,280000,CNY,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Finland,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Programmer,Statistician",University courses,50,5,15,25,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1GB,Neural Networks,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,Often,,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Text Analytics",,,,Sometimes,,Often,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,Rarely,30000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Fine arts or performing arts,3 to 5 years,"Software Developer/Software Engineer,Other",Work,20,30,50,0,0,0,,"Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",A doctoral degree,Technology,100 to 499 employees,Decreased slightly,Don't know,An external recruiter or headhunter,"N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,,"Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,Rarely,Often,,,,3,0,5,2,5,85,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Most of the time,,,,,,,Most of the time,Sometimes,,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented 
relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,77000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,India,34,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,,Necessary,,Necessary,,Necessary,,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,,,,Very Important,,Very Important,Somewhat important,,Very Important,,,,,,,Somewhat important +Male,Ukraine,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Machine Learning Engineer,Researcher",Self-taught,30,10,20,20,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,,,,,,,,,,Rarely,,,,Often,,,,,,"CNNs,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,Sometimes,,,,Often,Often,,,,,Most of the time,,,,Most of the time,Most of the time,Often,,,Often,,Often,,,,Most of the time,Often,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,700000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Pakistan,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,"GitHub,Google Search",College/University,,,Very useful,,,,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Engineer,University courses,20,30,0,50,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Always,1GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Java,MATLAB/Octave,NoSQL,Python,SQL,Unix shell / awk",,Sometimes,,Often,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,Most of the time,,,,Often,,,,,,,,,,,Sometimes,,,,,,Often,,,,Markov Logic Networks,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,10,10,10,20,30,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,Often,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),"Email,Share Drive/SharePoint",,Bitbucket,,900000,PKR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,NoSQL,Decision Trees,SQL,Google Search,"Blogs,Official documentation,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"DBA/Database Engineer,Programmer",Self-taught,100,0,0,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,Oracle Data Mining/ Oracle R Enterprise,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Simulation",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,80,10,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,,Often,Often,Often,,,,,,,,,10-25% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Rarely,10000000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,Very useful,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,15,10,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,33,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,10,10,70,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text 
Analytics",,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,Often,Most of the time,Sometimes,Often,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,,,,30,15,10,15,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,,,,,,Often,,,,,,,Often,,,,,,,Often,,51-75% of projects,More internal than external,Business Department,External agencies,Data consistency,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Rarely,"100,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,47,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Computer Science,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,30,20,0,10,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Retail,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,Rarely,,,,,,Often,,Most of the time,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Often,Sometimes,Often,,Often,,Most of the time,Often,Often,,,,,,,Often,,Sometimes,,Sometimes,Often,,Often,Often,,,,,,,,,,30,15,15,20,20,0,"Enough to code it 
again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,Often,,Often,,,,,,,,,Often,Often,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,70000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Other,Self-taught,50,40,0,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Sometimes,Rarely,,Rarely,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Often,,,,,Rarely,,Often,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Most of the time,,,,,,Most of the time,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",Sftp,Bitbucket,Sometimes,110000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Not employed, but looking for work",,,,,,,,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,30,10,0,50,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Anomaly Detection,SQL,,"Company internal community,Trade book",,,,Very useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,6 to 10 years,"Data Analyst,Other",University courses,0,0,40,60,0,0,"Outlier detection 
(e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Retail,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,kNN and Other Clustering,Logistic Regression,Simulation,Time Series Analysis",Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,Sometimes,,,Often,,,,40,10,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,10,70,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,I don't know,Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic 
Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,Often,Often,Sometimes,,,,,,Often,,Sometimes,,Sometimes,Often,,Sometimes,,Sometimes,,,,,,,,,,,5,40,50,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues",Sometimes,Sometimes,,,,,,,Rarely,,,Often,,,,,Often,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,Somewhat useful,Not Useful,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",15,25,20,5,15,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Sometimes,10MB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Python,R,TensorFlow",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs",,,,,,Most of the time,,,Sometimes,,,,,,,Often,,,Most of the time,Often,,,Sometimes,,Often,,,,,,,,,15,30,40,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Other",,,,,,,,,,,,Often,,,,,,,,,,Often,Less than 10% of projects,Entirely internal,Standalone Team,"various natural language corpuses (penn Treebank, etc.)",,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Rarely,190000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Survival Analysis,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Java,,"Arxiv,Kaggle,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,Self-taught,50,0,50,0,0,0,,,A bachelor's degree,Technology,"5,000 to 9,999 employees",,Don't know,I visited the company's Web site and found a job listing there,Not 
at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,,10TB,,"Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Spark / MLlib",,Most of the time,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,"kNN and Other Clustering,Neural Networks",,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Textbook",Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician",Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Simulation",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Rarely,,Rarely,,Often,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,,50,10,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,,Often,,,Often,Often,,,,Sometimes,,,,,Rarely,Often,Often,,51-75% of projects,Entirely internal,Other,,labeled training data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,216000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Hungary,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,10,80,0,10,0,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Text Mining,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack 
Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",50,20,20,5,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Other,100 to 499 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,"Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Recommender Systems,Time Series Analysis",,,,,Sometimes,,,Often,Often,,,Often,,,,,,,,,,,Often,Sometimes,,,,,,Often,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,Sometimes,Often,,Often,,,,Sometimes,,Often,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,2000000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,,,A bachelor's degree,Technology,"10,000 or more employees",,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Never,10MB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"Data Visualization,Logistic Regression,Neural Networks",,,,,,,Most of the time,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,30,20,0,30,20,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,Poor and/or complex interfaces to data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"Blogs,Conferences,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,10,10,0,70,0,"Machine Translation,Natural Language Processing,Reinforcement learning","Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Technology,10 to 19 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Evolutionary Approaches,Regression/Logistic Regression,RNNs,SVMs","KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Markov Logic Networks,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",,Rarely,,Often,,Often,Often,,,,,,,Sometimes,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,Often,Often,,,,30,30,10,10,20,0,"Enough to code it again from 
scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Sometimes,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email",,Git,Sometimes,35000,INR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +,,NA,Employed part-time,,,Yes,,Scientist/Researcher,,Employed by government,Mathematica,Neural Nets,,,Podcasts,,,,,,,,,,,,,Not Useful,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,,Other,I don't write code to analyze data,Researcher,University courses,0,100,0,0,0,0,,,,Government,,,,,,,,,,,,Orange,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,Git,,,,,5,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Natural Language Processing,Support Vector Machines (SVMs),A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,SVMs,"Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Most of the time,,,Often,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,,,Sometimes,,,,,,Often,Often,,,,Often,,Most of the time,,,,"Naive Bayes,Natural Language Processing,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,Most of the time,Most of the time,,,,,60,20,10,5,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Often,,,,Most of the time,,Most of the time,,,,,Often,,,,,,,10-25% of projects,More internal than external,IT Department,,"Data cleaning , gathering and understanding context","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Sometimes,1600000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Necessary,,,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Other,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Machine Learning Engineer",Self-taught,60,10,10,0,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,11-15,Very Important,Very Important,Very Important,Very Important,Very Important,,,Very Important,Very Important,Very Important,Very Important,,,Very Important,Very Important,Very Important +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,,,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Biology,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,70,0,10,0,Supervised Machine Learning (Tabular Data),,A professional degree,Other,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,60,15,5,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,Sometimes,,Often,Most of the time,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,Standalone Team,BIM,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,"115,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Philippines,27,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Neural Nets,C/C++/C#,I collect my own data (e.g. web-scraping),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Miner,Data Scientist,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,30,30,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",Sometimes,Most of the time,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,SVMs,Text Analytics",Most of the time,,,,,Most of the time,Most of the time,Often,Often,,,,,Often,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,Often,,Most of the time,Most of the time,,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Often,,Most of the time,Most of the time,,Often,,,,Often,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,100% of projects,Entirely internal,Business Department,None.,Dirty and ambigous.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,1800000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Software Developer/Software Engineer",University courses,40,20,20,10,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,10GB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests",Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Never,550000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Bayesian Methods,Python,GitHub,"Arxiv,Blogs,College/University,Conferences,Kaggle,Textbook",Very useful,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer",Other,20,50,0,10,20,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,<1MB,Other,"Amazon Web services,Google Cloud Compute,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow,Other",,Most of the time,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,Most of the time,,,"Natural Language Processing,Neural 
Networks",,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,,,,,,30,30,30,0,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Other,university datasets; wikipedia; subtitles; UCI,Finding data in my language (NLP),"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Never,57600,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher",Self-taught,50,10,0,30,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data,Other",Always,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,TensorFlow",Sometimes,Most of the time,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,Sometimes,,,,,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",Often,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Most of the time,,,Sometimes,Sometimes,,,Most of the time,Often,,,,,,Most of the time,,,,50,30,5,5,10,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",,,,Sometimes,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,Confidential,Cleaning & linking,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,,1000000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,0,10,30,30,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,R",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests",,,,Sometimes,,,,Sometimes,,,Rarely,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,50,15,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Most 
of the time,Sometimes,,,,,,,,,,Sometimes,,Often,Most of the time,,,,26-50% of projects,More external than internal,Standalone Team,weather data;,data is not clean,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Sometimes,150000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,Python,Other,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,Computer Vision,Other (please specify; separate by semi-colon),A bachelor's degree,Technology,20 to 99 employees,Decreased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10TB,"Regression/Logistic Regression,Other","Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,40,10,0,50,0,0,,"Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,Most of the time,Most of the time,,Less than 10% of projects,Do not know,Other,,,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Other,Rarely,55000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +,India,61,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer",Kaggle competitions,5,20,15,0,60,0,"Adversarial Learning,Computer Vision,Machine Translation,Speech Recognition","Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Text data,Relational data",Most of the time,100MB,"Markov Logic Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random 
Forests,Time Series Analysis",,,,,Often,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,Often,Often,,,,,,,Most of the time,,,,20,20,30,15,15,0,Enough to run the code / standard library,"Lack of significant domain expert input,Need to coordinate with IT",,,,,,,,,,,Often,,,,Most of the time,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,No,Data inconsistency,Graph (e.g. GraphBase/Neo4j),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,450000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,,R,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,90,5,0,0,5,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Other,Basic laptop (Macbook),Text data,,,Bayesian Techniques,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,60,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,"40,000",USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Official documentation",,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Miner",University courses,60,0,0,30,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Taiwan,54,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Deep learning,Python,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Computer Scientist,Work,30,20,20,20,0,10,"Adversarial 
Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data",Rarely,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,Python,TensorFlow",,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",,,Sometimes,Most of the time,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,Often,Sometimes,,,,,Often,,Often,Sometimes,,,,,25,45,2,18,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,75000,TWD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Australia,26,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,Partially Derivative Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,,University courses,10,10,0,80,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Japan,41,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,Very useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A professional degree,Academic,10 to 19 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,100MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,NoSQL,Python,R,SQL",,Rarely,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Often,Often,,,Sometimes,Most of the time,Most of the time,Often,,,,,,Sometimes,,,,Sometimes,Often,,Rarely,,,,,Rarely,,,Often,Often,,,,10,20,20,10,20,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to 
others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",Sometimes,,,,Most of the time,Sometimes,,,,,,,,Often,,Most of the time,Sometimes,Most of the time,,,,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Most of the time,6500000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,38,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,Very useful,,"O'Reilly Data Newsletter,Talking Machines Podcast",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,,3 to 5 years,"Business Analyst,Computer Scientist",Kaggle competitions,80,20,0,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Not important +Male,India,23,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Female,India,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Kaggle competitions,75,0,0,0,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"DataTau News Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,0,40,0,20,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,10GB,"CNNs,Ensemble Methods,Neural Networks,RNNs","Java,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Neural Networks,RNNs",,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,35,30,10,5,20,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Rarely,,,Sometimes,,,Often,,Often,Rarely,,Sometimes,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,,incorrect data with multiple datatypes in same column,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Git,Sometimes,,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Conferences,Kaggle,Non-Kaggle online communities",,Very useful,,,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,DBA/Database Engineer,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,30,0,10,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Always,10GB,,"SAS Enterprise Miner,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,"Data Visualization,Ensemble Methods,Logistic Regression,Segmentation",,,,,,,Often,,Most of the time,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,70,20,0,10,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Explaining data science to others,Limitations of tools,Need to coordinate with IT",,,,Sometimes,,Sometimes,,,,,,,Most of the time,,Sometimes,,,,,,,,76-99% of projects,More internal than external,Central Insights 
Team,,,Other,"Commercial Data Platform,Company Developed Platform",,,Never,6000,EUR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Denmark,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,Not Useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Physics,,"Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,Australia,36,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook",,Very useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,I don't know/not sure,Government,"5,000 to 9,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),"N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,,,,"Hadoop/Hive/Pig,Python,QlikView,Spark / MLlib,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,45,5,30,0,0,,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Often,,,100% of projects,More internal than external,,ABS Census data; public datasets from data.gov.au,Data quality issues in transactional source system,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Share Drive/SharePoint",,,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Online courses",Very useful,Very useful,,,Very useful,,,,,,Somewhat useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,PhD,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing,Reinforcement learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",25,30,40,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic 
Regression,SVMs","Amazon Machine Learning,Google Cloud Compute,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow",Rarely,,,,,,,Often,Sometimes,,,,,,,,,,,,,Sometimes,Often,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Rarely,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,Often,,Often,Often,,Often,,Most of the time,Sometimes,,Sometimes,,Often,Often,Often,,,,30,20,25,15,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,Often,,Often,,,,,Most of the time,,,,,,,76-99% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,512000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Australia,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Anomaly Detection,Python,Other,"Arxiv,Blogs,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Researcher,Statistician",Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A professional degree,Other,20 to 99 employees,Increased slightly,3-5 years,Some other way,Very important,Other,GPU accelerated Workstation,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,,Most of the time,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",Often,,Often,Often,,Most of the time,Most of the 
time,Sometimes,Sometimes,,,,,Sometimes,,Often,,Sometimes,,Most of the time,Sometimes,Often,Sometimes,,Often,,,,,Often,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,Most of the time,,,,Most of the time,,,Sometimes,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,NA,Data cleaning;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,230000,AUD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Text Mining,SAS,Google Search,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,,,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important +Male,Germany,30,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,,No Free Hunch Blog,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Programmer",Self-taught,40,0,20,20,20,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Other,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,10MB,Regression/Logistic Regression,"MATLAB/Octave,Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Logistic Regression,Random Forests,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,Rarely,,,,,,,Sometimes,,,,50,10,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,Often,,,,,,,,,,,Most of the time,Sometimes,,,,,Sometimes,,10-25% of projects,More internal than external,Central Insights Team,none,"image recognition not in standard formats ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,2500000,INR,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,Friends network,Personal Projects",,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Statistician,Other",Work,85,0,10,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A professional degree,Mix of fields,Fewer than 10 employees,,Less than one year,Some other way,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Perl,Python,R,SQL,Unix shell / awk,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,Sometimes,Sometimes,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,Often,Often,,Sometimes,,Often,,,,,,Sometimes,Sometimes,,,,5,15,8,2,10,60,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,Less than 10% of projects,Entirely internal,Other,prefer not to say,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other","Slack, Dropbox, Google Drive, MySQL",Git,Always,,,,8,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,R,Deep learning,R,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,0,0,40,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Neural Networks,Simulation,Text Analytics,Time Series Analysis",,,,Rarely,,Often,Most of the time,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Often,,Sometimes,Often,,,,20,10,10,50,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,0,RUB,I am not currently employed,5,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Other,30,10,30,0,0,30,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Financial,Fewer than 10 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10TB,"Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Tableau",,Most of the time,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,Often,,,,,,,"A/B Testing,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction",Often,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,,,60,20,10,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a 
clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,Often,Often,,,,,,,,,,Often,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,300000,INR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,,Kaggle competitions,30,10,0,20,40,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,,"Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Natural Language Processing,RNNs,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp","Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Master's degree,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,40,0,20,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Hungary,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Operations Research Practitioner,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, 
etc.)",30,60,0,0,10,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",High school,Academic,500 to 999 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation","Image data,Text data",Rarely,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,,,Often,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,RNNs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Often,,,,Most of the time,,,Most of the time,Sometimes,Most of the time,,,,,Often,,,,50,30,5,5,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Often,,,Often,,,,Most of the time,Often,,,,,,Most of the time,,,,,Sometimes,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Recommendation Engines,Gradient Boosting,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,South Africa,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,DataRobot,Other,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Not Useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher,Other",University courses,20,30,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not at all important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,SQL",,,,Rarely,,,,,Rarely,,,,,,,,Often,,,,Rarely,Most of the time,Often,Sometimes,,,Sometimes,,,,Most of the time,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Often,Often,,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Often,Most of the time,,Most of the time,,,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Often,Most of the time,Most of the time,,,,0,20,0,0,30,40,Enough to refine and innovate on the algorithm,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,1500000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Greece,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,20,0,50,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Technology,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Segmentation",,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,40,0,5,30,25,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,,Often,,Sometimes,,,,,Most of the time,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,14000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Bayesian Methods,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,5,20,25,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Google Search,Blogs,,Very useful,,,,,,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Professional degree,,I don't write code to analyze data,Programmer,Self-taught,100,0,0,0,0,0,Speech Recognition,Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,,Somewhat important,Somewhat important,,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,,,,Somewhat important +Female,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Personal Projects,Textbook",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,80,0,10,0,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",,Government,"5,000 to 9,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Rarely,100MB,"Neural Networks,Regression/Logistic Regression","C/C++,KNIME (free version),MATLAB/Octave,Python,Other",,,,Often,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks",,,,Often,,,Often,,,,,,,Most of the time,,Often,,,,Often,,,,,,,,,,,,,,10,40,40,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,,,,,,Often,,,Often,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,2000000,INR,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Regression,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,0,0,100,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,Fewer than 10 employees,Decreased significantly,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data",Never,,"Decision Trees,Neural Networks,Random Forests,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,RNNs,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,0,0,0,90,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,100% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Other,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",35,5,0,0,60,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,10 to 19 employees,Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Natural Language Processing,Time Series Analysis",,Sometimes,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,50,15,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of 
data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,Most of the time,,,,,Sometimes,Sometimes,Most of the time,,,,Often,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,25000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,University courses,5,0,45,45,5,0,"Natural Language Processing,Unsupervised Learning",Bayesian Techniques,A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data,Other",Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Python",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,Rarely,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the 
time,,Often,,,,,Sometimes,,Often,,Most of the time,Sometimes,,,,,Sometimes,Often,,,,40,10,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,,,,,,,,,,,,Most of the time,,,Often,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,SQL,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Business Analyst,Data Analyst,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",5,90,5,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization",,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,80,5,0,5,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack 
of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,Sometimes,,,,Often,,,,,,,Most of the time,Often,,10-25% of projects,More internal than external,IT Department,None,Getting accurate & relevant data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,75000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,,R,,Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,50,20,20,10,NA,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Evolutionary Approaches",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,,Basic laptop (Macbook),Text data,Sometimes,1GB,"Ensemble Methods,Evolutionary Approaches,Neural Networks",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,40,35,50,10,0,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,26-50% of projects,More 
external than internal,IT Department,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Indonesia,22,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,SAS Enterprise Miner,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Python,R,RapidMiner (commercial version),RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,Sometimes,Rarely,,,,,Rarely,,Often,Sometimes,Sometimes,,,,,,,Most of the time,,,Sometimes,,,,,,,"Collaborative Filtering,Cross-Validation",,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,government,"unstructured data ",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,3000000,IDR,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Italy,36,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Cluster Analysis,Python,Google Search,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Other",Other,15,10,10,0,0,65,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,25,25,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - RNNs",A master's degree,Financial,500 to 999 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Random Forests",Sometimes,,,,,Sometimes,Often,,,,,,,,,Often,,,,Often,,,Sometimes,,,,,,,,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,,Rarely,,,,Sometimes,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,45000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,33,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,Python,University/Non-profit research group websites,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Master's degree,"Information technology, networking, or system administration",,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Other,32,Employed full-time,,,Yes,,Statistician,Poorly,"Employed by non-profit or NGO,Employed by government",Mathematica,Regression,R,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,10,5,5,5,25,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Mathematica,R",,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Simulation,SVMs",,,Most of the time,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,Most of the time,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,Often,,,,,,,Often,Often,,,,,Often,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Company Developed Platform,I don't typically share data",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,12000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,India,27,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,23,18,18,25,16,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Newsletters,Official documentation,Online courses,Personal Projects",,Very useful,Very useful,,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,3-5 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter 
notebooks,Python",,Sometimes,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,Often,Often,,,,,,Sometimes,,Often,,,Sometimes,,,,,,,,,,Most of the time,,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,I prefer not to say,Limitations in the state of the art in machine learning,Limitations of tools",,Often,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,45000,NPR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Not Useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",1-2 years,,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,0,0,0,100,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,,Somewhat important,Not important,,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,India,25,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Government website,"Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Mix 
of fields,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Never,,Regression/Logistic Regression,"Amazon Web services,Google Cloud Compute,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression",Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,50,5,15,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,10.9,AFN,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,No,Yes,Statistician,Fine,Employed by government,R,Time Series Analysis,Java,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Statistician,University courses,90,0,10,NA,0,0,"Time Series,Other (please specify; separate by semi-colon)",Bayesian Techniques,Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,36,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,Python,Google Search,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines",,A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,,"Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Java,NoSQL,Python,RapidMiner (free version),Spark / MLlib,Unix shell / awk",,Most of the time,,,Sometimes,,Sometimes,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,,,Often,,,,,,,Most of the time,,,,"A/B Testing,Natural Language Processing,Text Analytics",Most of the time,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,40,40,10,10,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,4000000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,Scala,Other,"Newsletters,Official documentation,Online courses,Textbook",,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",,More than 10 years,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM Cognos,R,SQL,Unix shell / awk,Other",,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,Rarely,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Rarely,,,,Often,Often,Sometimes,Rarely,,,,,Often,,Most of the time,,,Rarely,,,Often,Most of the time,Sometimes,,Sometimes,,,Often,Sometimes,,,,25,25,10,10,10,20,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,Most of the 
time,,,,Sometimes,,,,,,,,Sometimes,,Often,,100% of projects,Entirely internal,Standalone Team,,Complexity;Lack of full definition;Ambiguity;Completeness,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Traditional Workstation,40+,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Biology,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Adversarial Learning,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,India,33,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",SAS,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Company internal community,Friends network,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,,Somewhat useful,Very useful,,Very useful,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,DataCamp,Other,40+,Kaggle Competitions,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Operations Research Practitioner,Researcher,Other",Other,30,0,10,0,0,60,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,United Kingdom,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,R,Other,SQL,Google Search,"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Engineer,Other",Self-taught,80,0,10,0,10,0,Other 
(please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,,,,,Not at all important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Most of the time,10MB,Regression/Logistic Regression,"C/C++,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,Markov Logic Networks,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,Rarely,Rarely,,,,,,,,,,Often,,,Often,,,,20,0,20,40,0,20,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,Sometimes,,Often,Most of the time,,,,Sometimes,,,,,,,,Often,,10-25% of projects,More internal than external,Other,Global Temperatures,Security,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,70000,GBP,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Factor Analysis,SQL,Google Search,"Blogs,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Not Useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Sort of (Explain more),Master's degree,,I don't write code to analyze data,Other,Self-taught,80,20,0,0,0,0,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important +Female,India,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton 
Analytics,Deep learning,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,100,NA,0,0,NA,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Manufacturing,500 to 999 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Rarely,1GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,Often,,Sometimes,Often,Most of the time,Most of the time,,,,,Sometimes,,,,Most of the time,,,,Sometimes,Often,,,,,,,,,Often,,,,20,10,50,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues",Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,Business Department,,Noisy and clumsy data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Other,Never,50400,SGD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Other",,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Work,20,10,65,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Sometimes,100GB,"CNNs,RNNs,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Mathematica,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk,Other,Other",,Rarely,,Often,,,,,Often,,,,,,Sometimes,,,,,Rarely,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,Most of the time,Sometimes,Sometimes,,"CNNs,Cross-Validation,Data 
Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Segmentation,SVMs",,,,Most of the time,,Sometimes,Often,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Most of the time,,,Most of the time,Sometimes,,,,,,,,,,,,Often,Most of the time,,Often,,Less than 10% of projects,More internal than external,Standalone Team,"ImageNet, Pascale VOC, and other","Marking, garbage removal","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Never,20000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,United States,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,GitHub,"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Scientist,University courses,0,0,40,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Hospitality/Entertainment/Sports,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Image data,Most of the time,,"CNNs,Ensemble Methods,Neural Networks,RNNs,SVMs","Java,Jupyter notebooks,Perl,Python,R,SQL",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Sometimes,Most of the time,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,,Often,Often,Often,Often,,Sometimes,,,,30,70,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","I 
prefer not to say,Privacy issues",,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Git,,,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Researcher,I haven't started working yet",University courses,30,20,20,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Australia,55,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,DataRobot,Time Series Analysis,SQL,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Not Useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,Less than a year,I haven't started working yet,Self-taught,30,50,0,0,20,0,Other (please specify; separate by semi-colon),Neural Networks - RNNs,High school,Academic,Fewer than 10 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Rarely,100MB,Bayesian Techniques,NoSQL,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,60,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Bitbucket,,,BHD,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,Python,Deep learning,R,I collect my own data (e.g. 
web-scraping),"College/University,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Researcher,Self-taught,50,20,20,10,0,0,,,A doctoral degree,Non-profit,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Other,,100MB,,"Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,Rarely,,Rarely,,,Sometimes,Often,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,35,50,5,10,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,NA,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,Necessary,,,,Necessary,Necessary,Necessary,,Necessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,15,20,35,25,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Often,Often,Often,,,Often,,Often,,Often,,,Often,,,,Most of the time,,,,,Often,Often,,,,,30,30,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,125000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,I haven't started working yet,University courses,20,5,10,30,20,15,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,57,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,R,Bayesian Methods,Other,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher",Self-taught,90,10,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",CRM/Marketing,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Rarely,1GB,Other,"Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,Sometimes,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Naive Bayes,Time Series Analysis",,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,0,50,0,0,50,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,90000,AUD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Turkey,27,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Random Forests,Python,,"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,,,,,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Adversarial Learning,Computer Vision,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data",Most of the time,1TB,"CNNs,GANs,Neural Networks","Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data 
Visualization,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation",,,,Most of the time,Sometimes,Most of the time,Often,,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Most of the time,Often,,,Sometimes,,Sometimes,,,,,,,,35,40,15,5,5,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Rarely,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Git,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Decision Trees,SAS,Google Search,"Blogs,Non-Kaggle online communities,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,Very useful,,,,,,Very useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very 
Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important +Male,India,29,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Machina Newsletter,Partially Derivative Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",Work,30,10,50,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,IBM SPSS Modeler,Python,QlikView,R,SAS Base,SAS Enterprise Miner",,,,Rarely,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,Sometimes,Sometimes,,Often,,Most of the time,,,,Most of the time,Often,Most of the time,Most of the time,Most of the time,,Often,Often,Most of the time,Most of the time,,Most of the time,,,Most of the time,Often,Most of the time,,Most of the time,,,,10,40,10,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,Often,,Often,,,,,,,,Often,,Often,,51-75% of projects,Approximately half internal and half external,Central Insights Team,"Bloomberg, Capital IQ",ETL,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,2400000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Kaggle,Official documentation,Personal Projects,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,Very useful,,Very useful,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,"Friends network,Stack Overflow Q&A",,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's 
degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,Less than one year,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Always,1GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",,Often,Often,,,Most of the time,,,,,,Often,,,,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,20,40,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,United States,54,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Bayesian Methods,R,"Google Search,Other","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,95,5,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient 
Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Very important,Other,Traditional Workstation,"Text data,Relational data",Most of the time,<1MB,"Bayesian Techniques,Markov Logic Networks,Random Forests,Regression/Logistic Regression,Other","KNIME (commercial version),KNIME (free version),Oracle Data Mining/ Oracle R Enterprise,R,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,,Most of the time,Most of the time,,,,,,Sometimes,,Most of the time,,Most of the time,,,,Sometimes,Often,,,,,,Most of the time,,,,,80,10,0,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,Sometimes,,,Often,,,,,,,,,,Most of the time,Often,,,Less than 10% of projects,Entirely internal,Standalone Team,None,Stored in slightly different formats over many years,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Other",MS Outlook,,Always,"150,000",USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Canada,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,,"Blogs,Company internal community,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,,University courses,NA,20,40,40,0,0,Recommendation Engines,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,"Bayesian Techniques,Regression/Logistic Regression","Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,,,,,,,Most of the time,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Ensemble Methods,Logistic Regression,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,Sometimes,,Sometimes,,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,Sometimes,,,,10,30,20,20,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Difficulties in deployment/scoring",,Sometimes,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,All data already communal,Git,Sometimes,"193,000",CAD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Ireland,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Time Series Analysis,R,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,0,10,0,85,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Retail,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,45,0,25,30,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Often,,Most of the time,,Most of the time,,,,,Most of the time,,,10-25% of projects,More internal than external,Other,,,,,,,Sometimes,39000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Google Search,"Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,Very useful,"FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,40,50,10,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,500 to 999 employees,Decreased significantly,Less than one year,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data,Other",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic 
Regression","Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,Most of the time,Often,,Most of the time,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,Most of the time,,,Often,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks",,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,Often,,Often,,,,,,,,,,,,,,50,20,20,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Often,,Often,Most of the time,,,,,Often,,,Most of the time,,,Most of the time,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,160000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,50,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,"Information technology, networking, or system administration",,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,India,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Other,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Miner",Self-taught,50,25,0,25,0,0,,,High school,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Always,1GB,,"Microsoft Excel Data Mining,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,Often,,,,Often,,Often,,,,,,,Often,,,,,,Often,,,,,,,,Often,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Often,Often,,,Sometimes,,,Often,,Often,,,,Often,,,Often,Often,,Most of the time,,100% of projects,Entirely internal,Other,Nope,"Various , unstructured Data Formats","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,,Rarely,1980000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Miner",Work,5,15,70,5,5,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Logistic Regression,A professional degree,Mix of fields,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Never,100MB,"Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Often,,,Often,,,,,,,Rarely,,,"Data Visualization,Logistic Regression,Random Forests",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,8,5,0,7,0,80,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,,,,,Sometimes,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Scientist,Researcher",University courses,30,10,40,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,Sometimes,,Often,,Often,,,,,Often,,Most of the time,,Sometimes,,,Sometimes,Often,Often,,,,50,25,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,100% of projects,More internal than external,Central Insights Team,Finance team and imagery data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,100000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,SQL,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,50,0,10,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,,,Most of the time,Often,Most of the time,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,Sometimes,Sometimes,,,Most of the time,,,,,,Sometimes,,,,,50,20,20,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others",Sometimes,Often,,Often,,Most of the time,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights 
Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Sometimes,100000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Mexico,21,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,,,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,23,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,20,20,0,50,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Female,South Korea,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Support Vector Machines (SVM),R,GitHub,"Arxiv,Blogs,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,"Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 
years,Researcher,University courses,20,10,70,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Financial,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Mathematica,MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,Rarely,,,,,,,,Rarely,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Text Analytics",Rarely,Rarely,,,Sometimes,,Most of the time,Sometimes,,,,,,Most of the time,Sometimes,Most of the time,,,Sometimes,,,,Sometimes,Sometimes,,Most of the time,,,Sometimes,,,,,10,40,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",,,,,,,,,,Rarely,Rarely,Rarely,Rarely,,Rarely,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,30000,KRW,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,10,0,15,25,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Internet-based,100 to 499 employees,Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Collaborative Filtering,Data Visualization,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,Rarely,,Sometimes,,,,,,,,,,,,,Rarely,,,Often,,,,,Often,,Most of the time,,,,20,20,30,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Often,,,,Most of the time,,,,,,,Most of the time,,,,,,,10-25% of projects,More internal than external,IT Department,,Data cleaning and which type of libraries or package to use.,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Company Developed Platform,Share Drive/SharePoint",,Git,Never,150000,INR,Other,6,,,,,,,,,,,,,,,,,, +Male,Iran,24,Employed part-time,,,Yes,,Researcher,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,Somewhat useful,,,,,"FastML Blog,Jack's Import AI Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,45,0,25,30,0,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,1GB,"CNNs,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,40,45,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Always,24000000,IRR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,KNIME (free version),,R,"GitHub,I collect my own data (e.g. web-scraping)","Blogs,College/University,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",Self-taught,100,0,0,0,0,0,"Computer Vision,Machine Translation","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs",,Technology,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Sometimes,10MB,"Bayesian Techniques,Neural Networks","Amazon Machine Learning,R",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,40,0,50,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,,,,,,,Rarely,,Less than 10% of projects,Entirely internal,IT Department,,,Key-value store (e.g. Redis/Riak),"Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,"280,000",INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Egypt,33,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Bachelor's degree,"Information technology, networking, or system administration",,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,,,Very useful,,Very useful,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Retail,"5,000 to 9,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,Often,,,Often,,,,,,,"A/B Testing,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,Sometimes,Often,,Sometimes,,,,,,Often,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,,,,,,Often,,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Singapore,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,,3 to 5 years,"Business Analyst,Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",60,10,30,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,TensorFlow",,Most of the time,,,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,Often,,,Most of the time,,,,Most of the time,,Often,,,,,Often,Often,,Often,,,,,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,,,Often,Most of the time,Most of the time,Most of the time,,,Often,,Sometimes,Sometimes,Most of the time,,Often,Most of the time,,Sometimes,Often,,Often,,Sometimes,Sometimes,,Most of the time,Most 
of the time,,,,10,30,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,26-50% of projects,More external than internal,Other,quandl,cleaning and transformation,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Most of the time,48000,SGD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SAS Base,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,Most of the time,,,,Often,,Often,,,,,Rarely,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Segmentation",,,,,,Often,Often,Often,,,,,,Rarely,,,,,,,,,Often,,,Sometimes,,,,,,,,40,30,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,800000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Belgium,54,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Other",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Female,Pakistan,25,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,DataRobot,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,,Very useful,,Very useful,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher",Work,10,10,30,30,20,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Ensemble Methods,GANs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,,Often,,Often,,Often,,,Often,,Often,,,,,Often,,,,,Often,,Often,,,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,Most of the time,Sometimes,,Often,,,,,,Sometimes,,,,100% of projects,More internal than external,IT Department,Satellite Data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,480000,PKR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Anomaly Detection,R,University/Non-profit research group websites,"College/University,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,Very useful,,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,10,10,30,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,Less than one year,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,,100MB,"Decision Trees,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,NoSQL,Python,R,Spark / MLlib,SQL",Rarely,Rarely,,,Rarely,,,,Rarely,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Most of the time,,,,,,,,Rarely,Sometimes,,,,,,,,,,"Cross-Validation,PCA and Dimensionality Reduction,SVMs",,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,76-99% of projects,Entirely internal,IT 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Subversion,Sometimes,30000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Belarus,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Deep learning,Python,"Government website,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters",,,,,,,Very useful,Very useful,,,,,,,,,,,"Data Elixir Newsletter,DataTau News Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,Unsupervised Learning,,A master's degree,Internet-based,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,10MB,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Association Rules,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,85,0,0,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team",Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,,100% of projects,More external than internal,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Greece,23,"Not employed, but looking for work",,,,,,,,Julia,Genetic & Evolutionary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Online courses,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Physics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Very Important +Male,United States,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Python,Neural Nets,Python,University/Non-profit research group websites,"College/University,Personal Projects",,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Programmer,Other",University courses,0,0,0,100,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Academic,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Text 
data,Sometimes,100MB,Other,"Amazon Web services,Google Cloud Compute,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,SQL,Tableau",,Rarely,,,,,,Often,,,Sometimes,,Often,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Data Visualization,Natural Language Processing,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,5,10,50,15,20,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Java,Deep learning,Python,GitHub,"Blogs,Company internal community,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,FlowingData Blog",< 1 year,Nice to have,Unnecessary,,,Nice to have,,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,0,30,"Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Neural Networks - RNNs",Primary/elementary 
school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Newsletters,Online courses,YouTube Videos",Very useful,Very useful,Very useful,,,,,Very useful,,,Very useful,,,,,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that 
performs advanced analytics,DataRobot,Proprietary Algorithms,SQL,University/Non-profit research group websites,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,Rarely,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Rarely,,,,,,,,Most of the time,,,,,Sometimes,,Rarely,Rarely,,,,,,Most of the time,,,,65,10,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools",,Often,,,Often,,,,Sometimes,,,,Most of the time,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,115000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping)","Online courses,Personal Projects,Tutoring/mentoring",,,,,,,,,,,Very useful,Very useful,,,,,Very useful,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Researcher",Work,50,20,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A professional degree,CRM/Marketing,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,NoSQL,Orange,Python,R,Spark / MLlib",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,Often,,Most of the time,,Often,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs",Often,,,,Sometimes,Most of the time,Most 
of the time,Often,,,,Often,,,Often,Often,,Often,,Often,,Most of the time,Often,Often,,Often,Sometimes,Often,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,Often,Most of the time,Most of the time,,,Sometimes,Often,Sometimes,Sometimes,Often,,Most of the time,,,,,Most of the time,,Most of the time,,10-25% of projects,Entirely external,Standalone Team,Google public data,"Incomplete, missing data fields","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,99999,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,80,20,0,0,0,0,Survival Analysis,,A bachelor's degree,Mix of fields,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Never,1GB,Regression/Logistic Regression,"Amazon Web services,C/C++,Java,NoSQL,Python,R,SQL",,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Rarely,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,600000,INR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Israel,27,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Matlab,Google Search,"Arxiv,Blogs,College/University,Company internal community,Friends network,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Not Useful,,Very useful,,,Somewhat useful,,Very useful,,,,,,Very useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,45,5,40,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,India,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,0,10,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,5,5,20,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular 
Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Technology,"5,000 to 9,999 employees",Decreased slightly,Less than one year,A career fair or on-campus recruiting event,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,<1MB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,RNNs","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",Rarely,,,,,,,Rarely,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,Rarely,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",Sometimes,,,,,,,,Often,,Often,Often,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,40000,INR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,South Korea,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,75,0,15,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Cloudera,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,"Association Rules,Collaborative Filtering,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs",,Often,,,Often,,,Often,,,,,Often,Most of the time,,Most of the time,,,Most of the time,Sometimes,,,Often,Most of the time,,,,Most of the time,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,,,,,Often,,,,,10-25% of projects,Approximately half internal and half external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Social Network Analysis,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",60,30,5,0,5,0,Speech Recognition,"Decision Trees - Gradient Boosted Machines,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Africa,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,Linear Digressions Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,35,15,5,45,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)",A doctoral degree,Financial,10 to 19 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,QlikView,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,Sometimes,Sometimes,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,Sometimes,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,Sometimes,Most of the time,Sometimes,Sometimes,,,Often,,Often,,,,Often,,,26-50% of projects,Approximately half internal and half external,IT Department,,There is not enough. Very expensive to gather.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Never,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,,Nice to have,,,,,,,,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Programmer",Self-taught,50,0,10,0,40,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Government,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,Often,,,,,Often,,,Often,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests",,,,,,Sometimes,Often,Sometimes,Rarely,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,60,20,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,,120000,AUD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,Google Cloud Compute,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,"Other,I haven't started working yet",University courses,0,0,0,50,50,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,I don't know,Increased slightly,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,Other,"Hadoop/Hive/Pig,NoSQL,QlikView,R,RapidMiner (free version),Spark / MLlib,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,324000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",85,10,5,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,Very useful,"Data Stories Podcast,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Data Analyst,Programmer",Self-taught,40,40,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis","Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Sometimes,<1MB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,"Collaborative Filtering,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation",,,,,Sometimes,,,,,,,,,,,,,,Often,Sometimes,,,Sometimes,Sometimes,,Sometimes,,,,,,,,40,20,20,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,10-25% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,"Bitbucket,Git",Rarely,650000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Very useful,,Very useful,Very useful,,,,,,Very useful,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Jupyter notebooks,Python,Spark / MLlib,SQL",Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,SVMs,Text Analytics",,,Often,,,Often,Often,Often,Often,,,Often,,,Often,,,,,,Often,Often,,Often,,Often,,Often,Often,,,,,30,30,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say",,,,,Often,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone 
Team,,completeness.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Bitbucket,Never,145000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Company internal community,,,,Very useful,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer,Other",University courses,0,0,40,60,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10TB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,,Often,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,Rarely,Often,,Often,Rarely,Often,,Often,Often,Sometimes,Most of the time,Sometimes,Most of the time,Sometimes,Sometimes,Often,Often,Most of the time,Often,Most of the time,,,,30,20,20,5,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for 
scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Often,Often,Most of the time,Most of the time,,Often,Most of the time,Sometimes,,Sometimes,Sometimes,Sometimes,Most of the time,Sometimes,Often,Often,Most of the time,Often,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,coc;weather;twitter;readit;kaggle;edm,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Share Drive/SharePoint,Other",airdrop,Git,Most of the time,85000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,48,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,R,GitHub,"Blogs,Personal Projects",,Very useful,,,,,,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer",Work,50,0,50,0,0,0,"Computer Vision,Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,10 to 19 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests",Amazon Web services,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs",,,,Sometimes,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,,,,Sometimes,,,Most of the 
time,,,,,Sometimes,,,,,,30,50,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,51-75% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,,"910,000",INR,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Very useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,Data Analyst,Self-taught,60,0,40,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,Primary/elementary school,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,Random Forests,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Random Forests",,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,30,30,30,5,5,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,Other,Company Developed Platform,,Other,Never,,,,3,,,,,,,,,,,,,,,,,, +Female,Russia,22,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,,,"FlowingData Blog,Jack's Import AI Newsletter,The Analytics Dispatch Newsletter",< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,20,20,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important +Male,India,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,25,5,30,0,10,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs",A master's degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Other,"Text data,Relational data",Never,100GB,"Decision Trees,Random Forests","Flume,Hadoop/Hive/Pig,NoSQL,R,SQL,Tableau",,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,Sometimes,,,,30,30,10,10,10,10,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Often,Most of the time,,,,,,Often,Sometimes,,,,,,10-25% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,6000000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,GitHub,"Blogs,College/University",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,"Data Stories Podcast,FastML Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher",University courses,50,0,0,50,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,6-10 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Rarely,1TB,"Decision Trees,RNNs","Java,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Decision Trees,RNNs,Text Analytics",,,,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,80,15,5,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Do not know,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Never,33333,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Italy,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Web services,"Ensemble Methods (e.g. 
boosting, bagging)",Java,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,DataTau News Aggregator,Emergent/Future Newsletter (Algorithmia)",< 1 year,Necessary,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,,,,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,,,,,,,,,,,,,,, +Female,Indonesia,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Hadoop/Hive/Pig,Deep learning,R,I collect my own data (e.g. web-scraping),"Arxiv,Personal Projects,Textbook",Very useful,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Statistician,University courses,30,30,0,40,0,0,Outlier detection (e.g. Fraud detection),Neural Networks - GANs,A master's degree,Government,10 to 19 employees,Stayed the same,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,1MB,Neural Networks,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,60,0,0,20,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,,5,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,KNIME (free version),Bayesian Methods,R,Google Search,"College/University,Personal Projects,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,0,10,20,60,0,10,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Rarely,<1MB,"Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Rarely,Sometimes,,,,,Sometimes,Rarely,Most of the time,,,,50,30,15,5,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,Often,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,51-75% of projects,Entirely internal,Standalone Team,weather data,quality data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Always,"40,000",BDT,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,University/Non-profit research group websites,"College/University,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,University courses,20,40,10,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,CNNs","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,R,SQL",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Natural Language Processing",,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,30,10,20,10,30,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Unavailability 
of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,nothing,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,65000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,1-2 years,Nice to have,,Necessary,,Necessary,Necessary,,Nice to have,Necessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Julia,Genetic & Evolutionary Algorithms,R,"GitHub,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician",Self-taught,40,40,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Java,Mathematica,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,NoSQL,Perl,Python,QlikView,R,SAS Base,SQL",,Rarely,,Sometimes,,,,,,,Rarely,Sometimes,,,Sometimes,,,,,Often,,,Often,Often,Often,Often,Often,,,Rarely,Often,Sometimes,Most of the time,,,,,Often,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,Sometimes,Rarely,,,,Often,Often,,,,,Sometimes,,,Most of the time,,,,,Most of the time,,Most of the time,,,Most of the time,,Often,,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,Sometimes,Often,Often,,Often,,,Often,,,,,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,Cleansing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,750000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,R,I don't plan on learning a new ML/DS method,,Other,"Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Partially Derivative Podcast,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,DataCamp,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,1 to 2 years,,Self-taught,25,75,0,0,0,0,,,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,edX","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Time Series","Hidden Markov Models HMMs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,35,20,10,25,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Often,Often,,,,Often,,,,,,Often,Often,Often,,,Often,Often,Most of the time,Most of the time,,Most of the time,,Often,,,Often,Most of the time,,Most of the time,,,,,,,,Most of the time,Often,,,,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series Analysis",Often,Often,Often,Often,,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,,Often,,Most of the time,Often,Often,,Sometimes,Sometimes,Often,Often,Often,,,Sometimes,Often,,,,10,30,30,20,10,0,Enough to refine and innovate on the algorithm,"I 
prefer not to say,Need to coordinate with IT",,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Rarely,1800000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,Text Mining,R,University/Non-profit research group websites,"College/University,Conferences,Non-Kaggle online communities,Online courses,Personal Projects",,,Very useful,,Very useful,,,,Very useful,,Somewhat useful,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,"Researcher,Other",University courses,60,15,20,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,10MB,Regression/Logistic Regression,"Perl,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,Rarely,,,Most of the 
time,,Rarely,,,,,,,Sometimes,,,,30,20,20,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,More internal than external,,,,,Company Developed Platform,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses",,Somewhat useful,,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,10,10,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Text data,,10TB,"CNNs,Decision Trees,Ensemble Methods","Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,Sometimes,,Most of the time,,Most of the time,Often,,,Most of the 
time,,Sometimes,,Sometimes,,,,Sometimes,,,Often,,,,,,,Rarely,,,,30,20,30,20,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,14.5,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Friends network,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,30,15,25,30,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service","Text data,Other",Sometimes,10MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,Often,Rarely,,,,,,,,Often,,,Rarely,Rarely,,,,,,Often,,,,Most of the 
time,,Sometimes,,,,,,,,Often,Sometimes,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs",,,,Sometimes,Sometimes,Most of the time,Most of the time,,,,,,,,,Often,,,Often,Most of the time,Sometimes,,,Often,Often,,,,,,,,,30,30,30,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,Most of the time,Most of the time,,,Often,,,,,,Often,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Amazon S3,"Bitbucket,Git",Most of the time,48000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",,Financial,20 to 99 employees,Increased slightly,,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs,Text Analytics",,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,Often,,,,,Often,Sometimes,,,,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Female,Other,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed part-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Not Useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,edX,Other,40+,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,0,0,40,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important +Male,Ukraine,21,"Not employed, but looking for work",,,,,,,,Microsoft R Server (Formerly Revolution Analytics),Time Series Analysis,C/C++/C#,Google Search,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,Work,50,10,15,10,15,0,Time Series,Neural Networks - GANs,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,40,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts",,,Very useful,,,,Very useful,,,,,,Very useful,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer",Self-taught,70,20,0,0,10,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"C/C++,R,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others",Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,105000,,Has decreased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Decision Trees,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,Linear Digressions Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer",University courses,15,15,5,60,5,0,"Computer Vision,Survival Analysis","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Video data,Text data",Sometimes,1TB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","C/C++,Mathematica,MATLAB/Octave,Python,SQL",,,,Often,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Ensemble Methods,Logistic Regression,Naive Bayes,Recommender Systems,Segmentation,Text Analytics",,,,Sometimes,,,,,Often,,,,,,,Often,,Often,,,,,,,,Sometimes,,,Sometimes,,,,,10,10,50,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful 
datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Rarely,,,Sometimes,Often,Often,,,,,Sometimes,Often,,Sometimes,,Most of the time,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,860000,PKR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft SQL Server Data Mining,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,"Data Machina Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,60,10,20,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Non-profit,Fewer than 10 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,NoSQL,Python,R,SQL",,Sometimes,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,Time Series Analysis",,Often,Often,,Often,Often,Often,Often,Often,Often,,Often,,Often,,Often,,Often,,,Often,Often,Often,Often,,,Often,,,Often,,,,25,25,5,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the 
art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Rarely,Sometimes,Most of the time,,,Sometimes,Most of the time,,Often,Often,,Sometimes,Rarely,,Rarely,,,Sometimes,Often,,100% of projects,More internal than external,Standalone Team,Weather data; credit agency data; public data from UK government ,Cleaning it and putting it all in somewhat similar formats ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,55000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Predictive Modeler,Researcher",University courses,40,20,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Text data",Most of the time,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,SVMs,Time Series Analysis",,Sometimes,Rarely,,,,,Rarely,,,,,,Sometimes,,Often,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,25,40,20,5,5,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Often,,Most of the time,,Often,,,Often,,,Most of the time,,,,Less than 10% of projects,More internal than external,Other,global climate 
models,resolution,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Other,Rarely,3000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Italy,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Not Useful,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,35,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by college or university,TensorFlow,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,A humanities discipline,Less than a year,Researcher,Self-taught,55,35,5,0,5,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,"College/University,Personal Projects,Stack Overflow Q&A",,,Not Useful,,,,,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Management information systems,1 to 2 years,Software Developer/Software Engineer,University courses,30,30,0,30,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Korea,27,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,IBM SPSS Statistics,Regression,R,GitHub,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Speech Recognition,Logistic Regression,A master's degree,Retail,"1,000 to 4,999 employees",Increased significantly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,1TB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Java,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,R,RapidMiner (commercial version),TensorFlow",,,,,,,,,,,,,,,Often,,,,,,Sometimes,,Most of the time,,,Rarely,,,,,Often,,Rarely,Most of the time,,,,,,,,,,,,Often,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",,,,,Sometimes,Rarely,Most of the time,Often,,,,,,,,Often,,,,,Sometimes,Most of the 
time,,,,Most of the time,Most of the time,,,Most of the time,,,,60,30,0,10,0,0,Enough to run the code / standard library,Difficulties in deployment/scoring,,,,Most of the time,,,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Google Search,"Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Not Useful,,,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Other",University courses,70,20,5,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,Rarely,,,,,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,Sometimes,,,,,Often,Sometimes,,,Most of the time,,,Rarely,,,,,,,"Association Rules,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,Often,,,,,,,,,,,,Often,,Sometimes,,,,Often,Sometimes,,,,,Often,,,,,,,,25,50,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Most of the time,,Often,,,,,Sometimes,,,Most of the time,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,2400000,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Italy,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Work,20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",,100MB,"CNNs,Neural Networks,Random Forests","Mathematica,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,Often,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,Most of the time,,Often,,,,10,40,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Most of the time,,,51-75% of projects,More internal than external,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,60000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Blogs,Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,Very useful,"DataTau News Aggregator,FastML Blog,The Analytics Dispatch Newsletter",3-5 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Natural Language Processing,Hidden Markov Models HMMs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Female,Australia,23,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Not Useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,Other,University courses,10,5,0,85,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,Other,28,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,R,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Statistician",Self-taught,20,10,30,0,40,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,Often,,,,,Rarely,,Most of the time,,,,40,25,20,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,,,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,Bloomberg,-,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,15600,EUR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,R,"Google Search,Government website","Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,Somewhat useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,60,10,0,0,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Australia,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,NoSQL,Text Mining,Python,University/Non-profit research group websites,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, 
etc.)",0,80,10,10,0,0,,,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1MB,"Decision Trees,Regression/Logistic Regression","Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,Rarely,,Sometimes,,,,Most of the time,,,Often,,,,,,,"Decision Trees,Logistic Regression",,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,20,0,30,10,40,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,Often,,Sometimes,,,Often,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"67,000",AUD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",R,Support Vector Machines (SVM),R,,"College/University,Kaggle,Newsletters,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,Other,Work,20,20,60,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Statistics,Mathematica,Microsoft Excel Data Mining,Minitab,R,Statistica (Quest/Dell-formerly Statsoft)",,Sometimes,,Sometimes,,,,,,,,Rarely,,,,,,,,Rarely,,,Sometimes,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Segmentation,Simulation,Time Series Analysis",Sometimes,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Often,,Sometimes,,,,,,,,Often,Most of the time,,,Often,,,,10,40,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,Often,,,,Often,Often,,,,,,Often,,,,,,,10-25% of projects,Do not know,Central Insights Team,kaggle data,Variables are not explained properly ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Most of the time,,INR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Switzerland,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,R,Factor Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,75,5,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,I don't know,Stayed the same,Don't know,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,Regression/Logistic Regression,"Java,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Segmentation",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,60,0,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,125000,CHF,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Google Search,"College/University,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Reinforcement learning,"Markov Logic Networks,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,10GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Simulation",Sometimes,,Sometimes,,,Often,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Often,,,,,,,0,50,5,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources",Sometimes,,,,Often,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,20000,BGN,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Decision Trees,R,Google Search,"Friends network,Kaggle,Personal Projects,Tutoring/mentoring",,,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Data Analyst,Self-taught,80,0,0,0,0,20,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",High school,Mix of fields,500 to 999 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,"N/A, I did not receive any formal education",,Basic laptop (Macbook),Image data,,,,"Microsoft Excel Data Mining,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,40,0,0,40,20,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,,Share Drive/SharePoint,,,,150000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Basic laptop (Macbook),GPU accelerated Workstation",40+,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,20,10,20,10,40,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",16-20,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,Google Search,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat 
useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Work,100,0,0,0,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,100GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,NoSQL,Perl,Python,R,Spark / MLlib",,Often,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Often,,,Rarely,Rarely,,Rarely,,,,,,,,Rarely,,,,,,,,,,,"CNNs,Naive Bayes,Neural Networks,Text Analytics",,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,40,40,10,10,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Rarely,,,,Rarely,,Rarely,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,,,,,,48000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Malaysia,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Google Search,"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of 
(Explain more),Master's degree,Computer Science,Less than a year,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,<1MB,"Decision Trees,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics",,,,,,,,Rarely,,,,,,,,Often,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,,,,,,,,Often,Often,,,,Often,,10-25% of projects,More internal than external,IT Department,enron emails,terrible grammar and use of English language,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,200000,MYR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,20,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Cluster Analysis,Stata,Google Search,"Friends network,YouTube Videos",,,,,,Very useful,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,DataTau News Aggregator,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,A social science,Less than a year,I haven't started working yet,University courses,10,10,0,80,0,0,Time Series,Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Russia,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Non-Kaggle online communities,Online courses,Textbook",Very useful,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,Engineer",Self-taught,60,30,0,0,10,0,"Computer Vision,Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Most of the time,10GB,"CNNs,Decision Trees","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,Often,,Most of the time,Often,Sometimes,Sometimes,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,,,,Rarely,,,,10,40,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring",Often,,,Often,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Subversion,Never,12000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Julia,Deep learning,Python,"I collect my own data (e.g. web-scraping),Other",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Statistician",Self-taught,60,20,0,20,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs",A doctoral degree,Academic,20 to 99 employees,Decreased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,100GB,"Bayesian Techniques,Markov Logic Networks,Random Forests","Julia,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering",,,Sometimes,,,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,10,50,0,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Poland,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,"FastML Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,30,10,30,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Video data,Text data,Other",Most of the time,10GB,"CNNs,HMMs,RNNs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Often,,,,Rarely,Sometimes,,,Sometimes,,,Sometimes,,Often,,,Rarely,Often,,,Rarely,,,,,,Most of the time,Most of the time,,Sometimes,,,,,Often,Often,,Sometimes,Most of the time,,,Rarely,Often,,Most of the time,,,,"CNNs,Cross-Validation,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,30,30,3,2,5,30,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",I prefer not to answer,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random 
Forests,Time Series Analysis",,Often,Often,,,Most of the time,,Sometimes,Most of the time,,,Most of the time,Often,Most of the time,,Most of the time,,Sometimes,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Most of the time,Sometimes,,,,,,,Sometimes,,Often,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Never,2000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Singapore,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,,,A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,,Regression/Logistic Regression,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Logistic Regression,Naive Bayes",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,75,5,0,5,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,,,,,,,Often,,,,,Often,,,,Often,,,26-50% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Hungary,40,Employed full-time,,,No,Yes,Other,,Employed by a company that doesn't perform advanced analytics,Julia,Neural Nets,Python,Google Search,"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,DataCamp,Traditional Workstation,2 - 10 hours,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,,"Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Iran,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,Bayesian Methods,C/C++/C#,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer",Self-taught,100,0,0,0,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Other,,,"CNNs,HMMs,Neural Networks,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,HMMs,Neural Networks,Simulation,SVMs",,,,Rarely,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,Often,Rarely,,,,,,50,10,20,0,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,,Other,,,,,,,,0,USD,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Netherlands,42,Employed full-time,,,Yes,,Predictive Modeler,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"FastML Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Data Scientist,Predictive Modeler",Self-taught,20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",,Financial,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,,GPU accelerated Workstation,Relational data,Sometimes,10GB,"Bayesian Techniques,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,,,,Sometimes,,,,Often,Often,Sometimes,,,,Sometimes,Sometimes,,,Often,,,,Often,,,,70,20,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,,Often,Often,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,20,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,Don't know,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Often,,,,Often,Often,,,,,,,Often,,Often,Often,Often,,,Often,,Often,,,Often,Often,Often,,,,30,10,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science 
results not used by business decision makers,Difficulties in deployment/scoring,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Often,,,,,,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Rarely,120000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Support Vector Machines (SVM),Python,"GitHub,I collect my own data (e.g. web-scraping),Other","Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,30,5,5,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",High school,Academic,500 to 999 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,100GB,"Decision Trees,Neural Networks,Other","Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,"Decision Trees,kNN and Other 
Clustering,Neural Networks,Other",,,,,,,,Sometimes,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,50,40,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,Most of the time,,,Sometimes,,,,Often,,,,,,Often,,,,Less than 10% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Bitbucket,Rarely,120000,LKR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,50,30,0,0,20,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,United Kingdom,46,Retired,,,Yes,,Operations Research Practitioner,Poorly,Employed by company that makes advanced analytic software,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects",,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Operations Research Practitioner,Software Developer/Software Engineer",Self-taught,0,100,0,0,0,0,"Natural Language Processing,Speech Recognition","Decision Trees - Gradient Boosted Machines,Markov Logic Networks",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,IBM Watson / Waton Analytics,Text Mining,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,6 to 10 years,Other,Work,80,5,15,0,0,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Most of the time,1MB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Sometimes,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Text Analytics",,Most of the time,,,Often,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,70,5,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,Sometimes,Often,,,,,,,,76-99% of projects,More internal than external,Other,,Dirty and unstructured,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Other,Rarely,"125,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important +Male,India,44,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"Coursera,DataCamp,Other",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",40,30,0,0,10,20,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Germany,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,0,65,5,0,"Computer Vision,Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,India,29,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10TB,"Decision Trees,Random Forests,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,Often,,,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,Entirely internal,Standalone Team,no public or third party datasets,gathering and cleaning the data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,Employed part-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Brazil,24,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Arxiv,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects",Very useful,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,70,0,0,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Not Useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,80,0,5,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,19,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Personal Projects,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,NA,99,0,1,0,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Social Network Analysis,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Not Useful,Very useful,,Very useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,"FastML Blog,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",30,20,30,20,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Other",Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,Rarely,,Often,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,Sometimes,,Often,Most of 
the time,,Often,,,Most of the time,,Most of the time,,Sometimes,,Sometimes,Often,Sometimes,,,Sometimes,,,Sometimes,,,Sometimes,Often,,,,30,10,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Rarely,Rarely,Often,Often,Often,,,,,,Sometimes,Sometimes,Often,,,,,Sometimes,Often,Sometimes,,100% of projects,Entirely internal,Standalone Team,,petascale raw sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Other",,Git,Rarely,194000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Official documentation,Online courses,Personal Projects",,,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,0,100,0,0,0,Recommendation Engines,Decision Trees - Gradient Boosted Machines,A bachelor's degree,Retail,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Gradient Boosted Machines,RNNs","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Gradient Boosted Machines,Neural Networks,Recommender Systems,Time Series Analysis",Often,,,,Often,Often,,,,,,Often,,,,,,,,Sometimes,,,,Often,,,,,,Sometimes,,,,60,40,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,Often,,Often,,,,,,,,,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Most of the time,30000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Other,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,I don't plan on learning a new ML/DS method,Matlab,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Newsletters,Online courses,Personal Projects",,,Somewhat useful,,Somewhat useful,,,Very useful,,,Very useful,Very useful,,,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,,,Necessary,,,,,Necessary,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,,,,,Very Important,,Very Important,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,,,Very useful,,,Very useful,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",Self-taught,40,20,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Logistic Regression,A doctoral degree,Mix of fields,"10,000 or more employees",Decreased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,Sometimes,,,Most of the time,,,Often,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",,,,,Often,Often,Often,Often,,,,,,Often,,,,,,,,,Often,,,Often,,,,Often,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,Often,,,Most of the time,,Sometimes,Most of the time,Often,,,,,,Often,,,,10-25% of projects,More internal than external,Standalone Team,,"Missing data, high fluctuations in data with no known prior reasons for that","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Military/Security,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Always,1TB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,,Often,,Often,,,,,,,,,,,20,20,5,30,25,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,,,,,,Most of 
the time,,,Often,,,76-99% of projects,Entirely internal,IT Department,,confidentiality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Computer Science,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,,,,,,,,,,,,,,,, +Male,Argentina,41,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,3-5 years,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Other",Other,2 - 10 hours,Online Courses and Certifications,No,Professional degree,,I don't write code to analyze data,"Data Analyst,DBA/Database Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs",A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Japan,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Other,Python,GitHub,"Arxiv,Kaggle,Personal Projects,Podcasts",Somewhat useful,,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Machine Learning Engineer,Programmer,Researcher",Self-taught,30,0,50,0,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic 
Regression,Neural Networks - CNNs,Neural Networks - RNNs",I don't know/not sure,Technology,10 to 19 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,100GB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests","Amazon Web services,Google Cloud Compute,Jupyter notebooks,TensorFlow",,Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Gradient Boosted Machines,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Sometimes,Often,,,,,,,,Often,Sometimes,,,Often,,,,Most of the time,Most of the time,,Most of the time,,,,,,,Sometimes,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,Sometimes,,,Sometimes,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,7000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Monte Carlo Methods,Python,"GitHub,University/Non-profit research group websites","Blogs,Company internal community,Official documentation,Stack Overflow Q&A",,Very useful,,Somewhat useful,,,,,,Very useful,,,,Very useful,,,,,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,30,0,60,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Government,20 to 99 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Image data,Other",Most of the time,10MB,"SVMs,Other","C/C++,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,Often,,,,,,,Most of the time,,,,,Sometimes,,Often,,Sometimes,,,,30,40,5,25,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the 
available data",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,,Often,,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Git,Other",Rarely,"34,200",CUP,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Conferences,Kaggle,Personal Projects",,,,,Very useful,,Very useful,,,,,Very useful,,,,,,,"FastML Blog,R Bloggers Blog Aggregator",1-2 years,,,,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,15,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Non-Kaggle online communities,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,Very useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,0,0,0,20,10,"Outlier detection (e.g. Fraud detection),Survival Analysis","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Newsletters,Official documentation,Stack Overflow Q&A",,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,Sometimes,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,Often,,,Most of the time,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,Most of the time,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by government,Self-employed",TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Not Useful,Not Useful,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,60,20,5,10,5,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data,Other",Don't know,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Perl,Python,R,SQL,TensorFlow,Unix shell / awk,Other,Other,Other",,,,Sometimes,,,,Rarely,Rarely,,,,Rarely,,Rarely,,,,,,Rarely,Rarely,,,,,Sometimes,,,Rarely,Often,,Rarely,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,Sometimes,Often,Often,"A/B Testing,Bayesian 
Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other,Other",Rarely,,Sometimes,Often,,Often,Sometimes,Often,Most of the time,Most of the time,Rarely,Sometimes,,Often,,Sometimes,,Sometimes,,Most of the time,,,Sometimes,Sometimes,Rarely,Rarely,Rarely,Sometimes,Sometimes,Often,Rarely,Often,,60,15,5,10,5,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Rarely,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,Do not know,Standalone Team,Financial;Logs;Images,Slow workstation,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Rarely,660000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle",Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer",University courses,35,0,0,40,25,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Academic,10 to 19 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems",,,,,Often,Most of the time,,Often,Often,,,Often,,Often,,Often,,,,,,,Sometimes,Often,,,,,,,,,,30,40,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,13500,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Canada,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,,Sort of (Explain more),Bachelor's degree,Computer Science,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Russia,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses",Very useful,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Statistician",University courses,30,0,0,65,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100TB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,Logistic Regression,Recommender Systems",Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,35,15,50,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,Often,,,,,,,Often,,,,,,Most of the time,,,,,100% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Subversion,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Jupyter notebooks,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Blogs,Other",,Very useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,Data Scientist,Work,10,5,50,20,15,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series","Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,20 to 99 employees,Decreased significantly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,1TB,Gradient Boosted Machines,"Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Perl,Python,Spark / MLlib,SQL,Tableau",,,,,Most of the time,,Sometimes,,Most of the time,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,Sometimes,,,,,,,"Collaborative Filtering,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",,,,,Often,Often,,,,,,Often,,,,Rarely,,,,Sometimes,,,Rarely,Often,,,,,,Often,,,,40,35,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data",,Often,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Bitbucket,Rarely,"75,000",TRY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Factor Analysis,Java,GitHub,"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FlowingData Blog",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Speech Recognition,"Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,36,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,Not Useful,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Government,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,,"Decision Trees,Random Forests","Amazon Machine Learning,NoSQL,Python,R",Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Data Visualization,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",Often,,,Sometimes,,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,50,10,5,5,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,Often,,Most of the time,Often,,,,,,,,,Most of the time,Often,,51-75% of projects,Entirely internal,Standalone Team,Census ,Data Integration ,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,SQL,,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,,,A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,,"Amazon Web services,C/C++,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,SQL",,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,10,40,40,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources",Most of the time,,,,,,,,,Often,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,2400000,PKR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,0,70,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A professional degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python,R,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,Sometimes,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,Most of the time,,Often,,,,,Often,,Most of the time,,,Sometimes,Most of the time,Sometimes,,,,,Sometimes,Most of the time,,Often,Often,,,,50,35,5,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Sometimes,,,Often,Often,,,,Often,Often,Most of the time,Most of the time,,,Often,,,,,Most of the time,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,"670,000",RUB,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,No Free Hunch Blog",3-5 years,,,,,,,,,,,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Physics,,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Reinforcement learning",Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Russia,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by 
government",TensorFlow,Deep learning,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Textbook,Tutoring/mentoring",,Very useful,,,,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Decreased significantly,Less than one year,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Sometimes,100MB,"Neural Networks,RNNs","Java,Python",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,RNNs",,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,35,35,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",I don't typically share data,,Git,Rarely,18000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,5-10 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Somewhat important +Female,India,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Scientist,University courses,20,10,10,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Video data,Text data",Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Python,R,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,,,Often,Sometimes,,,,,,,Rarely,,,,20,30,0,30,20,0,,"Inability to integrate findings into organization's decision-making process,Other",,,,,,,,Sometimes,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Never,,,,7,,,,,,,,,,,,,,,,,, +Male,Greece,37,Employed full-time,,,Yes,,Engineer,Perfectly,Self-employed,TensorFlow,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Manufacturing,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Sometimes,,Most of the time,,,,,,Most of the time,Often,,Often,,,,Most of the time,,,Most of the time,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,South Africa,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,R,Factor Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Researcher,Statistician",University courses,15,0,0,85,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,40,20,20,0,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Sometimes,,,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,800000,ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed part-time,,,No,Yes,Data Scientist,Poorly,Employed by professional services/consulting firm,Python,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Non-Kaggle online communities,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,,Somewhat useful,,,,Somewhat useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,KDnuggets Blog",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Master's degree,,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,20,10,0,20,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very 
Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United Kingdom,47,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,I don't plan on learning a new ML/DS method,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Stack Overflow Q&A,Trade book",,,,,,,Very useful,,,,,,,Somewhat useful,,Somewhat useful,,,"KDnuggets Blog,The Data Skeptic Podcast",< 1 year,,,,,,,,,,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Somewhat useful,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,Recommendation Engines,Logistic Regression,A doctoral degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Video data,Relational data",Most of the time,100GB,"CNNs,Neural Networks,Regression/Logistic Regression","Java,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Recommender Systems",Often,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,60,20,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Sometimes,300000,CNY,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Taiwan,35,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,Government website,"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Data Stories Podcast,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,No,Master's degree,Physics,3 to 5 years,"Data Analyst,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,People 's Republic of China,37,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,,,,Necessary,,,Necessary,,Necessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,30,40,10,0,20,0,"Adversarial Learning,Time Series",Decision Trees - Random Forests,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,30,Employed full-time,,,No,Yes,Other,Fine,Self-employed,R,Cluster Analysis,R,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,Other,2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,I don't write code to analyze data,Other,Self-taught,30,60,10,0,0,0,Adversarial Learning,Bayesian Techniques,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Biology,1 to 2 years,"Data Analyst,Researcher",University courses,15,5,0,10,0,70,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",0,30,30,10,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Random Forests","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems",Most of the time,,,,Sometimes,Often,Sometimes,Most of the time,,,,Rarely,,,,,,,Sometimes,,,Rarely,Sometimes,Sometimes,,,,,,,,,,10,40,30,10,10,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,Less than 10% of projects,Entirely internal,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Rarely,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Japan,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,"Ensemble Methods (e.g. 
boosting, bagging)",Python,GitHub,"Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","Conferences,Friends network,Non-Kaggle online communities,Online courses",,,,,Somewhat useful,Very useful,,,Very useful,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 
years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,0,10,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Internet-based,10 to 19 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics",,,Rarely,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,Most of the time,,,,,45,15,10,25,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Often,,,,,,,Most of the time,,,,,,Often,Often,,,,,,,51-75% of projects,More internal than external,IT Department,Confidential,Cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,280000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,"Computer Scientist,Data Analyst,Engineer,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition","Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Ukraine,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Not employed, but looking for work",,,,,,,,RapidMiner (commercial version),Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,Somewhat useful,Not Useful,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,0,20,30,40,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important +Male,India,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,YouTube Videos",Somewhat useful,,,,,,,,,,,,,,,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,Self-taught,70,30,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,,Very Important,,,Very Important,,,,,Very Important,,Very Important,Very Important,Very Important,Somewhat important +Female,Romania,47,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Statistica (Quest/Dell-formerly Statsoft),Neural Nets,,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Personal Projects,Textbook",,,,,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,Researcher,Work,25,0,25,50,0,0,Speech Recognition,Neural Networks - RNNs,,Academic,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Rarely,1GB,Neural Networks,"Google Cloud Compute,Java,Mathematica,Microsoft Excel Data Mining,Orange",,,,,,,,Often,,,,,,,Often,,,,,Often,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Ensemble Methods,Natural Language Processing,Simulation",,Often,,,,,Often,,Often,,,,,,,,,,Often,,,,,,,,Often,,,,,,,25,25,25,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",Often,,,,,,,,,Often,,,Often,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)",Email,,,Rarely,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,TensorFlow,Anomaly Detection,R,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Technology,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,"Decision Trees,Random Forests","Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,60,10,0,30,0,0,Enough to explain the algorithm to someone non-technical,Inability to integrate findings into organization's decision-making process,,,,,,,,Most of the time,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Always,54000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,29,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Textbook",,,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Personal Projects,Textbook",,Very useful,,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,20,25,35,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10GB,Regression/Logistic Regression,"R,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,Most of the time,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,40,20,0,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,,,,,Sometimes,,,Most of the time,,Often,Sometimes,,100% of projects,More internal than external,Standalone Team,Weather,Data sanity,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Rarely,100000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Decision Trees,Java,Google Search,"Blogs,Personal Projects",,Somewhat useful,,,,,,,,,,Very useful,,,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Machine Translation,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",Finland,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"Official documentation,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, 
etc.)",20,20,60,0,0,0,,,A master's degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Not very important,Other,"Traditional Workstation,Workstation + Cloud service",Relational data,Don't know,100MB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering",,,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,90,0,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,Sometimes,,100% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,24000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,DataRobot,Neural Nets,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer,Predictive Modeler,Researcher",Work,25,15,50,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TIBCO Spotfire",,Most of the time,,Sometimes,,,,,Most of the time,,Rarely,Rarely,,,,,,,,,,,Often,,Often,,Most of the time,Sometimes,,,Most of the time,Often,Most of the time,,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,Most of the time,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Often,Often,Often,,Most of the time,Most of the time,Often,,,Most of the time,Most of the time,,Often,Most of the time,,,,20,35,15,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,,,Sometimes,Often,,,,,,,,Sometimes,,,10-25% of projects,Entirely external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial,Subversion",Sometimes,1400000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Personal Projects,Textbook",,Very useful,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,0,0,50,50,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Other,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data,Text data",,10GB,"CNNs,Decision Trees,Evolutionary Approaches,GANs,HMMs,Neural Networks,Random Forests,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Neural Networks,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics",,,Sometimes,Often,,Most of the time,Most of the time,Often,Often,Often,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,Most of the time,Often,Sometimes,Sometimes,Sometimes,,,,,70,20,0,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,Most of the time,Often,,,,Sometimes,,Sometimes,,Most of the time,,,Often,Often,,51-75% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Other,Python,Google Search,"Arxiv,Blogs,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,Work,35,0,45,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Image data,Most of the time,100MB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,Most of the time,,,Sometimes,,Often,,,,,,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,30,50,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of significant domain expert input",,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,,3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,30,40,0,20,10,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,SQL,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Google Search,"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Time Series,Logistic Regression,A master's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,Regression/Logistic Regression,"SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization",,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,0,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Often,,,,Most of the time,,,,Often,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Other,,Where it is and finding the correct batabase table to use,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"100,000",,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,QlikView,R",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,,,,Often,,,,,,,Often,,Often,,,Most of the time,Most of the time,Most of the time,,,,Often,,,Most of the time,Often,,,,,20,20,20,20,20,0,Enough to run the code / standard library,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,450000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Italy,46,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,Anomaly Detection,R,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Not Useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,,Work,40,0,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Academic,500 to 999 employees,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Other",Sometimes,<1MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,Other","C/C++,R,Stan",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis,Other",,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,,,Often,,,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,Sometimes,Rarely,,,Often,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,"Company 
politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",10,50,10,30,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,SQL,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,40,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,,,,,,,,,,,,,,,, +Male,Spain,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,TensorFlow,Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. 
web-scraping),"Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,Very useful,Very useful,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Business Analyst,Machine Learning Engineer,Other","Online courses (coursera, udemy, edx, etc.)",15,30,25,0,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","Amazon Web services,C/C++,IBM Cognos,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,Tableau,TensorFlow",,Rarely,,Sometimes,,,,,,Rarely,Rarely,,Sometimes,,,,Often,,,,,Sometimes,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,Rarely,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Sometimes,Most of the time,Most of the time,Sometimes,,,Most of the time,,,,Often,,,Sometimes,,Often,Often,Most of the time,Often,,Sometimes,Sometimes,,Sometimes,Often,,,,40,15,10,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Sometimes,Sometimes,Most of the time,,,,Often,Often,Sometimes,,,,Often,Most of the time,Often,,,Sometimes,Often,,51-75% of projects,More internal than external,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Male,Turkey,27,Employed full-time,,,Yes,,Business Analyst,,,Python,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",40,20,25,15,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,,"Bayesian Techniques,Decision Trees","MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,R,SAS Base,SQL,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,Most of the time,,,,,,,Sometimes,Rarely,,,,,Often,,,,Often,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,Often,Most of the time,,,,30,25,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Scaling data science solution up to full database",,Sometimes,,,Often,Often,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,More external than internal,Business Department,,,,Commercial Data Platform,,Other,,,,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Poland,21,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,5,15,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,Python,R,TensorFlow,Other,Other",,,,Rarely,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,Often,Most of the time,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,Most of the time,,Most of the time,Often,,Sometimes,,,,,Rarely,,Often,,,,Often,,,Sometimes,,,,,Rarely,,,,,,25,55,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science 
team",Rarely,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,Rarely,,,,,,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,50000,PLN,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Germany,38,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,KNIME (free version),Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,6 to 10 years,"Business Analyst,Other",University courses,30,0,30,30,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Random Forests,Neural Networks - CNNs",High school,Other,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,KNIME (free version),Oracle Data Mining/ Oracle R Enterprise,Python,RapidMiner (commercial version),SAS Enterprise Miner,Other,Other",,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Rarely,,,Rarely,,,Rarely,,,,,Rarely,,,,,,,,,,,Sometimes,Most of the time,"Association Rules,Cross-Validation,Lift Analysis,Logistic Regression,Neural Networks,Segmentation,Text Analytics",,Sometimes,,,,Most of the time,,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,,,55,20,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible 
expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Often,,,,,,Often,Often,,26-50% of projects,Entirely internal,Standalone Team,Geo,-,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Subversion",Sometimes,100000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Israel,29,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Bayesian Methods,R,Government website,"College/University,Friends network,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,Data Analyst,University courses,40,0,30,30,0,0,,,A doctoral degree,Academic,"1,000 to 4,999 employees",,,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Relational data,,,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,60,20,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,26-50% of projects,Do not know,Other,,,,Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,,9,,,,,,,,,,,,,,,,,, +Female,Portugal,31,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Newsletters,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,Fewer than 10 employees,Decreased significantly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Image data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,SVMs","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Orange,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,Often,,,,Most of 
the time,,Sometimes,,,,,Often,,,,Often,Most of the time,Often,,,,Most of the time,,,,Most of the time,Often,Most of the time,,,,60,30,10,0,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,Most of the time,,,Often,Sometimes,,,,,,,,,,,Sometimes,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Bitbucket,Rarely,1000000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,Software Developer/Software Engineer,Other,20,10,0,0,0,70,Other (please specify; separate by semi-colon),,A master's degree,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Other,Workstation + Cloud service,"Text data,Relational data",Most of the time,100TB,Decision Trees,"Cloudera,Flume,Hadoop/Hive/Pig,Impala,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,Most of the time,,Most of the time,,,,,Often,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,Sometimes,,,Sometimes,,,,Often,,,Most of the time,Most of the time,,,,40,30,0,5,0,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,,Often,,,,,,Often,,,Sometimes,,,,10-25% of projects,More internal than external,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,780000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,TensorFlow,Link Analysis,Python,Google Search,"Arxiv,Podcasts,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,"Data Elixir Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher",Self-taught,30,20,40,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Academic,I don't know,Increased slightly,6-10 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,Python,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,,,,,,Rarely,,Sometimes,,,,,,Often,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,Often,,,Most of the time,Often,Sometimes,,,,,,,,Sometimes,,Often,,Sometimes,Often,,,,,Often,,,,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to 
integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,,Often,,,Most of the time,,,,,,,,,,,,Most of the time,,,26-50% of projects,More internal than external,Standalone Team,,"Privacy, confidence to share data","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,University/Non-profit research group websites,"College/University,Kaggle,Textbook",,,Very useful,,,,Very useful,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not 
important,Somewhat important +Male,South Africa,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Not Useful,,,,Very useful,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,"FastML Blog,Linear Digressions Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,70,10,2,10,8,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Traditional Workstation","Image data,Video data,Other",Always,10GB,"CNNs,Neural Networks","Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Often,,,Often,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs",,,,Most of the time,Sometimes,Often,Often,Sometimes,,,,Most of the time,,Sometimes,,Sometimes,,Sometimes,,Most of the time,Often,Often,Most 
of the time,,,,,Most of the time,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,Most of the time,,Often,Most of the time,Most of the time,,,,,Most of the time,,,,Sometimes,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,R,Deep learning,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Friends network,Personal Projects,YouTube Videos",,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,10,0,80,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Government,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Other,Text data,Never,,Decision Trees,C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering",,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,50,30,10,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,Most of the time,,,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,,,,I don't typically share data,,,Never,1200000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Other,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Statistician",University courses,20,30,0,20,0,30,"Machine Translation,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,1 to 2 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Financial,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,Often,,,Most of the time,,Often,,,,,Often,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues",Often,Often,,,,,,,Most of the time,,,,,,,,Often,,,,,,100% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Data Analyst,Other",University courses,20,20,60,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,Other",,,,,Often,,,,Most of the time,,,,,,,,Most of the time,,,,,Sometimes,,,,,Often,,,,Often,,Sometimes,,,,,,,,Most of the time,Often,,,,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality 
Reduction,Random Forests,SVMs",Sometimes,,Sometimes,,,Most of the time,Often,Often,Often,,,Often,,,,Often,,Sometimes,Sometimes,,Often,,Often,,,,,Often,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Often,,,,,,Often,,,,Rarely,,,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,Cleaning and changing data feeds,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,60000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Not Useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Analyst,Engineer,Predictive Modeler",University courses,15,0,10,75,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Other,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Other,Most of the time,,"Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R,SAP BusinessObjects Predictive Analytics,SAS Base,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Sometimes,Rarely,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,Most of the time,,,,Rarely,Often,Rarely,Rarely,,,,,,,Often,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,Sometimes,1800000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,DBA/Database Engineer,Kaggle competitions,5,70,20,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,100MB,Regression/Logistic Regression,"Hadoop/Hive/Pig,IBM SPSS Statistics,MATLAB/Octave,Python,SQL,Tableau",,,,,,,,,Rarely,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,5,5,40,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,Sometimes,,,Most of the time,,,,Sometimes,,,Most of the time,,,,Most of the time,Sometimes,,26-50% of projects,More internal than external,Other,"VTAC, ABS",Quality Assurance that the data processed from the system is a true representation that can be reported on consistently.,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,88000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Google Search,"Non-Kaggle online communities,Online courses,Tutoring/mentoring",,,,,,,,,Very useful,,Very useful,,,,,,Very useful,,Becoming a Data Scientist Podcast,< 1 year,,,Necessary,,,Nice to have,,,,,Necessary,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Netherlands,56,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Spark / MLlib,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,Very useful,,,,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,40,10,20,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Sometimes,,,,Often,,,,,,,,,,,,Often,,,,,,26-50% of projects,Entirely internal,Other,,data cleansing; multiple sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,70000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Random Forests,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,,,,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced 
analytics",Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,Jack's Import AI Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,30,15,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,,,Most of the time,Most of the time,,Often,,,,,,,Most of the time,,,,25,20,5,25,25,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Often,,,,Most of the time,Sometimes,,,Most of the time,,Most of the time,,,Most of the time,,,,,,Most of the time,,Most of the time,76-99% of projects,More external than internal,Standalone Team,imagenet; pascal voc,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,internal file server,"Bitbucket,Git,Other",Never,"7,700,000",JPY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Singapore,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Bayesian Methods,Python,Other,"Blogs,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,"Coursera,Other",Traditional Workstation,11 - 39 hours,PhD,Yes,Bachelor's degree,Electrical Engineering,,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Reinforcement learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,India,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Cluster Analysis,Python,Google Search,"Arxiv,Blogs,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very 
useful,,,,,,,,,,Very useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,CRM/Marketing,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,Tableau,TensorFlow",,Sometimes,,,,,,Often,,,,,Most of the time,,,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,Sometimes,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,Often,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,Often,Most of the time,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,70,10,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,Entirely external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,2000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,16,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Very useful,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Other,Self-taught,40,4,0,25,1,30,,"Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Financial,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Other,Sometimes,100MB,"Ensemble Methods,Regression/Logistic Regression","Java,Microsoft Excel Data Mining,Python,R,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Neural Networks,Simulation,Other,Other",,,,,,Rarely,,,Often,,,,,,,Often,,,,Rarely,,,,,,,Rarely,,,,Sometimes,Most of the time,,0,50,45,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Other",Often,,,Sometimes,Sometimes,,,,,,,Sometimes,Often,,,,Sometimes,,Sometimes,,,Often,51-75% of projects,More external than internal,Central Insights Team,,"Not overfitting but still finding a signal; passing ""originality"" on numer.ai","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Other",Always,50000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,South Africa,39,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Regression,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,0,10,0,Recommendation Engines,Logistic Regression,"Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,edX,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,,Master's degree,Other,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,France,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,GitHub,"Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very 
useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Engineer,Researcher,Statistician",University courses,30,20,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,R",,,,,,,,,Most of the time,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,,,,Often,,,,,,Often,,,,,Often,,,,,,,,,,,,,40,20,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,,Other,"Email,Share 
Drive/SharePoint",,Git,Sometimes,43000,EUR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Hungary,50,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Time Series Analysis,SQL,"Google Search,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,,,Somewhat useful,,< 1 year,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,,,,Other,Traditional Workstation,0 - 1 hour,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,10,0,0,85,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Amazon Web services,Neural Nets,R,Google Search,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,Don't know,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text 
data,Never,1MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Often,Often,Often,,,,,Often,,Often,,Often,,Often,,,Often,,,,,Often,,Often,,,,40,30,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,100% of projects,Entirely external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,165000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Personal Projects,Textbook",Very useful,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Statistician",University courses,50,10,20,20,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Often,Rarely,,,,Often,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,,,Rarely,,,Often,,,,,,,Sometimes,Often,,Often,Often,Sometimes,Often,,Often,Sometimes,Sometimes,Often,,,,50,25,10,0,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy 
issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Often,Often,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,45000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10GB,"Decision Trees,Random Forests","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Naive Bayes,Random Forests",Often,,,,,,Most of the time,Often,,,,,,,,,,Rarely,,,,,Often,,,,,,,,,,,40,10,5,15,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not 
used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues",Most of the time,Most of the time,,,Most of the time,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,76-99% of projects,Entirely internal,Standalone Team,None,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,2000000,INR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,,Data Machina Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,85,0,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,26,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity,Other","Gaming Laptop (Laptop + CUDA capable GPU),Other",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,10,60,0,20,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,,Nice to have,,,Necessary,,,,,,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Indonesia,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",R,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,40,0,30,0,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Text data,Rarely,100MB,SVMs,"IBM SPSS Statistics,Minitab,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,Often,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,,,,,Often,Rarely,,,,,,Sometimes,,,,,,Rarely,Often,,Sometimes,,,,,Often,Often,Often,,,,30,5,5,50,10,0,Enough to run the code / standard library,"I prefer not to say,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,Sometimes,,,,,,,,,,,,Often,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Poland,26,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Python,Decision Trees,Python,Google Search,"College/University,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,Very useful,,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Scientist,Engineer",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",Primary/elementary school,Manufacturing,"5,000 to 9,999 employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data",Never,10MB,Random Forests,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,Often,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Sometimes,,,,2,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,R,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Statistician",Self-taught,0,70,30,0,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A doctoral degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,,,,,,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,1 to 2 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,40,20,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",High school,Financial,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing",,,,,,,Most of the time,Sometimes,Most of the time,,,Often,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,55,15,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,Standalone Team,IP Geolocation; BIN database,Data usually dirty or not structured enough,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,23000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,28,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,Python,,Python,GitHub,"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Computer Scientist,University courses,10,5,5,80,0,0,"Machine Translation,Unsupervised Learning","Ensemble Methods,Hidden Markov Models HMMs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Most of the time,100GB,"Bayesian Techniques,Evolutionary Approaches,HMMs","C/C++,Java,NoSQL",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Natural Language Processing,Neural Networks,Time Series Analysis",Often,Often,Often,Often,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,Often,,,,20,30,0,20,30,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,Standalone Team,From universities ,data goverance,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",,300000,CNY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Kaggle,Podcasts",Very useful,Very useful,,,,,Somewhat useful,,,,,,Very useful,,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,5,0,75,20,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Internet-based,100 to 499 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,10GB,"CNNs,Ensemble Methods,GANs,Neural Networks","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,GANs,Neural Networks,Segmentation",,,,,,Often,Often,,,,Often,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,0,60,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,Often,,,Often,,Rarely,Often,,,26-50% of projects,More internal than external,Standalone Team,,,,Other,,,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",Somewhat useful,Very useful,,,,Very useful,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,10,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Rarely,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,"Association Rules,CNNs,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,Rarely,,Often,,Most of the time,,,Most of the time,,,Most of the time,,Often,,Most of the time,,Sometimes,Most of the time,Often,Often,,Most of the time,,Often,,,Often,Most of the time,,,,,20,15,5,5,15,40,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,Often,,Often,,,,Sometimes,,Most of the time,Most of the time,,100% of projects,More external than internal,Standalone Team,pubmed; patent databases; united nations comtrade; faostat,??,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",ftp server,"Git,Subversion",Sometimes,1400000,RUB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Russia,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Poorly,Self-employed,MATLAB/Octave,Deep learning,SQL,University/Non-profit research group websites,"College/University,Kaggle,Online courses",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,3-5 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,Less than a year,"Business Analyst,Engineer,Other",University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,Other,27,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,India,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,R,Google Search,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,Other,Self-taught,20,30,50,0,0,0,"Natural Language Processing,Speech Recognition,Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Mix of 
fields,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,10MB,Other,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,40,0,15,20,25,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",,Often,,,,,,Often,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,Na,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,,Python,,"Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Talking Machines Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Other",Basic laptop (Macbook),11 - 39 hours,,No,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,65,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +"Non-binary, genderqueer, or gender non-conforming",Germany,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,Often,,,,Often,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Recommender Systems,Simulation",Often,,,,,Often,Often,,,,,,,,,Often,,,,,,,Rarely,Often,,,Often,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Git,Always,50000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,15,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Not Useful,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Other,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"DataTau News Aggregator,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,People 's Republic of China,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Support Vector Machines (SVM),R,GitHub,"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines 
(SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,15,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,< 1 year,,,,,,,,,,,,,,,,,,No,,,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,"Business Analyst,Data Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important +Male,Netherlands,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Government,500 to 999 employees,Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,NoSQL,Python,QlikView,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,Most of the time,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,Sometimes,Sometimes,,Sometimes,Most of the 
time,Sometimes,,,,,,Often,,Often,,Sometimes,Often,Often,Often,,Sometimes,,Often,,,Often,Often,,,,,65,5,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Often,,,Often,,,Often,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,Corpora,Getting enough relevant data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,50700,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Somewhat useful,,,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",25,25,25,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"CNNs,Ensemble Methods,Regression/Logistic Regression,RNNs","Jupyter notebooks,Mathematica,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation",,,,,,Most of the time,Most of the time,,Often,,,,,Sometimes,,Sometimes,,,Sometimes,,Often,,Often,,Sometimes,,Sometimes,,,,,,,40,20,15,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,Often,,,,,,Sometimes,,,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,40000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Ukraine,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,R,Google Search,"Official documentation,Personal Projects,Textbook",,,,,,,,,,Very useful,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Statistician",Work,30,9,30,30,1,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,NoSQL,Python,R,Spark / MLlib",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,Sometimes,,,Most of the time,Most of the time,Rarely,Most of the time,,,Most of the time,,Most of the time,,Rarely,,,,Often,Most of the time,,Most of the time,,,Sometimes,Sometimes,Sometimes,,Most of the 
time,,,,70,4,0,20,6,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Most of the time,Rarely,Rarely,,,,,Most of the time,Often,,,Most of the time,Sometimes,,100% of projects,Entirely internal,Standalone Team,,dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,40000,UAH,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Iran,31,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Machine Learning Engineer",University courses,25,50,5,5,10,5,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Gaming Laptop 
(Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Sometimes,10GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Most of the time,,Most of the time,Often,,Often,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,20,50,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,Often,,,,Often,,,,Sometimes,Often,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,500,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Indonesia,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,R,Social Network Analysis,R,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Other,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,26,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Matlab,Google Search,"College/University,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Researcher,Self-taught,20,20,30,30,0,0,Computer Vision,Support 
Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,,Not Useful,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,40,0,5,20,35,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Tableau,TensorFlow",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Most of the time,Sometimes,,,Most of the time,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,5,50,10,25,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Git,Never,25000,GBP,I am not currently employed,7,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Scientist,,,C/C++,Monte Carlo Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher",,0,50,40,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Technology,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,1TB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","C/C++,Cloudera,Jupyter notebooks,Python,R,TensorFlow",,,,Most of the time,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other 
Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,Often,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,,,,20,50,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Often,,,,,,,Often,,,,Most of the time,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,40,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,No Free Hunch Blog,5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",GPU accelerated Workstation,11 - 39 hours,Github Portfolio,Yes,Master's degree,A humanities discipline,More than 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,16,4,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support 
Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Decision Trees,R,GitHub,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,,Necessary,,Necessary,Necessary,Necessary,Nice to have,,,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,20,10,10,10,,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,,,,,,,,Very Important,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,South Africa,53,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,R,Neural Nets,R,Google Search,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX",Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,I never declared a major,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by 
semi-colon),High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Norway,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Stan,Anomaly Detection,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,"Data Stories Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Researcher",Kaggle competitions,25,20,15,15,25,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Sometimes,100MB,"CNNs,Gradient Boosted Machines,Regression/Logistic 
Regression,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,Sometimes,,Often,Often,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,Often,,,,60,15,5,20,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,Often,,,,,,,,,,Rarely,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,Not Useful,,Not Useful,Not Useful,,,Somewhat useful,Somewhat useful,Very useful,,Not Useful,Somewhat useful,,,Not Useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,PhD,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,20,5,30,0,0,,"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Philippines,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Bayesian Methods,Python,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to 
have,Nice to have,,,,,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,Belarus,26,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Predictive Modeler,Statistician",University courses,80,10,0,10,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Male,Poland,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Management information systems,I don't write code to analyze data,Programmer,Self-taught,60,30,0,10,0,0,Natural Language Processing,Bayesian Techniques,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Other,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,3-5 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",Kaggle competitions,0,0,0,50,50,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Taiwan,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Newsletters,Podcasts,Textbook,YouTube Videos",,Very useful,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Telecommunications,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,QlikView,R,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,Most of the time,,,,,,,,,,Rarely,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,Rarely,Sometimes,,,Sometimes,Often,Often,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,Often,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1600000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Not Useful,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Software Developer/Software Engineer",Work,10,10,80,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,100TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Perl,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,,,Often,,Often,Often,,,Often,,Sometimes,,Often,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,5,60,30,5,0,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,Less than 10% of projects,Entirely internal,Standalone Team,satellite imagery; vector data,No significant problems ,Other,Company Developed Platform,,Git,Sometimes,3600000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other,Other",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Not Useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service,Other",40+,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Other,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,60,5,5,5,5,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Taiwan,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Amazon Machine Learning,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Textbook",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,3 to 5 years,"Engineer,Researcher",University courses,20,0,15,60,5,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other,,,,,,,,,,,,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer",University courses,15,10,0,75,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ireland,41,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"Data Stories Podcast,KDnuggets Blog,Talking Machines Podcast",3-5 years,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Physics,More than 10 years,Engineer,Self-taught,15,24,1,50,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Japan,54,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,Self-taught,100,0,0,0,0,0,Time Series,Decision 
Trees - Random Forests,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Researcher",University courses,15,10,20,40,10,5,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Sometimes,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,,Sometimes,,,,Sometimes,Often,Sometimes,Often,,Sometimes,Often,,,,,Sometimes,,,,,40,15,10,10,15,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,Often,Often,,Often,Often,,76-99% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,110000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Canada,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Social Network Analysis,Python,"Google Search,Government website,Other","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist,Predictive Modeler,Software Developer/Software Engineer,Other",Self-taught,20,40,20,0,20,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,10 to 19 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Sometimes,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,Often,Most of the time,Most of the time,,Most of the time,,,,,,Often,Often,,,,40,40,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,Most of the time,Most of the time,,,,,Most of the time,,Most of the 
time,,Most of the time,,Most of the time,,,51-75% of projects,Entirely internal,Business Department,None,Limited data quantity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Other,,90000,CAD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Belgium,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,More than 10 years,"Data Scientist,Researcher",University courses,0,0,30,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Perl,Python",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Logistic 
Regression,Naive Bayes,Natural Language Processing,Random Forests",Often,,Sometimes,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,Most of the time,,,,,,,,,,,70,10,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,Often,,,,,,Often,,,,,Sometimes,Often,,Less than 10% of projects,More internal than external,IT Department,3rd party conversion data; acxiom,lack of transparency into the ETL,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,330000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",1-2 years,,,Necessary,,Necessary,,,Necessary,,,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,Yes,Professional degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important +Male,Other,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,Other (please specify; separate by semi-colon),,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,48,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Jupyter notebooks,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Statistician,Other",University courses,10,40,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Government,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Python,R,SAS Base,TensorFlow",,,,,,,,Rarely,,,Rarely,Rarely,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,Rarely,,,,,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,Often,Most of the time,Sometimes,Most of 
the time,Most of the time,Most of the time,Often,Sometimes,Most of the time,Often,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Often,Most of the time,Often,Most of the time,Most of the time,Sometimes,Often,Often,Most of the time,Most of the time,Most of the time,,,,10,60,0,20,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,Often,,,10-25% of projects,Entirely internal,Standalone Team,chemical and medicinal data,"Understanding mathematics and algorithms clearly. Developing new algorithms. Changing from Windows to Linux again for Implementing with different languages such as Python, tensorflow since that does not run on Windows properly. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,"50,000,000",KRW,Has decreased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Never,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Often,Often,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Often,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,,Sometimes,,Often,Most of the time,,,,,,,Often,,Often,,,Often,Often,Often,,Often,,Sometimes,,,,,,,,,80,5,0,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant 
domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Bitbucket,Sometimes,70000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,10,10,15,50,15,0,"Adversarial Learning,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 
years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Don't know,10GB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",,,,Most of the time,,Most of the time,Most of the time,Often,Sometimes,,Rarely,Often,Rarely,Rarely,,Sometimes,,,Most of the time,Most of the time,Often,,Often,,Most of the time,,,Often,,,,,,30,30,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,Often,,,,,,,,Often,,,51-75% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,30000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Malaysia,24,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,0,85,10,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Relational data,Sometimes,100MB,"CNNs,HMMs,Neural Networks","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,,Often,,,,,,Sometimes,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Often,,,,Most of the time,,,,,,"CNNs,Data Visualization,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,,Often,,,Most of the time,,,,,,Often,,,Often,,,,Most of the time,Most of the time,,,,Sometimes,,,Often,,Often,,,,30,50,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,Often,,,,,,,,,,,,,,Most of the time,Often,,Less than 10% of projects,Entirely internal,Standalone Team,,security,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Git,Subversion",Rarely,800000,ZAR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Spain,42,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Programmer,Self-taught,50,20,30,0,0,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,,Technology,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data",Rarely,100MB,"Bayesian Techniques,CNNs","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction",,,Sometimes,,,Often,Often,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,40,30,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,Often,,,,Most of the time,,Most of the time,,,,Often,,,,,Often,Sometimes,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Never,31000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Indonesia,34,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer",University courses,10,10,10,70,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches",Primary/elementary school,Academic,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Evolutionary Approaches,HMMs,Neural Networks","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",,Sometimes,Most of the time,,,Most of the time,Most of the time,,Most of the time,Often,,,Often,Most of the time,,,,Most of the time,Most of the time,Often,Most of the time,,,,,,,Often,,,,,,20,25,20,15,20,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,,"Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,69150000,IDR,Has increased between 6% and 
19%,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Regression,Python,GitHub,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Work,50,15,30,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Academic,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Data Visualization,Decision Trees,Simulation",,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,20,20,10,35,15,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others",,,,,Often,Most of the time,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,550000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ireland,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"College/University,Conferences,Online courses,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher",University courses,90,0,0,10,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Never,<1MB,"CNNs,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,Minitab,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,SVMs",,,,Often,,Often,,,,Sometimes,,,,Rarely,,Rarely,,,,Most of the time,,,,,Sometimes,,,Rarely,,,,,,40,30,20,5,5,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,HAR,Getting it in the right format; Collecting sufficient quantities,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,26000,EUR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Not Useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Decreased slightly,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Text data,Rarely,<1MB,SVMs,"Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality 
Reduction,SVMs",,,,,,Sometimes,Most of the time,,,,,,Rarely,Rarely,,Sometimes,,,,,Often,,,,,,,Often,,,,,,35,20,0,10,35,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,100% of projects,More internal than external,Standalone Team,Gene[VA],,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,54000,CHF,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Necessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,10,20,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,,"GitHub,Google Search","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Other,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Neural 
Networks",,,Often,,,,Often,Often,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,60,70,70,80,70,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Often,,,,,,,Often,,,,,,,,,Sometimes,,,51-75% of projects,,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Cluster Analysis,Scala,GitHub,"Blogs,Company internal community,Friends network,Kaggle,Personal Projects",,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,,Very useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,0,0,20,20,20,Speech Recognition,Logistic Regression,No education,Internet-based,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10TB,"Bayesian Techniques,Neural Networks,Random Forests,RNNs,SVMs","C/C++,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,R,SQL",,,,Often,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Often,,Most of the time,,Often,,,,,,Most 
of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs",,Most of the time,,,,Often,,,,,,,,,,Often,,Most of the time,,Most of the time,,,Often,,,,,Sometimes,,,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,Sometimes,,Most of the time,,,,Most of the time,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,45000,THB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Ireland,39,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Newsletters,Online courses,Personal Projects",Very useful,,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,"Data Elixir Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Other",University courses,30,30,20,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,Unix shell / awk",,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,Most of the time,,,,Sometimes,,,Often,,,,"Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,Sometimes,Most of the time,,,,,,,,Often,,Often,Sometimes,Sometimes,Often,,Often,,,Most of the time,,Most of the time,,,Most of the time,Most of the 
time,,,,5,25,40,10,0,20,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Often,,,Sometimes,Often,,,76-99% of projects,Approximately half internal and half external,IT Department,Tmdb;comscore;rentrak,Data quality,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,70000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,3 to 5 years,Data Analyst,University courses,30,0,60,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Academic,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Never,100MB,Regression/Logistic Regression,"Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,,,Sometimes,,,Rarely,,,,25,10,0,50,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,Sometimes,,,Often,,,,,,,Often,Most of the time,,,,,,100% of projects,Do not know,Standalone Team,NA,Cleaning and understanding the data generating process. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Dropbox ,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,52500,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,50,Employed full-time,,,No,Yes,Data Scientist,,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,,Master's degree,Electrical Engineering,1 to 2 years,Engineer,Self-taught,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Belgium,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,I prefer 
not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Government website,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,,,Researcher,University courses,NA,NA,NA,NA,NA,NA,Adversarial Learning,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Poland,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,32,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Other",Self-taught,40,50,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,United Kingdom,65,Retired,,,Yes,,Statistician,Poorly,,I don't plan on learning a new tool/technology,Neural Nets,,Other,"Online courses,Personal Projects",,,,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,98,0,0,0,2,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,55,Employed full-time,,,Yes,,Data Miner,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Newsletters",Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Miner,Data Scientist,Researcher,Statistician",Self-taught,80,0,10,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Decreased significantly,6-10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,RNNs,SVMs","MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,Sometimes,Often,,,Sometimes,Often,Sometimes,,,,,,Often,,,,Most of the time,Most of the time,Often,Often,,Sometimes,,Often,,,Often,Most of the 
time,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",Often,Sometimes,,,,Sometimes,,,Often,,,,,Often,,,,Often,,,,,10-25% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,38000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,Very useful,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast",< 1 year,,Nice to have,Nice to have,,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Greece,36,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Textbook",Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,70,0,10,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10MB,"Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Often,Sometimes,,Most of the time,,,,"Data Visualization,Gradient Boosted Machines,Logistic Regression,SVMs,Text Analytics",,,,,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,Often,,,,,50,20,10,10,10,NA,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,Often,,Often,,,,10-25% of projects,More internal than external,IT Department,O*net; stackoverflow.com; geonames.org,Transformation into appropriate form.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,NoSQL,Genetic & Evolutionary Algorithms,Other,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,0,30,40,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Other,Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,Other","C/C++,MATLAB/Octave,Python,R,Statistica (Quest/Dell-formerly Statsoft),Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,Often,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",Sometimes,,,,Sometimes,,Most of the time,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,56,3,1,13,27,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Sometimes,Most of the time,,,,,,Rarely,,,Most of the time,,Most of the time,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Other,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by college or university,Employed by non-profit or NGO",Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,Very useful,,,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,20,20,20,10,0,"Natural Language Processing,Recommendation Engines,Time Series","Ensemble Methods,Neural Networks - RNNs",A master's degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Decision Trees,Ensemble Methods,Random Forests,RNNs,SVMs","Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,,Often,,,,,,,,,,,Often,Sometimes,,,,Often,,,,Often,Often,Sometimes,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear 
question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,Most of the time,,,Often,Often,,Often,,,,,Often,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,,data cleaning and forming data pipeline,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,"180,000",TWD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Israel,28,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",0,5,20,5,70,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",High school,Other,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,10GB,Gradient Boosted Machines,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Often,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,,,,Often,Often,Often,,Often,Often,Sometimes,,,,Sometimes,Sometimes,,,,40,25,25,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Often,,Sometimes,Most of the time,,Often,,,Sometimes,,,,,Most of the time,Most of the time,,,100% of projects,Entirely 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,150000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Friends network,Kaggle,Personal Projects,Textbook",,,,,Very useful,Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,20,15,40,20,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Technology,"10,000 or more employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Always,100GB,"CNNs,Ensemble Methods,Neural Networks,RNNs,SVMs","Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Most of the time,,,Sometimes,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Sometimes,Often,,,Often,Most of the time,,,Often,,,,,Often,Often,,,,Most of the time,Most of the time,Often,,Often,Often,Most of the time,,,,Most of the time,Often,,,,50,20,10,10,5,5,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data 
science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,,Often,Most of the time,,,Most of the time,,Most of the time,,,,,,Often,Most of the time,,Most of the time,,,10-25% of projects,Entirely external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,3300000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,Other,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,,,,Very useful,Very useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Other,90,5,2,1,2,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation","Image data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,,,,,Rarely,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",,,,Sometimes,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Rarely,,Rarely,Sometimes,Most of the time,Rarely,,Often,Often,Rarely,,,Sometimes,,,,,,40,25,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of 
the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Sometimes,Often,,Often,Often,Rarely,,Rarely,Rarely,,,Sometimes,Rarely,Often,Sometimes,Rarely,Sometimes,,10-25% of projects,More internal than external,IT Department,Social Networking; government public datasets; tax services; ,Big realational/tabular data bases need to be somehow pre-processed to be useful for ML,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Bitbucket,Git",Never,6000000,KZT,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,20,0,50,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,Often,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Segmentation,Time Series Analysis",,Sometimes,,,,,Often,Most of the time,Often,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,Often,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,Often,,,Often,,Often,,Often,,,Often,,,,,Often,,10-25% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,38000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important +Male,Japan,22,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Data Stories Podcast,Linear Digressions Podcast,Siraj Raval YouTube Channel",< 1 year,,,,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,60,10,0,30,0,0,"Computer Vision,Recommendation Engines,Speech Recognition","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Belgium,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Julia,Deep learning,R,"Google 
Search,University/Non-profit research group websites,Other","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Official documentation,Personal Projects,Textbook",Very useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Predictive Modeler,Researcher",University courses,20,0,40,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Other,Rarely,10GB,"Ensemble Methods,Random Forests,SVMs","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Rarely,Often,,Rarely,,,,,Rarely,,,,,,,Often,,Rarely,,,,,Rarely,,,,,,65,10,5,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,,Often,,,Sometimes,,Sometimes,,,76-99% of projects,Entirely internal,Business Department,data resources from www.ensembl.org,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Sometimes,33000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,C/C++,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",70,15,0,0,15,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Often,,,,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Rarely,,,Sometimes,Sometimes,Most of the time,Most of the time,Often,Sometimes,,,,,Often,,Often,,Sometimes,Often,,Sometimes,,Often,Often,,Often,,,Often,,,,,50,30,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Often,,,Sometimes,,Often,,,,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1950000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Singapore,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Official documentation,Online courses,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,40,10,20,30,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"CNNs,Neural Networks,RNNs","C/C++,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Sometimes,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Naive Bayes,Natural 
Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,Often,Often,,Often,Often,,Most of the time,,,,Sometimes,Often,,,,Often,Most of the time,Most of the time,Often,,,,Most of the time,,,Often,Most of the time,,,,,20,20,30,10,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,Often,,,,,Often,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,GloVE;pre-trained NNs,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,Not Useful,Somewhat useful,Not Useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Not Useful,,,Somewhat useful,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Other",Self-taught,40,15,15,10,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Increased significantly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Rarely,Rarely,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,Rarely,Sometimes,,,Sometimes,Sometimes,Often,,Rarely,Rarely,,,,,Sometimes,Rarely,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,New Zealand,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Personal Projects",,,,Very useful,,,,,,,,Very useful,,,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Other,University courses,0,0,50,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,1GB,"Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,Rarely,Most of the time,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,Rarely,Rarely,,,,10,10,0,10,30,40,Enough to explain the algorithm to someone non-technical,"Unavailability of/difficult access to data,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,100% of projects,Entirely internal,Standalone Team,Census;Webscraping; Client;,Reliability and availability on tight deadlines,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,64000,NZD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important +Male,Taiwan,31,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,QlikView,Anomaly Detection,R,GitHub,"Arxiv,Friends network,Textbook,Tutoring/mentoring",Very useful,,,,,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 
5 years,"Programmer,Researcher",University courses,0,0,80,20,0,0,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Image data,Sometimes,1GB,"CNNs,Ensemble Methods,HMMs,Neural Networks","Amazon Machine Learning,C/C++,Python,R,TensorFlow",Rarely,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction",Sometimes,Sometimes,,Often,,Often,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,30,20,30,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Nigeria,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Talking Machines Podcast",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,20,10,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Russia,31,Employed full-time,,,No,Yes,Computer Scientist,Fine,Self-employed,,,,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,The Analytics Dispatch Newsletter,< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,20,20,20,20,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,C/C++/C#,Google Search,"Online courses,Stack Overflow 
Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Female,Turkey,28,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",4,15,1,80,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important +Male,Italy,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SQL",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,Rarely,Most of the time,Sometimes,Most of the time,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Prescriptive 
Modeling,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Often,,,Often,Often,Often,Often,,,Often,,,Often,,,Often,Often,,100% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,10,20,40,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Insurance,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Ensemble Methods,Random Forests,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Often,,Most of the 
time,,,Most of the time,,,,,,,,,,Often,,Most of the time,,Often,,Often,,,Often,Often,,,,,60,15,0,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,Rarely,,,Sometimes,Sometimes,,Often,,,,,,,Often,,100% of projects,Approximately half internal and half external,Standalone Team,,dirty/unstructured data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,110000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,23,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,1 to 2 years,I haven't started working yet,University courses,20,65,0,10,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Rarely,10MB,Evolutionary Approaches,"Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Evolutionary Approaches,Prescriptive Modeling,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,,,50,0,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues",,,,,Often,Often,,,,,,,,,,,Rarely,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,0,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SAS Base,Regression,R,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,0,40,0,0,"Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Other,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Online courses,Textbook,YouTube Videos",,Very useful,Not Useful,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,India,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,R,"Ensemble Methods (e.g. boosting, bagging)",R,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,20,0,20,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,500 to 999 employees,Decreased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Neural Networks,Regression/Logistic Regression,SVMs,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,Sometimes,,,,,,10,60,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Most of the time,,,,,,Most of the time,,,,,Often,,,,,,Most of the time,,,100% of projects,More external than internal,Standalone Team,,Converting pdf into csv,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Most of the time,3000000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,35,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,Google Search,"Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Malaysia,32,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,Very useful,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Recommendation Engines,Time Series",Logistic Regression,Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Never,1GB,Bayesian Techniques,"Jupyter notebooks,Python,R,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,Rarely,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,30,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Subversion,Sometimes,30000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,College/University,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,20,30,20,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Java,NoSQL,Python,R,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,Sometimes,Sometimes,Sometimes,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,Sometimes,,Often,Most of the time,,,,,,,Most of the 
time,Often,,,,,35,30,10,5,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,,,,Often,Often,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,2100000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Africa,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Orange,Python,R",,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,Sometimes,,,Often,,Sometimes,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,,Often,Often,,,,,Often,,,,,,,Sometimes,,Often,,,,,,,Often,,,,60,15,5,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of 
significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Most of the time,Often,,Sometimes,Most of the time,,,Often,Often,,Often,,,Often,,Sometimes,,Most of the time,,,,,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,37000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,30,50,5,0,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,500 to 999 employees,Increased slightly,6-10 years,A tech-specific job board,Somewhat important,Analyze and understand data 
to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM Cognos,Jupyter notebooks,NoSQL,Python,QlikView,SQL,Unix shell / awk",,,,,,,,,Often,Often,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs,Text Analytics",,,Often,,,Most of the time,Most of the time,Often,,,,,,Often,,Often,,Often,,,,,Most of the time,,,,,Most of the time,Sometimes,,,,,80,5,2,10,3,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Most of the time,Sometimes,,,Often,,,,,Sometimes,Often,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,Eliminating or replacing blank data (missing values),"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,1200000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Korea,NA,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Internet-based,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",Sometimes,Often,,,Often,,,,Most of the time,,,,,Sometimes,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,CNNs,PCA and 
Dimensionality Reduction,Random Forests",Often,,,Sometimes,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,40,20,20,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",Often,Often,,,Often,Sometimes,,Sometimes,Most of the time,,,,Sometimes,,,Most of the time,,,,,,,51-75% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,150000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",70,20,0,0,0,10,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important +Male,Israel,35,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Time Series Analysis,R,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",40,25,15,20,0,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A master's degree,Academic,I don't know,Increased significantly,More than 10 years,A career fair or on-campus 
recruiting event,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,,,"Java,Python,R,TensorFlow",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Naive Bayes,Neural Networks,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,Often,,Most of the time,,Often,,,,,,,,,,Most of the time,,,,10,70,0,10,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,Often,,,,,,,,,,,,,Often,Often,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Git,Other",Always,75000,USD,Has decreased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),Other","Blogs,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,,,,Not Useful,"Data Elixir Newsletter,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Mix of fields,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Most of the time,1GB,"Bayesian Techniques,CNNs","Amazon Machine Learning,Amazon Web services,Java,Python,R",Rarely,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics",Sometimes,,,,,,,,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,Often,,,,,60,20,10,0,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,None,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,600000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Company internal community,Conferences,Kaggle,Personal Projects,YouTube Videos",,,,Very useful,Somewhat useful,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,20,20,40,20,0,0,Speech Recognition,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Text data,Relational data",Always,1GB,"Decision Trees,Regression/Logistic Regression","Java,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Most of the time,,,,Often,Most of the time,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,Often,,,,,Most of the time,Most of the time,,,,,30,30,10,10,0,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,Less than 10% of projects,Entirely internal,Standalone Team,,,,,,,,,,Has increased 20% or 
more,5,,,,,,,,,,,,,,,,,, +Male,Sweden,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Podcasts,Stack Overflow Q&A",,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,60,0,40,0,0,0,Recommendation Engines,,A doctoral degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Don't know,,,"Jupyter notebooks,NoSQL,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Naive Bayes,Natural Language Processing",,,Rarely,Rarely,Rarely,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,10,0,0,0,0,90,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Other",,,,,,,,,Sometimes,,,,,,,Sometimes,,Rarely,,,,Most of the time,None,Entirely internal,,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)","I don't typically share data,Other","Jira, GitLab",Git,Most of the time,384000,SEK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United Kingdom,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,R,Anomaly Detection,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Conferences,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,"Data Stories Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",Kaggle competitions,30,10,10,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes",,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,6,2,1,1,0,90,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,Most of the time,,,,,,,Often,,,,,Often,,Less than 10% of projects,More external than internal,Other,House price data,"Transformation, understanding of data meaning ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,42000,GBP,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Other,Work,45,25,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,10GB,"CNNs,Neural Networks","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow",,Often,,Sometimes,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Data Visualization,Logistic Regression,Neural Networks,SVMs",Often,,,Most of the time,,,Often,,,,,,,,,Sometimes,,,,Often,,,,,,,,Rarely,,,,,,40,40,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,,INR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Support Vector Machines (SVM),R,Google Search,"Friends network,Kaggle,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Other",Work,60,20,20,0,0,0,Time Series,,A bachelor's degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,"Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SQL",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",Often,,,,,,Often,Often,,,,,,Rarely,,Rarely,,,,,,,Rarely,,,Often,,,,,,,,20,20,10,40,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data",Often,Often,Often,,Often,,,,Often,,Often,,,Often,Often,,,Often,Often,Often,,,26-50% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1900000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,Python,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,0,30,20,NA,Unsupervised Learning,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,France,30,"Not employed, and not looking for work",No,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,53,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Official documentation,Textbook",,,Very useful,,,,,,,Somewhat useful,,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,University courses,0,0,0,95,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation","Image data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","C/C++,Java,Mathematica,MATLAB/Octave,Python,R,Statistica (Quest/Dell-formerly Statsoft),TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,Sometimes,,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Often,Often,Often,Most of the time,,,,,,,,,,Often,,,Often,,,,Often,,,Often,,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,Often,Often,,,,,,Often,,Sometimes,,Less than 10% of projects,Approximately half internal and half external,IT Department,UCI,Building analytical models,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Official documentation,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Researcher,Other",University courses,20,5,10,40,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,10MB,"Decision Trees,Random Forests,RNNs,SVMs,Other","IBM SPSS Statistics,Java,MATLAB/Octave,Orange,Python,RapidMiner (free version),TensorFlow",,,,,,,,,,,,Sometimes,,,Rarely,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,Most of the time,,,,Sometimes,,Most of the time,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,Most of the time,,Sometimes,,,Most of the time,,Sometimes,,,,15,30,30,15,10,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,52480,PKR,Other,8,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,Other,Self-taught,30,30,30,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Other,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,<1MB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",Often,,,,,,Often,,,,,,,,,,,,,Most of the time,Often,,,,,,Often,,,Most of the time,,,,25,30,10,10,15,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the 
time,,Often,Sometimes,Often,,,,,Most of the time,,,Sometimes,,Sometimes,,,,,,Often,,51-75% of projects,Approximately half internal and half external,Business Department,"Market data from Bloomberg, Reuters and other sources",Outdated company politics,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,65000,EUR,Other,3,,,,,,,,,,,,,,,,,, +Male,South Africa,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,Time Series,Logistic Regression,A bachelor's degree,Financial,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,100MB,Regression/Logistic Regression,"Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Logistic Regression,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,Often,,,,80,2,0,15,3,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,Sometimes,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,IT Department,Property Data,Data Quality,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Git,Rarely,600000,ZAR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,5,5,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression",No education,Pharmaceutical,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,10MB,"Decision Trees,Random Forests","Amazon Web services,IBM Cognos,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R",,,,,,,,,,Rarely,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Rarely,,Rarely,,,Rarely,Sometimes,Often,,,,,,,,,,,,,Often,,Often,,,,,,,Sometimes,,,,40,40,5,5,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Sometimes,,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,France,56,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,,R,Google Search,"College/University,Conferences,Personal Projects",,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Miner,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",10,0,60,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Decision Trees,Random Forests",,Sometimes,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,40,10,5,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,Often,,Often,,,,,,Often,,,,Most of the time,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,Other,Python,Google Search,"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,60,10,15,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Don't know,1TB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Often,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Often,,,,,,Sometimes,Sometimes,,Often,,Sometimes,Most of the time,Most of the 
time,Often,,,,Most of the time,,,Sometimes,Most of the time,,,,,50,30,0,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Bitbucket,Git,Subversion",Sometimes,,,,5,,,,,,,,,,,,,,,,,, +Male,Finland,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,"Blogs,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",High school,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Ensemble Methods,Evolutionary Approaches,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,Jupyter notebooks,MATLAB/Octave,Microsoft Excel 
Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Sometimes,,,,,,,,,Rarely,,,,,,Sometimes,,,,Sometimes,,Often,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,Sometimes,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other",Sometimes,,,,,Most of the time,Most of the time,Often,,Often,,,Sometimes,Often,,Often,Most of the time,,,,Most of the time,,Often,,,Often,Most of the time,,Sometimes,Most of the time,,,Most of the time,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,,,,Sometimes,,,Most of the time,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,55000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Finland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,,,Very useful,,,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",40,40,5,5,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Often,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,,,Often,,,,Sometimes,Sometimes,,Sometimes,,,Often,,,,Often,,,,40,10,15,10,25,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",,,,,,,,,Often,,Often,,,,Often,,Sometimes,Often,,,,,76-99% of projects,More internal than external,Central Insights Team,Datasets from Finnish Bureau of Statistics,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,"39,600",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Anomaly Detection,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Programmer,Researcher",University courses,60,20,0,20,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,6-10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Never,10GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Machine Learning,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",Rarely,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,Often,,Most of the time,Sometimes,,,,Most of the time,,,Sometimes,,,,,Often,Most of the time,Sometimes,,,,Most of the time,,,,Often,,,,,20,40,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and 
decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,Most of the time,,Often,,,,Rarely,,Sometimes,,Sometimes,,,,,,Often,Sometimes,,10-25% of projects,Entirely external,Central Insights Team,opensubtitles;google quick draw;WMT14,data preprocess,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,2400,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,0,10,20,0,"Computer Vision,Machine Translation,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,KNIME (commercial version),Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Company internal community,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,35,35,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,Rarely,,Sometimes,Most of the time,Often,,Often,,,,,,,,Often,,,Often,,Sometimes,,,Most of the time,,,,,Often,,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,,,Rarely,,,,,Often,Sometimes,Often,,10-25% of projects,Entirely internal,Other,Na,Connect with multiple stakeholders to explain the problem we are working and the need for the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Most of the time,510000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,25,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"Collaborative Filtering,Natural Language Processing",,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,30,30,40,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,Most of the time,,Often,Sometimes,,,,,Sometimes,,,,,,Often,Rarely,,Less than 10% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,Never,1000000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,47,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,"Google Search,University/Non-profit research group websites","Blogs,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Very useful,,,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,"Statistician,Other",University courses,20,20,10,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,Jupyter notebooks,Orange,Python,R,RapidMiner (free version),Tableau,Other",,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,Rarely,,Sometimes,,Often,,Rarely,,,,,,,,,,Sometimes,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other,Other,Other",Sometimes,,Sometimes,,,Often,Most of the time,Often,Sometimes,,,,,Most of the time,,Most of the time,,Sometimes,Often,,Most of the time,Most of the 
time,Often,,,Most of the time,Often,Rarely,Often,Often,Often,Most of the time,Often,40,10,5,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,Often,,,,,,,,Often,,Sometimes,,,,76-99% of projects,Entirely internal,Standalone Team,almost none,"data format, age of data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,120000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Genetic & Evolutionary Algorithms,Python,Google Search,"Arxiv,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,45,25,10,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Not at 
all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",Text data,Most of the time,1GB,"CNNs,Neural Networks","Amazon Web services,C/C++,Cloudera,Impala,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,Rarely,Often,,,,,,,,,Often,Rarely,,Rarely,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Rarely,,,,Most of the time,,Often,,,,"A/B Testing,CNNs,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Segmentation,Text Analytics",Often,,,Often,Rarely,,Often,,,,,,,Rarely,,,,Rarely,Often,Often,,,,,,Sometimes,,,Often,,,,,15,55,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,Sometimes,,Rarely,Most of the time,Most of the time,Often,,Sometimes,,,,,,Often,Often,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Rarely,70000,GBP,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Australia,33,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Podcasts",Very useful,Very useful,,,,,Very useful,,,,,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,10,30,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A professional degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Image data,Most of the time,10GB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation",,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,60,15,10,5,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,MSCoco; ImageNet,Lack of data for the problem we are trying to solve,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,504000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Netherlands,27,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Friends network,Online courses,Stack Overflow Q&A",,,,,,Very useful,,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,25,0,60,5,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Germany,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,,,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,10,40,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Relational data,Rarely,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Simulation,SVMs,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,,Sometimes,,,Sometimes,,,,,,,,Often,,,,,,,Often,Often,,Often,,,,50,15,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues",,,,,Most of the time,,,,,Sometimes,,,,,,,Often,,,,,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,45000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +A different identity,Greece,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,,Not Useful,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Researcher",University courses,80,0,0,20,0,0,"Machine Translation,Natural Language Processing",Bayesian Techniques,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Bayesian Techniques,Other","MATLAB/Octave,Perl,Python,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,,,,Often,Most of the time,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Sometimes,Often,,,,10,50,15,10,15,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,Do not know,Other,N/A,NULL values; synchronization,"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Company Developed Platform,,Git,Rarely,"12,000",EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,20,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",C/C++,Deep learning,Python,Google Search,"Arxiv,Blogs,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Not Useful,Somewhat useful,,Very useful,Very useful,,,,,Very useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,70,0,0,0,Unsupervised Learning,Neural Networks - CNNs,A master's degree,Non-profit,100 to 499 employees,Increased significantly,Don't know,An external recruiter or headhunter,Not very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Other,,100MB,Neural Networks,"C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Neural Networks,Other",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,0,70,0,30,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning",,,,Rarely,Most of the time,,,,,,,Often,,,,,,,,,,,100% of projects,Do not know,Other,Stanford Graphical Repository," Find the proper number of nodes for reconstruction","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,6700,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Nigeria,23,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Linear Digressions Podcast,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Other,3 to 5 years,"Programmer,I haven't started working yet",Self-taught,40,10,0,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Taiwan,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,University courses,70,10,15,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow",,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Often,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Often,,,Sometimes,Often,Sometimes,Most of the time,Often,Sometimes,,,,,,,Often,,,Often,Often,Often,,Sometimes,Often,,Sometimes,,,Often,,,,,30,15,20,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,Often,Most of the time,,,,,,,Most of the time,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,,dirty internal data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,910000,TWD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Spain,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Anomaly Detection,Python,"GitHub,Google Search","Blogs,College/University,Company internal community,Conferences,Kaggle,YouTube Videos",,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,,,,,,,,,,,Not Useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Statistician",University courses,40,15,25,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,Primary/elementary school,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Most of the time,10GB,"GANs,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,R,RapidMiner (free version),SQL,TIBCO Spotfire",,Sometimes,,Rarely,,,,,,,,Rarely,Rarely,,,,Often,,,,Most of the 
time,,,,,,,,,,Often,,Sometimes,,Often,,,,,,,Rarely,,,,,Often,,,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Simulation,Time Series Analysis",,,,,,Rarely,Often,,,,,,,,,,,,,Sometimes,Often,Often,,Sometimes,,,Most of the time,,,Often,,,,25,20,30,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Often,,,,Often,,Sometimes,,,,Most of the time,Sometimes,,Often,Often,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Never,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist,Predictive Modeler,Software Developer/Software Engineer",Kaggle competitions,10,10,20,10,50,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,,,,,,,,,,,,,,, +Female,Ireland,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Other,University courses,0,0,20,80,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100MB,"Evolutionary Approaches,Regression/Logistic Regression,Other","Microsoft SQL Server Data Mining,Python,QlikView,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,Often,Sometimes,,,,,,,,,,,,Often,,,,,,,"Evolutionary Approaches,kNN and Other Clustering,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,,,,Sometimes,,,,Often,,,,,,,,Most of the time,,,,Sometimes,,,,Sometimes,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,Often,,Sometimes,Often,,,Often,,,,,,Often,,,,,,,Often,,51-75% of projects,More internal than external,IT Department,kaggle data,getting it into a usable format,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,40000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Social Network Analysis,Scala,Google Search,"College/University,Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,Somewhat useful,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,40,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1TB,,"Hadoop/Hive/Pig,Java,NoSQL,Python,QlikView,Spark / MLlib,SQL,Tableau,Unix shell / awk,Other,Other",,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,Most of the time,,,,Sometimes,Rarely,,,,,,,,,Most of the time,Sometimes,,,Rarely,,,Most of the time,Most of the time,Most of the time,,"A/B Testing,Data Visualization,Segmentation,Time Series Analysis",Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,60,10,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data 
science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Often,,Most of the time,,,Often,,,,Rarely,,,Most of the time,,Often,,,Most of the time,Most of the time,,76-99% of projects,More internal than external,IT Department,IATA geocode; predictHQ;,Corporate policy and lack of means.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Rarely,45000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Sweden,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos",,,,,,,,,,,,,,,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,,Kaggle competitions,20,30,10,10,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,Regression/Logistic Regression,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Often,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,"Climate data, consumer surveys",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Italy,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Emergent/Future Newsletter (Algorithmia),1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I prefer not to answer,Computer Science,,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,35,Employed full-time,,,Yes,,Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by government",MATLAB/Octave,Neural Nets,C/C++/C#,University/Non-profit research group websites,Conferences,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,Researcher,Work,0,0,100,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Government,100 to 499 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Java,Microsoft Azure Machine Learning,R,SQL",,,,Most of the 
time,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Natural Language Processing,Simulation",,,,,,,,Most of the time,,,,,,,,Often,,,Sometimes,,,,,,,,Most of the time,,,,,,,5,80,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of significant domain expert input,Limitations of tools",,Most of the time,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Other",University courses,75,5,10,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Financial,10 to 19 employees,Decreased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,,,Often,,Rarely,,,,,,,,,,,,,Often,,Rarely,,,Often,Often,,,,,,,15,40,30,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,Often,,,Most of the time,,,,,,,Most of the time,,,,100% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Sometimes,,,,5,,,,,,,,,,,,,,,,,, +Male,South Korea,35,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,10,20,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,France,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,Government website,"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer",Self-taught,30,20,30,15,5,0,"Natural Language 
Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Other","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,SAS Base,SQL,Tableau,Unix shell / awk",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Often,,,Often,,,Sometimes,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Most of the time,,Most of the time,Often,Most of the time,Often,,Most of the time,Sometimes,,Sometimes,,Often,Often,,,,,60,10,1,1,8,20,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Sometimes,,,,Often,,,,Sometimes,Sometimes,,,Often,,,,Most of the time,,,Most of the time,Sometimes,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Text Mining,Python,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Computer Scientist,Data Miner,Predictive Modeler,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Mathematica,Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (free version),Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,Rarely,,,Often,,,,Often,,,,Often,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,Often,Sometimes,Often,,Often,Often,Often,Often,,,Often,,Sometimes,,Often,,Sometimes,,Often,,,Often,,,Often,Sometimes,,,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Often,Often,,,Often,Sometimes,,Often,Often,,Often,Often,Often,,,,,,26-50% of projects,More internal than external,Standalone 
Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,1100,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,No,Yes,Programmer,Fine,Employed by college or university,Orange,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,Not Useful,Not Useful,,,Very useful,Somewhat useful,"Data Stories Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,60,15,0,20,0,5,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very 
Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,,"Blogs,Company internal community,Online courses,Podcasts,Textbook,YouTube Videos",,Not Useful,,Not Useful,,,,,,,Very useful,,Not Useful,,Very useful,,,Somewhat useful,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher",Self-taught,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,Regression/Logistic Regression,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Simulation,Text Analytics",,,Sometimes,,,Rarely,Most of the time,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,Sometimes,,,,,30,10,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,100000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Hungary,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",5,30,15,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Telecommunications,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,Decision Trees,"IBM SPSS Statistics,Jupyter notebooks,Python,QlikView,R,SQL",,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Most of the time,Sometimes,,,,Often,Most of the time,Often,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,,,50,5,5,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to 
go in with the available data",,Sometimes,,,Most of the time,,,,Sometimes,,,,,,,,,,Sometimes,Often,,,51-75% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,,Never,3840000,HUF,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by non-profit or NGO,Tableau,Regression,R,Google Search,"Official documentation,Online courses,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,Emergent/Future Newsletter (Algorithmia)",< 1 year,,,Necessary,,,,,,Necessary,,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,Computer Science,I don't write code to analyze data,"Data Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Computer Vision,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Norway,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Other,"Blogs,Newsletters,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, 
etc.)",20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Military/Security,20 to 99 employees,Decreased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,Basic laptop (Macbook),Relational data,Never,100MB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Python,SQL",,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Segmentation",,,,,,Sometimes,Often,Sometimes,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,Often,,,100% of projects,Entirely internal,Other,weather; demographics,data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,"620,000",NOK,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,80,0,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Insurance,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM Cognos,QlikView,R,SQL,Tableau",,Rarely,,Sometimes,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,,Often,,,,Sometimes,,,,,,Often,,,,,20,50,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,50000,GBP,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Ukraine,25,Employed full-time,,,No,Yes,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,Mathematica,Neural Nets,C/C++/C#,Google Search,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,Data Scientist,University courses,10,40,0,50,0,0,Time Series,Neural Networks - CNNs,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Singapore,24,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Very useful,,Very useful,,KDnuggets Blog,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,30,20,0,50,0,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,34,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,KDnuggets Blog,< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,,"Business Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,"FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"DBA/Database Engineer,Researcher,Software Developer/Software Engineer",University courses,20,0,40,40,0,0,,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,"Amazon Web services,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,60,5,20,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Rarely,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,76-99% of projects,Entirely internal,IT Department,None,Building the 
data collection and pipeline to a useable environment,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Always,75000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United Kingdom,29,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Amazon Machine Learning,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,10,15,25,35,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Google Cloud Compute,Jupyter notebooks,Python,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other 
Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Often,Sometimes,Sometimes,,,,,Most of the time,,Often,,,Often,Most of the time,Often,,Often,,,Often,,,Sometimes,Often,,,,50,15,25,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,Most of the time,,,,,,,,,,Sometimes,Often,Most of the time,,,,Most of the time,,,100% of projects,Entirely internal,Standalone Team,,Data cleanup,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Other",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,20,60,0,0,20,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Never,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Germany,25,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",University courses,20,60,0,20,0,0,Other (please specify; separate by semi-colon),,A professional degree,Technology,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Never,,,C/C++,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,10,30,30,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Rarely,Sometimes,,,,,,,,,Most of the time,,,,,Sometimes,,Sometimes,Sometimes,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Subversion,Rarely,1820000,RUB,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Philippines,37,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Decision Trees,SQL,Google Search,Blogs,,Very useful,,,,,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,,,"Perl,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,20,10,10,40,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,100% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Other (please specify; separate by semi-colon),Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,India,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,Necessary,,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Other,Self-taught,40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Business Analyst,University courses,45,5,20,10,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,100GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,Other",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Often,,,Sometimes,,,,Rarely,,,,40,35,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,Sometimes,,,,51-75% of projects,More internal than external,Central 
Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,50000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",Work,5,35,60,0,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Google Search,"Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's 
degree,Electrical Engineering,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,20,20,0,10,0,"Adversarial Learning,Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,100 to 499 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Sometimes,Often,,,Often,Often,Often,,,,,,Sometimes,,Most of the time,,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,Sometimes,Often,,,,,40,10,10,20,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,Sometimes,,,,,Often,,Often,,Sometimes,,,Often,,,,,Sometimes,,10-25% of projects,More internal than external,Business Department,Google analytics,cleaning,Key-value store (e.g. Redis/Riak),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,"1,200,000",INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,France,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,1 to 2 years,"Engineer,Software Developer/Software Engineer",Self-taught,30,0,0,0,0,70,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Often,,Often,,,Often,Often,Often,Often,,,Often,,,,Often,,Often,,,Often,,Often,,,,,,Often,,,,,45,25,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Rarely,"63,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,49,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Tableau,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,Recommendation Engines,,High school,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,,"C/C++,Java,NoSQL,Python,SQL,Tableau,Unix shell / awk",,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,"Collaborative Filtering,Recommender Systems",,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,20,10,70,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Sometimes,,,Often,,,,,,,,,,,,,Rarely,Often,Rarely,,,,Less than 10% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"173,000",CAD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,France,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,32,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,40,2,18,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",Decreased significantly,Don't know,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Other,Sometimes,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,R,Unix shell / 
awk",,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,Sometimes,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,20,20,10,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data,Other",,,,,Most of the time,,,,,,,Often,,,,,,,Often,,Often,Most of the time,100% of projects,More internal than external,Standalone Team,HCP;,"Since it is clinical data from patients, methods developed for healthy controls do not always apply.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Cloud Sharing in-house,Git,Most of the time,8000,EUR,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,,,"KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,Rarely,,Rarely,Rarely,Most of the time,,Most of the time,,,,,,Most of the time,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction",,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,40,10,20,10,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Poland,31,Employed full-time,,,No,Yes,Data Scientist,,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,,Kaggle Competitions,,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,0,30,40,0,30,0,Computer Vision,Neural Networks - CNNs,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,31,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,GitHub,"Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,No,Master's degree,Other,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,Other 
(please specify; separate by semi-colon),No education,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,Russia,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,,"Arxiv,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,Machine Learning Engineer,Other,19,10,1,0,0,70,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Internet-based,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Most of the time,10MB,"Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,Tableau",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,"Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,Most of the time,,,,Sometimes,,,,,50,40,0,10,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to 
data,,,,,,,,,,,,,,,,,,,,,Often,,None,Entirely internal,Other,,,,,,Git,Always,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Company internal community,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",Very useful,,,Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Other,15,0,35,5,10,35,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Don't know,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Recommender Systems",,,,,,,Sometimes,Often,Often,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,50,20,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,Often,,,Often,,,,,,,,,,,100% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",,30000,RUB,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,Singapore,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,MATLAB/Octave,Survival Analysis,SQL,GitHub,"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,Government website,"Friends network,Textbook",,,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst",Work,30,0,70,0,0,0,Other (please specify; separate by semi-colon),Other 
(please specify; separate by semi-colon),Primary/elementary school,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,"Oracle Data Mining/ Oracle R Enterprise,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,20,0,0,40,40,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,None,Lack of learning opportunities and focus on execution without getting deeper in data science. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1150000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,"FlowingData Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Biology,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,,3-5 years,,,,,,,,,,,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer",Work,10,25,50,0,15,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,15,25,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Often,Often,,,Most of the time,,Most of the time,,,,,Most of the time,,,Often,,,,51-75% of projects,Entirely internal,Standalone Team,,,,,,Git,Sometimes,,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, 
+Male,Spain,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Text Mining,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,50,10,20,0,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Technology,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Sometimes,10GB,"CNNs,Neural Networks",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks",,,,Most of the time,,Often,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,25,20,10,25,20,0,Enough to explain the algorithm to someone non-technical,"Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,Often,,,,,,,Often,,100% of projects,Entirely internal,IT Department,,,,Share Drive/SharePoint,,Mercurial,Rarely,35000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Social Network Analysis,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Tutoring/mentoring",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,45,0,15,0,10,Computer Vision,Support Vector Machines (SVMs),A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Julia,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",19,60,20,0,1,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service,Other","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,Java,Jupyter notebooks,NoSQL,Perl,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,Sometimes,Often,,Often,Often,Sometimes,,,,Sometimes,Rarely,Often,,Most of the time,,,,,,,,,,Often,,,Often,Often,Sometimes,Often,,,,,,,,Often,Often,,,Sometimes,Often,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Sometimes,Most of the time,Often,,Most of the time,Most of the 
time,Often,Sometimes,Sometimes,Often,Most of the time,Often,Often,Rarely,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,Sometimes,,,,,,Most of the time,,Most of the time,,Often,Often,Often,,51-75% of projects,More external than internal,Standalone Team,weather;maps,cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,85000,EUR,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Not Useful,Somewhat useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Workstation + Cloud service,2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Kaggle competitions,15,0,0,70,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Indonesia,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Self-employed",I don't plan on learning a new tool/technology,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Reinforcement learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Most of the time,,,Sometimes,Rarely,,,,,,"Association Rules,Cross-Validation,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,Often,,Often,,Sometimes,,,,,,Often,,,,,50,20,20,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Often,,,,Most of the time,,,,,,,,,,,,Often,,,,,,51-75% of projects,More internal than external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,IBM Cognos,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Orange,Python,R,RapidMiner (free version),SAS Base,SQL,Tableau,TensorFlow",Rarely,Rarely,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,Often,,Sometimes,,,Sometimes,Sometimes,Often,Often,,,,,,,,Most of the time,,Sometimes,,,Sometimes,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation",Sometimes,Sometimes,,Sometimes,Sometimes,Most of the time,Most of the time,Often,,,,Often,,,,Often,,Often,,Sometimes,Sometimes,Often,Often,Often,Sometimes,Often,,,,,,,,50,10,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Other",University courses,40,10,30,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Rarely,,,,,Often,,Often,,,Often,,Often,,Often,,,,Sometimes,,Sometimes,Rarely,,,,40,5,5,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,160000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Google Search,"Friends network,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Github Portfolio,No,I did not complete any formal education past high school,,,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Stan,Time Series Analysis,Python,Google Search,"Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,The Data Skeptic Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,3 to 5 years,Software Developer/Software 
Engineer,Self-taught,40,30,0,30,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important +Male,Germany,35,"Not employed, but looking for work",,,,,,,,R,Regression,R,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"DataTau News Aggregator,KDnuggets Blog,The Data Skeptic Podcast",3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Predictive Modeler",University courses,35,10,10,40,5,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important +Male,Germany,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a 
new ML/DS method,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Tutoring/mentoring",Very useful,Very useful,,,Very useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Statistician",University courses,75,0,0,25,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Mix of fields,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,SVMs,Other","NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,,,Sometimes,,,,,,,,,Often,,Sometimes,Sometimes,Most of the time,,,Often,,,,Often,Often,Often,,,,15,45,15,15,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Other",Often,Often,,,Sometimes,,,,,,,,Sometimes,,,Most of the time,,,,,,Most of the time,100% of projects,Entirely internal,Standalone Team,,Licenses. We need free date like in GPL free software. 
CC-0 or CC-BY does not work because people do not give back. CC-BY-SA may work but no-one uses it.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,IPFS,Mercurial,Sometimes,,,Other,8,,,,,,,,,,,,,,,,,, +Male,Australia,27,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects",,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,30,10,10,40,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Not important +Male,Denmark,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Management information systems,More than 10 years,"Operations Research Practitioner,Researcher",University courses,50,0,30,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Other,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Other",Sometimes,Rarely,Sometimes,,Rarely,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Often,,Most of the time,,,Most of the time,,Often,,Often,Rarely,,,Sometimes,,Most of the time,,,,Most of the time,50,20,0,10,20,0,Enough to explain the algorithm to someone 
non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,Sometimes,,,Often,,Sometimes,,,,,,Often,,,,Most of the time,,100% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A professional degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,"N/A, I did not receive any formal education",Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Don't know,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Most of the time,Often,Often,,Often,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,Often,Often,,,,,,,,Most of the time,Often,,,,Most of the time,,Often,,,,"Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,,,,,,,,,,,Most of the time,,,,Often,Often,,Often,,,Often,,,,,,,,90,3,2,4,1,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,Sometimes,,,,Often,,,,Sometimes,Most of the time,Often,,,Most of the time,10-25% of projects,More external than internal,IT Department,,Neartime,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Other,Sometimes,150000,EUR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Other,Sometimes,100GB,"CNNs,Decision Trees,Random Forests,RNNs","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,RapidMiner (free version),SQL,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,Often,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data 
Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,Often,,Sometimes,Most of the time,Sometimes,Rarely,,,,,,,,,,,,Sometimes,,,,Rarely,,,,Sometimes,Most of the time,,,,10,5,5,10,20,50,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,100% of projects,More external than internal,Other,,Noisy Data and the scale of the data.,Other,Other,Google Drive,Bitbucket,Rarely,720000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Arxiv,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,,Self-taught,50,5,5,40,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service",Other,Rarely,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Java,Microsoft Azure Machine Learning,R",,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Recommender Systems,Simulation,Time Series Analysis",,,Rarely,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,Often,,,Most of the time,,,,10,19,50,20,1,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,Sometimes,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,Unsure,Datasets are not well documented,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,144,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Very useful,"Data Elixir Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,40,20,30,10,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1GB,"Evolutionary Approaches,Neural Networks","Python,R,RapidMiner (free version),TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,"Association Rules,Cross-Validation,Evolutionary Approaches,Neural Networks",,Sometimes,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full 
database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Often,,,,,,,,,,,,,Most of the time,Often,,Most of the time,,,76-99% of projects,Do not know,Central Insights Team,Kaggle datasets; Synthetic datasets,Data Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Sometimes,"1,000,000",INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by government,Spark / MLlib,Deep learning,Python,"Google Search,Government website","Arxiv,College/University,Newsletters,Online courses,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Rarely,,Often,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,Rarely,Sometimes,,,Rarely,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Simulation,Text Analytics,Time Series Analysis",Sometimes,Rarely,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Often,Most of the time,,Sometimes,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,,40,10,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Often,,,,Sometimes,Often,,,,,Sometimes,,,Often,,,Often,Often,,76-99% of projects,More internal than external,Other,,Source data are not in a data warehouse.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,78300,SGD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,55,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Other,Other,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Other,Other,Other",,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,I don't write code to analyze data,Other,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,Fewer than 10 employees,Stayed the same,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Other,Other,Sometimes,100GB,Neural Networks,Microsoft SQL Server Data Mining,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation",,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,Sometimes,,,,,,,0,20,10,20,50,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,100% of projects,Entirely internal,Business Department,"Finance,FinTech,Tax","Government, have to support us joint venture application big data barter their Aadhar Card include (Unique Data with DNA History biometric)",,Other,"Government, have to support us joint venture application big data barter their Aadhar Card include (Unique Data with DNA History biometric)",Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,0,INR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,40,60,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Social Network Analysis,C/C++/C#,University/Non-profit research group websites,"College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,I never declared a major,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,"Machine Translation,Natural Language Processing","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,India,37,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Regression,Matlab,Google Search,"Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,KDnuggets Blog,< 1 year,,,Necessary,,,Necessary,Necessary,,Necessary,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Business Analyst,Self-taught,50,30,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,39,Employed full-time,,,No,Yes,Data Analyst,,Employed by a company that doesn't perform advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important +Female,Other,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or 
statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer,Software Developer/Software Engineer",University courses,40,30,0,20,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,1-2 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or 
workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,Python,R,Tableau",,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,,Often,,,,,,,,,Often,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,Often,Sometimes,,,,30,20,20,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,Sometimes,,,Most of the time,,,,,Often,,Sometimes,Sometimes,,Often,Sometimes,Often,,,,,,26-50% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,310000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,,,,Very useful,,,,,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Operations Research Practitioner,Statistician","Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Argentina,37,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,"Not 
employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Company internal community,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,Data Miner,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Other,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Newsletters,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,I don't write code to analyze data,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs",,,,,,Most of the time,,Sometimes,Sometimes,Sometimes,,Often,,Sometimes,,Sometimes,,,,Most of the time,,,Sometimes,,Most of the time,,,Sometimes,,,,,,30,70,0,0,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Most of the time,,Often,,,,,Most of the time,Sometimes,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Git,Subversion",Rarely,42000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Random Forests","Amazon Web services,Google Cloud Compute,SQL,Tableau,Other",,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,Most of the time,,,"A/B Testing,Data Visualization,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems",Most of the time,,,,,,Most of the time,,,,,,,,,,,Often,Most of the time,,,,Often,Most of the time,,,,,,,,,,0,0,100,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,Privacy issues",Often,,Often,,Most of the time,,,,,,,,,,Often,,Often,,,,,,51-75% of projects,Entirely 
internal,IT Department,geo-ip; weather,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Most of the time,96000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Hungary,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Link Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Statistician,Other",Self-taught,50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,"Random Forests,Regression/Logistic Regression","NoSQL,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Prescriptive Modeling",Rarely,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,50,0,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,"5,000,000",HUF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,Rarely,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Text Analytics",Often,,,,,Often,Most of the time,Sometimes,Often,,,,,,,Often,,,,,,,Often,,,,,,Often,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data 
science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Git,Rarely,1200000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Jupyter notebooks,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,Not Useful,Very useful,Somewhat useful,Not Useful,Somewhat useful,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,Yes,Professional degree,,3 to 5 years,Software Developer/Software Engineer,University courses,60,10,5,15,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you 
directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,28,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,No Free Hunch Blog,< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Other,Less than a year,Other,Self-taught,80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Not Useful,,,,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,60,5,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,10GB,Regression/Logistic Regression,"Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python",,Most of the time,,Rarely,,,,,Rarely,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression",,,,,,Rarely,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,10,5,0,5,0,80,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Most of the time,Most of the time,Sometimes,,,Sometimes,,Often,Often,,Often,,,Often,,Often,Rarely,,Often,Often,,Most of the time,Less than 10% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,88500,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Belgium,55,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Deep learning,Python,GitHub,"Blogs,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,10,50,0,0,"Adversarial Learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking 
friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,Pakistan,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Very useful,,,,,,Very useful,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,30,0,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,Relational data,Never,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,RNNs,SVMs",,,,Often,,Often,,Often,Often,,,,,Often,,Often,,Often,,Often,,,Often,,Often,,,Often,,,,,,20,60,5,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to 
say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,15000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",30,45,25,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Recommendation Engines,Reinforcement learning","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Gradient Boosted Machines,Lift Analysis,Neural Networks,Random Forests,Segmentation",Often,,Sometimes,,,,,,,,,Rarely,,,Sometimes,,,,,Sometimes,,,Rarely,,,Rarely,,,,,,,,30,25,40,5,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT",,Sometimes,Sometimes,,,,,,,Rarely,,Rarely,,,Rarely,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,,,,,,300000,INR,Has 
increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,Tableau,Neural Nets,SQL,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Statistician",Work,50,0,25,25,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Ensemble Methods,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Nigeria,33,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,,"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,Other",Laptop 
or Workstation and local IT supported servers,2 - 10 hours,Master's degree,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,India,29,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,,"Blogs,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,,,,,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web 
services,Hadoop/Hive/Pig,Python",,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation",Often,,,,,Often,Often,Sometimes,,,,,,,Often,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,35,10,2,13,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Often,,,,,,Sometimes,,,,Often,Sometimes,Most of the time,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,4000000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,5,0,5,60,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,62,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,Other,Social Network Analysis,Python,,"Blogs,Conferences,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 
years,Researcher,Self-taught,40,0,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,Bayesian Techniques,"Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,Unix shell / awk,Other",,Most of the time,,Often,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,,,,,,,,Sometimes,Rarely,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,50,20,0,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,Most of the time,,,,Often,,,,,,,Often,Most of the time,,,Often,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Always,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,1-2 years,Necessary,Necessary,,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",7,85,0,0,8,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Norway,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,8,90,0,2,0,"Supervised Machine Learning (Tabular Data),Time Series",,A professional degree,Financial,Fewer than 10 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,Sometimes,,,Sometimes,,,,,,,,Rarely,,,Most of the time,,,,Most of the time,,,Most of the time,,,,20,55,0,5,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,900000,NOK,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Survival Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,20,0,60,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,,50,10,20,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's 
decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Rarely,,,,Most of the time,Sometimes,,Rarely,Often,Sometimes,,,,,,Often,,,Most of the time,,,,26-50% of projects,More internal than external,Business Department,Economic statistics; company financials,Dirtiness of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,68000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,Mathematica,Neural Nets,Java,"GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Textbook,Trade book,YouTube Videos",,,,,,,,,,Very useful,,,,,Very useful,Very useful,,Very useful,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Non-profit,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Video data,Text data",Always,100MB,"Bayesian Techniques,CNNs,Decision Trees,Evolutionary Approaches,Neural Networks","C/C++,Java,Mathematica,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,SQL,TIBCO Spotfire,Unix shell / 
awk",,,,Often,,,,,,,,,,,Often,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,Sometimes,,,,,,,,,,,,Often,,,,,Rarely,Rarely,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,Prescriptive Modeling,Text Analytics",,Often,,Often,,Often,Often,Often,,,,,,,,,,,Often,Often,,Often,,,,,,,Often,,,,,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Always,,INR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",,Other,500 to 999 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and 
local IT supported servers","Text data,Relational data",Never,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","C/C++,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,,,Rarely,,,,,,,,Often,,,,,,,,,,,Often,Often,,,,Often,,,Sometimes,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Often,Often,,,Often,Often,,Often,Often,Often,,Often,,,,,Often,Often,,100% of projects,Entirely internal,Standalone Team,Kaggle; Uganda Bureau of statistics,Cleaning and Preparation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,"40,000,000",UGX,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Spain,42,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,,"Company internal community,Online courses",,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Programmer",Work,0,20,60,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Insurance,"5,000 to 9,999 employees",Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Sometimes,10GB,Regression/Logistic Regression,"IBM SPSS Statistics,SAS Base,SQL,Unix shell / awk,Other,Other",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,Rarely,Often,Often,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,60,30,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Often,Sometimes,,,Sometimes,,,,Often,,,,,,Most of the time,,,,,,,,100% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Always,"45,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Iran,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,Python,Cluster Analysis,Python,Google Search,"Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,20,50,0,0,Other (please specify; separate by semi-colon),Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Rarely,100GB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,Often,Sometimes,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,5,80,5,9,1,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,Most of the time,,,,Most of the time,Often,,76-99% of projects,Entirely internal,Standalone Team,,Memory;hardware,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,12000000,IRR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Germany,26,"Not employed, but looking for work",,,,,,,,Python,,Python,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,,2 - 10 hours,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Computer Vision,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,34,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Yes,Master's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,France,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,KNIME (commercial version),"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Russia,29,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Jupyter notebooks,Bayesian Methods,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Other,Other,Other",Somewhat useful,Very useful,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",15,25,40,20,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data",Rarely,10GB,"CNNs,GANs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,MATLAB/Octave,Python,TensorFlow,Unix shell / awk,Other,Other",,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,Often,Sometimes,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Other",,,,Most of the time,,Most of the time,Often,,,,Often,,,Sometimes,,Rarely,,,,Most of the time,Rarely,,,,,Sometimes,,Rarely,,,Most of the time,,,10,50,5,10,15,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,Often,Often,,,,,,,Most of the time,Most of the time,Sometimes,,,Most of the time,Rarely,Often,,Often,,100% of projects,Entirely internal,IT Department,feret; celeba; multipie; megaface; pascal voc; facenet; casia and many others,"collect sufficiently big and at the same time balanced and clean dataset; come up good labelling strategy","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,52000,USD,Other,8,,,,,,,,,,,,,,,,,, +Female,South Africa,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,Other,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,Very useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",45,50,5,0,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Decreased slightly,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,100MB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,QlikView,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Often,,Sometimes,,Sometimes,,Most of the time,Most of the time,Most of the time,,Sometimes,,Rarely,,,,,Most of the time,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural 
Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,Most of the time,Often,Often,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Most of the time,Sometimes,Often,,Often,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,,,40,30,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Business Department,None,"Our data is solely to do with investments and fund returns. Issue arise when returns are incorrect, incorporate incorrect fees, funds change names etc.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,472000,ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belarus,22,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Engineer",Other,5,10,5,0,0,80,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,,,Sometimes,Often,,,,,,,,,Rarely,,,Sometimes,Rarely,,,Sometimes,,,,,,Often,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,10800,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",University courses,95,5,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Relational data,Always,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,Simulation",Sometimes,,,,,Often,Most of the time,Sometimes,,,,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,50,15,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",,,Often,Rarely,Most of the time,Often,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,76-99% of projects,Entirely internal,Other,Creditbureaus; Google maps APIs,Cleaning and joining it together not to create any time leak in it,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",Network disk,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"52,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,No,Bachelor's degree,Computer Science,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Czech Republic,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community",,Very useful,,Very useful,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,DBA/Database Engineer",University courses,5,0,10,85,0,0,"Natural Language Processing,Survival Analysis","Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation",Sometimes,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,,,,,70,10,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Unavailability of/difficult access to data",,,Often,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"480,000",CZK,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"DataTau News Aggregator,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Iran,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Poorly,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Not Useful,,,,,Very useful,,Somewhat useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Master's degree,Electrical Engineering,Less than a year,Programmer,University courses,10,2,0,80,8,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Researcher",University courses,5,10,25,55,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Financial,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter 
notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,Rarely,Sometimes,,,Sometimes,Sometimes,,,,,,Rarely,,Often,,,Rarely,Rarely,,Often,,,,Sometimes,,,,Often,,Sometimes,,,,,Most of the time,Often,,Sometimes,Most of the time,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Sometimes,,Sometimes,Sometimes,Most of the time,Often,Sometimes,,,Often,,Often,Most of the time,Most of the time,,Sometimes,Often,Sometimes,Sometimes,,Often,Often,,Often,Most of the time,,Often,Often,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Often,,,Often,,,,,Often,Often,,Often,Most of the time,,Most of the time,,Sometimes,,,,Often,,76-99% of projects,More internal than external,Business Department,Public statistics;fb;twitter,Data in different silos and not accesible from the same location,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,6 to 10 years,"Software Developer/Software Engineer,Other",Kaggle competitions,40,0,0,20,40,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"FlowingData Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,25,25,25,0,25,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,20 to 99 employees,Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"Ensemble Methods,Regression/Logistic Regression,Other","Amazon Web services,NoSQL,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Often,,,Often,,,,,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,,,,25,25,25,0,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",Most of the time,,,Often,,,,,Often,,Often,,,,Sometimes,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Sometimes,50000,EUR,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Republic of China,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs",Very useful,Very useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Engineer,Work,50,20,15,10,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",High school,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data",Sometimes,100GB,"CNNs,Ensemble Methods,Neural Networks,RNNs","C/C++,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Most of the time,,Often,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,Often,,,,Often,,,Often,,,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,,,,,,Sometimes,,,,,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,imagenet,data preprocessing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Always,"1,000,000",CNY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Sweden,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,35,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Sometimes,,,,Sometimes,,,,,Sometimes,,Often,,,,Often,,,Often,,,,40,15,35,5,5,0,Enough to explain the algorithm to 
someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Often,,,,,,,,Often,,Sometimes,,Sometimes,,,Sometimes,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Sometimes,200000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Nigeria,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by non-profit or NGO,Amazon Machine Learning,Anomaly Detection,Python,GitHub,"Arxiv,Kaggle,Podcasts,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,Very useful,,,,,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,Less than a year,Data Analyst,Self-taught,10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Markov Logic Networks,"Some college/university study, no bachelor's degree",Non-profit,"5,000 to 9,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,Rarely,1GB,Markov Logic Networks,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,5,60,5,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,100000,NGN,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Hungary,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,FlowingData Blog",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,15,5,0,Recommendation Engines,Decision Trees - Gradient Boosted Machines,A doctoral degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Denmark,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Other",Self-taught,45,10,25,15,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,Tableau",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,Most of the time,,Rarely,,,,,Sometimes,Sometimes,,Sometimes,,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation",Most of the time,,,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,Sometimes,,,Sometimes,,Sometimes,,,Most of the time,,,,,,,,70,5,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,,Rarely,600000,DKK,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Italy,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,RapidMiner (free version),Social Network Analysis,R,Google Search,"Friends network,Online courses,Stack Overflow Q&A",,,,,,Very useful,,,,,Somewhat useful,,,Very useful,,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",University courses,30,20,0,50,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Relational data,Never,1GB,"Regression/Logistic Regression,Other","Microsoft Excel Data Mining,QlikView,R,Tableau,TIBCO Spotfire,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,Often,,,,,,,,,,,,Rarely,,Sometimes,,,,Most of the time,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,,,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Often,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a 
clear direction to go in with the available data",Most of the time,,,,,Sometimes,,Often,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Business Department,,To understand the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,34000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,France,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Very useful,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,France,23,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,Time Series,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,Australia,22,Employed part-time,,,No,Yes,Other,Poorly,Self-employed,SAS Enterprise Miner,Deep learning,Python,GitHub,"Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,45,15,0,30,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some 
college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Ireland,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Not Useful,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Other",University courses,10,50,30,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,1-2 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Random Forests,Regression/Logistic Regression","Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk",,,,,,,,Most of the time,,,,,Rarely,,,,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics",Often,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Sometimes,Often,,,,Often,,,,,,Often,,,,,60,4,1,15,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Often,,,Most of the time,,Most of the time,,,,,Often,,,,Most of the time,,,51-75% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Gsuite ,Bitbucket,Always,110000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,1 to 2 years,Statistician,Self-taught,30,20,15,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,R,SQL,Tableau",,,,,,,,,,,,Often,Sometimes,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",Most of the time,,Often,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,20,15,20,20,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Most of the time,,,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,650,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Iran,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,SQL,,,,"Blogs,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,,University courses,20,0,0,60,20,0,Computer Vision,Neural Networks - RNNs,,Other,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,,,Text data,,,SVMs,"KNIME (free version),MATLAB/Octave,SQL",,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation",,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,20,0,20,50,NA,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,Google Search,"Arxiv,Blogs,Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Other",Self-taught,20,10,30,40,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Java,Julia,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,Sometimes,,,,Rarely,,,,,,,Sometimes,Rarely,Rarely,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Time Series Analysis",Sometimes,,Often,,Often,Often,Most of the time,Often,Often,,Rarely,Sometimes,Sometimes,Often,,Sometimes,Sometimes,Sometimes,Rarely,Often,Sometimes,Often,Often,Often,Sometimes,Often,,Sometimes,,Often,,,,20,20,10,10,20,20,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty 
data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,Sometimes,Sometimes,Often,,,Often,,Often,Often,Often,Often,Often,,,Often,Often,,,,51-75% of projects,More internal than external,Standalone Team,youtube;facebook;spotify;,getting it fast enough,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,65000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects",,,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other",Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests","MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks",,,Sometimes,,,Most of the time,Most of the time,Often,Often,,,Often,Rarely,Often,,Most of the time,,Often,,Most of the time,,,,,,,,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Sweden,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Conferences,Kaggle,Textbook",,,Very useful,,Very useful,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,University courses,10,0,40,50,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Relational data",Sometimes,10MB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Perl,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,Sometimes,,Often,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,,,Sometimes,,Most of the time,,,Often,,,,,Sometimes,,Often,,,,Most of the time,Sometimes,,Often,,Sometimes,Sometimes,,Often,,,,,,40,40,5,5,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Other,Mostly medical data,"Limited size, missing data, lack of or unclear gold standard status","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,700000,SEK,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Japan,59,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,40,40,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,Fewer than 10 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow",,Rarely,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,Often,,,,,,Rarely,,,,Sometimes,Most of the 
time,Often,,,Often,,,Sometimes,Sometimes,Often,Most of the time,Most of the time,,,,35,20,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Sometimes,Often,,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,100% of projects,Entirely internal,Standalone Team,"twitter, weather, movie reviews",cost,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Bitbucket,Sometimes,6000000,JPY,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Denmark,40,Employed full-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university",TensorFlow,Time Series Analysis,Python,Google Search,"Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Researcher,Statistician",University courses,40,10,40,10,0,0,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,Often,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,,Often,,Sometimes,,,,,,10,70,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,,,,,,,,Sometimes,Often,Often,,,Sometimes,,,Often,,,Less than 10% of projects,More internal than external,Standalone Team,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Jupyter notebooks,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Data Scientist,Work,30,20,30,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,Often,Often,Most of the time,,,Most of the time,,Often,Most of the time,Most of the time,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,Often,,,,Often,,,Often,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,,"Company Developed Platform,Email",,,,750000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Personal Projects,Textbook",,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",68,10,20,2,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,,,"Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,Often,,,,,,Most of the time,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,,,Most of the time,Most of the time,Often,,,Often,,,Often,,Often,,,,Often,Often,,Often,,,,Often,Often,,,,,,50,15,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",,,,,,Often,,,Sometimes,Often,Often,,,,Most of the time,Often,Most of the time,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Italy,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),,Online Courses and Certifications,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,0,40,0,50,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,India,33,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Time Series Analysis,SAS,"Google Search,I collect my own data (e.g. 
web-scraping)",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Other",Self-taught,20,80,0,0,0,0,Time Series,,Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Deep learning,R,"Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Other,University courses,50,0,10,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Other",Relational data,Rarely,1GB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Simulation,Time Series Analysis",,,Rarely,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,20,40,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,,,,,Often,,,,Most of the time,,,,Most of the time,,10-25% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,15000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,Very useful,,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer,Other",University courses,5,5,90,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Sometimes,,Often,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Sometimes,,Most of the time,,,,Sometimes,Often,,Sometimes,,,,,,,,,,,5,15,5,25,50,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Most of the time,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,100% of projects,Entirely 
internal,Standalone Team,linkedin;glassdoor;yahoo finance,non-uniformity,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,155000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,10,10,0,0,,Neural Networks - CNNs,High school,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,1GB,Decision Trees,"Java,Jupyter notebooks,Python,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,Often,,,,"Decision Trees,kNN and Other Clustering,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,Often,,,,,,Often,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,50,30,1,14,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack 
of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,Often,Most of the time,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,4800,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,"Arxiv,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,,,,,,,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher",University courses,20,0,40,40,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,Technology,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Often,,Often,Most of the time,,Most of the time,,Most of the time,Most of the time,,,Most of the time,Often,Often,,,,Sometimes,,Most of the time,Often,,Often,,,,,Often,,Often,,,,25,25,25,10,15,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,,,Bitbucket,,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Monte Carlo Methods,Python,"Google Search,University/Non-profit research group 
websites","College/University,Company internal community,Non-Kaggle online communities,Online courses,Personal Projects,Tutoring/mentoring",,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,"Data Machina Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,40,20,10,30,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,Statistica (Quest/Dell-formerly Statsoft)",Sometimes,Often,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs",Often,Sometimes,Sometimes,,Often,Often,Often,Often,Often,,,Often,,,,Often,,Sometimes,,Sometimes,,Often,Often,Often,,,,Sometimes,,,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Often,,Sometimes,Sometimes,,,Often,Sometimes,,,,,Often,,,,Sometimes,,,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Personal Projects",,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner",Work,30,0,50,20,0,0,"Computer Vision,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Traditional Workstation",Image data,Sometimes,100GB,CNNs,"C/C++,Java,Jupyter notebooks,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,SVMs,Time Series Analysis",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,10,50,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"24,000",EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,Very useful,,Somewhat useful,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,24,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,Very useful,,,,,Very useful,,,,,FastML Blog,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,Workstation + Cloud service,11 - 39 hours,PhD,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,India,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Not Useful,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",< 1 year,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Recommendation Engines,Bayesian Techniques,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by non-profit or NGO,Other,Deep learning,Matlab,University/Non-profit research group websites,"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Programmer,Researcher",Work,30,10,30,25,5,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,20 to 99 employees,Stayed the same,Don't know,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Sometimes,1GB,Ensemble Methods,"MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction",,,,,,,,,Most of the time,,,,,Rarely,,,,,,,Most of the time,,,,,,,,,,,,,25,25,10,15,25,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Rarely,"550,000",THB,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Work,20,0,60,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Sometimes,10GB,"Decision Trees,Evolutionary Approaches,Random Forests,Regression/Logistic Regression,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,SVMs",,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,40,30,20,0,10,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Sometimes,,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Engineer,Self-taught,10,50,0,10,30,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,South Korea,29,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,SQL,Google Search,"Blogs,Textbook",,Very useful,,,,,,,,,,,,,Very useful,,,,"Statistical Modeling, Causal Inference, 
and Social Science Blog (Andrew Gelman)",5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Miner,Predictive Modeler",University courses,20,40,0,40,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,R,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation",Sometimes,,,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,70,15,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Sometimes,Often,,,Often,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,10-25% of projects,More internal than external,Business Department,none,correctly clean data ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,"84,000",PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Africa,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Other,Other,Python,Google Search,"Arxiv,College/University,Conferences,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Researcher,Self-taught,19,10,60,10,1,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased slightly,6-10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Never,100GB,"Bayesian Techniques,CNNs,HMMs,Random Forests","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,HMMs,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Most of the time,Often,,,Most of the time,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,10,20,0,15,15,40,Enough to refine and innovate on the algorithm,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the 
available data",,,,,,,,Sometimes,Most of the time,,,Often,,,,,,,,Often,,,100% of projects,Entirely internal,Other,Imagenet; Oasis; Anything I can download to demonstrate an algorithm,Labelling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Git,Subversion",Rarely,550000,ZAR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters",,Very useful,,,,,Very useful,Somewhat useful,,,,,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,High school,Internet-based,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,NoSQL,R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Often,,,,,,,Often,,,,,,,Sometimes,,,,40,40,0,20,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools",,Often,,,Often,,,,Most of the time,,,,Sometimes,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Never,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,"Data Scientist,Researcher",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"5,000 to 9,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and 
Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,Often,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,Often,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Most of the time,Most of the time,Most of the time,,,Often,Most of the time,Most of the time,Most of the time,Sometimes,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"53,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Cloudera,Deep learning,Python,"GitHub,Google Search,Government website",Online courses,,,,,,,,,,,Very useful,,,,,,,,"FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,GANs,Neural Networks,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,TensorFlow",Most of the time,Most of the time,,Most of the time,Often,,,Most of the time,Often,,,,,,Most of the 
time,,Most of the time,,,Sometimes,Often,Often,Most of the time,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,Often,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics",,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,10,60,10,10,10,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Mercurial",Sometimes,1400000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,26,"Not employed, but looking for work",,,,,,,,,,,,"College/University,Conferences,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,,,,,,1-2 years,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,Self-taught,70,10,10,10,0,0,Other (please specify; separate by semi-colon),"Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Matlab,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,Logistic Regression,A doctoral degree,Non-profit,"5,000 to 9,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,Gradient Boosted Machines,"Java,MATLAB/Octave",,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Privacy issues",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,,Standalone Team,,,Row-oriented relational 
(e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,48000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Singapore,32,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Newsletters,Textbook,YouTube Videos,Other",,,,,,Somewhat useful,,Not Useful,,,,,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",5-10 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Other",Other,0,0,0,0,0,100,"Computer Vision,Machine Translation,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Taiwan,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Government website,"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Data Analyst,Work,20,10,50,20,0,0,Time Series,"Bayesian Techniques,Logistic Regression",Primary/elementary school,Non-profit,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods","C/C++,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,R,SAS Enterprise Miner,Statistica (Quest/Dell-formerly Statsoft)",,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,Sometimes,,,,,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Logistic Regression,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,Often,,,,,,,,,,Often,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,PhD,Yes,I did not complete any formal education past high school,,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,Spain,20,"Not employed, but looking for work",,,,,,,,SAS JMP,Regression,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Miner,University courses,0,0,40,60,0,0,Time Series,Support Vector Machines (SVMs),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,India,22,Employed part-time,,,No,Yes,Machine Learning Engineer,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,,,,Nice to have,Nice to have,,,Nice to have,Nice to have,,,,,"Coursera,edX,Udacity",,,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Military/Security,100 to 499 employees,Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Python,R,TensorFlow",,Sometimes,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,,,Sometimes,,,,,,,,Often,,,,Most of the time,Sometimes,,,,,,,,,,,,,30,30,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Sometimes,,,Often,,,Often,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,800000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,I collect my own data (e.g. web-scraping),Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,Self-taught,50,20,20,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Orange,Python,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Most of the time,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,Often,Sometimes,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the 
time,Most of the time,Often,Most of the time,,Most of the time,,Often,,,Often,Most of the time,Often,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,Most of the time,Most of the time,,,,,,Sometimes,,,,Often,,100% of projects,More internal than external,Standalone Team,kaggle datasets;quandl datasets;web-scraped data,Lack of important domain knowledge input,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Factor Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer",Self-taught,75,0,10,15,0,0,Speech Recognition,Logistic Regression,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,France,50,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,Google Search,"Arxiv,College/University,Personal Projects",Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's 
degree,Mathematics or statistics,,"Data Analyst,Data Miner,Machine Learning Engineer,Operations Research Practitioner",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important, +Female,Italy,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Data Scientist,University courses,0,10,0,80,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Basic laptop (Macbook),Other,Most of the time,100MB,Bayesian Techniques,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Naive Bayes",,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,0,0,0,0,80,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,1000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A,Other",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,45,0,30,20,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,500 to 999 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,Regression/Logistic Regression,"R,SAS Base,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,Often,,,,,Sometimes,,,,,"Data Visualization,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,Sometimes,,,,50,15,15,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,Often,,51-75% of projects,Entirely internal,Other,"macroeconomic 
outlooks, financial market data","data quality, poor documentation, no clear business interpretation of data field content","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Sometimes,170000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,50,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,Python,"GitHub,Google Search,Government website","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,,,,Necessary,,,Necessary,,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,Other (please specify; separate by semi-colon),,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Norway,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,45,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Singapore,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,1 to 2 years,Software Developer/Software Engineer,University courses,10,20,20,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Decision Trees,Neural Networks","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,Often,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,,,,20,10,50,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,Sometimes,Often,,,,,,,,,,Sometimes,Often,,,,Most of the time,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Social media dataset,Incomplete dataset,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,"Git,Other",Sometimes,65000,SGD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,Flume,,,,"Blogs,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,20,20,20,20,10,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Gradient Boosted Machines,No education,,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10TB,,Oracle Data Mining/ Oracle R Enterprise,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,10,30,10,20,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,Access,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,,Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Support Vector Machines (SVM),,,"Kaggle,Personal Projects,Podcasts",,,,,,,Very useful,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Computer Vision,Neural Networks - CNNs,Primary/elementary school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Video data",Sometimes,10MB,"CNNs,Neural Networks","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,kNN and Other Clustering,SVMs",,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,60,0,0,40,0,0,Enough to run the code / standard library,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics,Employed by government",Jupyter notebooks,Random Forests,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Speech Recognition,,,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Greece,45,Employed full-time,,,Yes,,Other,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Jupyter notebooks,Other,Java,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Company internal community,Conferences,Newsletters,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,30,20,0,10,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,3-5 years,Some other way,Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"Bayesian 
Techniques,Decision Trees,Regression/Logistic Regression,Other","Amazon Web services,C/C++,Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk,Other",,Rarely,,Often,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,Rarely,,,Often,,,Most of the time,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Segmentation,Text Analytics,Other,Other",Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,Often,,,,Most of the time,,,Rarely,,Sometimes,Rarely,,50,5,5,5,10,25,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",Most of the time,Often,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,Postcode database (for geographical area and coordinates mapping); Automobile specifications database (for auto insurance projects),Obtaining historical data; obtaining clear definitions of dataset contents and assumptions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,91500,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Google Search,University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,,,,,,,,,,,,,,,"Laptop or Workstation and local IT supported servers,Other",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Other,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Natural Language Processing,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,10,5,0,80,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Most of the time,1TB,"CNNs,Ensemble Methods,Neural Networks","C/C++,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,Sometimes,Most of the time,,Most of the time,,,Most of the time,,,,,,,,,Sometimes,,Most of the time,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,,,,,30,20,40,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,Often,Sometimes,,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,80000,SGD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,40,20,20,10,0,"Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,100GB,"CNNs,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Natural Language Processing,Neural Networks,RNNs",,,,Often,,,,,,,,,,,,,,,Often,Often,,,,,Often,,,,,,,,,20,60,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,45,0,10,5,0,"Computer Vision,Speech Recognition","Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +A different identity,United States,31,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Other,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Computer Scientist,University courses,0,20,0,70,10,0,Reinforcement learning,"Bayesian Techniques,Neural Networks - CNNs",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Germany,26,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,5,25,0,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,South Korea,56,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Other,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Often,,Sometimes,,Often,,Sometimes,Sometimes,,Sometimes,,Often,,,,Sometimes,,,,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,,,,,,,,Sometimes,,,,Often,,,,Often,Often,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Personal Projects",,Very useful,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased significantly,1-2 years,Some other way,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Relational data,Other",Most of the time,100MB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","Java,NoSQL,Orange,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,Sometimes,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Rarely,Often,Most of the time,Often,Rarely,,Sometimes,,,,,Sometimes,Most of the time,,Often,Often,,Often,Sometimes,Sometimes,Often,,Most of the time,,,,5,70,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of 
the art in machine learning,Scaling data science solution up to full database",,,,Sometimes,Sometimes,,,,,,Sometimes,Often,,,,,,Often,,,,,100% of projects,Entirely internal,Standalone Team,openweathermap,Performance measuring; Comparabilty of performance between different models,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Malaysia,26,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,Supervised Machine Learning (Tabular Data),,,Mix of fields,100 to 499 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Never,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Jupyter notebooks,Python,Spark / 
MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,Sometimes,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Segmentation,SVMs",,,,Rarely,,,Most of the time,Sometimes,Often,,,Sometimes,,Sometimes,,Most of the time,,,,Rarely,,,Often,,Rarely,Often,,Sometimes,,,,,,80,10,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,Often,,,,,Often,,Often,Sometimes,,,,Often,,,,Rarely,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic cloud file sharing software (Dropbox/Box/etc.),,,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Other",,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,25,0,25,50,0,0,Other (please specify; separate by semi-colon),"Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Other,Basic laptop (Macbook),Other,Rarely,1MB,"Evolutionary Approaches,Neural Networks,RNNs,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,,Sometimes,,,,Sometimes,,,,,,Often,Most of the time,,,,Often,Often,,Sometimes,,Often,,,,15,55,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,,,,,Often,,,,,,Often,,,100% of projects,Entirely internal,Other,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,25000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,1 to 2 years,Researcher,Self-taught,50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,I prefer not to answer,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,R",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling",,,,,,,,,,,,,,,,Often,,,,,Often,Sometimes,,,,,,,,,,,,50,10,10,10,20,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Rarely,,INR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Turkey,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Nice to have,Nice to have,,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,edX,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Ukraine,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Java,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,"Coursera,DataCamp,edX",Traditional Workstation,,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,,,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,Japan,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,R,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,0,0,75,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data",Sometimes,10GB,CNNs,"C/C++,Python,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,30000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Scientist,University courses,20,20,40,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Sometimes,,,,,Often,,,,Often,,Often,Often,,,,Sometimes,,,Sometimes,,,Often,Often,,,,20,25,15,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",Sometimes,Often,,,,Often,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,Find signals in it,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Never,1800000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Linear Digressions Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Spain,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,Julia,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Trade book",Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner","Online courses (coursera, udemy, edx, etc.)",60,10,5,0,25,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service",Relational data,Always,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,Sometimes,Often,Often,Often,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Often,,Sometimes,Sometimes,,Most of the time,Sometimes,Sometimes,Sometimes,Sometimes,,,,70,20,5,0,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,,10-25% of projects,Entirely internal,Business Department,Geoespatial data,Data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Deep learning,Python,GitHub,"Blogs,College/University,Friends network,YouTube Videos",,Very useful,Very useful,,,Very useful,,,,,,,,,,,,Very useful,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),Linear Digressions Podcast",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,Yes,Master's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Work,40,5,40,10,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Insurance,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","Python,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Often,,,,Most of the time,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,Rarely,,,Most of the time,,,,Often,,,,5,15,0,25,55,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,Often,Often,Sometimes,,,,,Most of the time,,,26-50% of projects,More internal than external,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Git,Rarely,38000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Denmark,30,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by government,R,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses",,Somewhat useful,,,,,Somewhat useful,,,Not Useful,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,"Business Analyst,Researcher",Self-taught,50,50,0,0,0,0,Time Series,,A master's degree,Telecommunications,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,Regression/Logistic Regression,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,Rarely,Often,,,,,,Sometimes,,,,Often,,,,,,,Often,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,450000,DKK,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Denmark,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,,,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,50,10,10,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Italy,26,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,R,Time Series Analysis,R,"Google Search,University/Non-profit research group websites","Arxiv,College/University,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,1 to 2 years,I haven't started working yet,Self-taught,80,0,0,20,NA,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Academic,I don't know,Decreased slightly,Don't know,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Other,Don't know,,"Regression/Logistic Regression,Other","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Rarely,,,,,Sometimes,Most of the time,,,Most of the time,,,,0,NA,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Sometimes,,Sometimes,Most of the time,,,Rarely,Most of the time,,,,,Sometimes,Often,Most of the time,,,100% of projects,,Other,,,,,,,Sometimes,12000,EUR,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Female,Other,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Unix shell / awk,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,40,35,10,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,1GB,"Regression/Logistic Regression,Other","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling",,,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,30,15,5,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,Sometimes,,,,Often,,,,Often,,,,,,Often,Often,,76-99% of projects,More internal than external,Other,Kaggle datasets,Getting companies to agree to share their data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Git,Never,34500,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Switzerland,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,Not Useful,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Not Useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,1 to 2 years,"Researcher,I haven't started working yet",Self-taught,95,0,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,,,,,"N/A, I did not receive any formal education",Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Simulation",,,,,,Most of the time,Often,Most of the time,Rarely,,,Often,,,,,,,,Often,,,Sometimes,,,,Most of the time,,,,,,,50,20,2,13,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,,,,Often,,,,,,Often,,,,,,,,Sometimes,,,,76-99% of projects,Entirely internal,Other,,Data readin/out and disagreement between simulation and real data,"Column-oriented relational (e.g. KDB/MariaDB),Other",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,0,CHF,Other,9,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Newsletters,Personal Projects,Textbook,YouTube Videos,Other",Very useful,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Data Analyst,Researcher",Self-taught,25,20,30,0,25,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Never,1GB,"CNNs,Gradient Boosted Machines","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks",,,,,,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,10,40,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for 
a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Sometimes,,Sometimes,Sometimes,,,Often,,Often,,,Often,,,Most of the time,,,10-25% of projects,More internal than external,Central Insights Team,"NASA.org-for-MODIS-8-GIS-Data, USDA-for-agricultural-trading, CME-Agriculture-data",Most of data source don't provide well API and you need to work around by writing crawler and extractor for individual source.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,558000,THB,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,70,0,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,55000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,20,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I don't know/not sure,Telecommunications,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,Often,,,,Often,,,,,Often,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,Often,,,Often,Most of the time,Often,Often,,,,,Sometimes,,Often,,Sometimes,,,Sometimes,,Often,,,Sometimes,,,,Often,,,,60,5,20,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Privacy issues,Unavailability of/difficult access to data",,,,Often,,,,Sometimes,,,,,,,,,Most of the time,,,,Often,,100% of projects,Entirely internal,Standalone Team,,Data integration,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Subversion,Sometimes,288000,CZK,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Israel,31,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,GitHub,"Arxiv,College/University,Kaggle,Podcasts,Stack Overflow Q&A",Very useful,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,10,10,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Video data,,10MB,"CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Sometimes,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,Often,,,,Often,Sometimes,,Most of the time,,,,,Sometimes,,,,,,50,15,0,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Unavailability 
of/difficult access to data",,,,,Often,,,,Sometimes,,Often,,,,Most of the time,,Most of the time,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Other,,Coordination,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Bitbucket,Rarely,40000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects",,Somewhat useful,,,Not Useful,,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,10GB,Other,"NoSQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Recommender Systems,Segmentation,Text Analytics",Most of the time,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues",,,,Often,,,,,Sometimes,,,,,,,,Often,,,,,,100% of projects,Entirely internal,Other,,Different formats from different clients,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,"Git,Other",Never,37000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Australia,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,"DataCamp,edX,Other","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Physics,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Adversarial Learning,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,France,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,1,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,C/C++,Neural Nets,Python,GitHub,Online 
courses,,,,,,,,,,,Very useful,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,35,40,10,10,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Technology,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Often,,,,"Decision Trees,Logistic Regression",,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,15,35,25,10,15,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,51-75% of projects,More internal than external,Central Insights Team,public data;,clean the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,12000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Official documentation,Textbook",,,Very useful,,,,,,,Very useful,,,,,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,43,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,,Python,I collect my own data (e.g. 
web-scraping),Newsletters,,,,,,,,Somewhat useful,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,,Experience from work in a company related to ML,No,Master's degree,Other,I don't write code to analyze data,Business Analyst,Other,100,0,0,0,0,0,,,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Female,Russia,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Official documentation,Stack Overflow Q&A",,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Researcher",University courses,10,0,20,20,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,I don't know,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,Spark / MLlib",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,Often,Often,,Sometimes,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,,Often,Most of the time,Often,Often,Most of the time,,,Often,,Often,,Often,,Often,Sometimes,,Often,,Often,Often,,Sometimes,,Sometimes,Sometimes,,,,,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,Most of the time,,,,Often,Often,,Often,Sometimes,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,an ability to talk with source data builders,Document-oriented (e.g. 
MongoDB/Elasticsearch),"Company Developed Platform,Email",,Mercurial,Sometimes,"540,000",RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Researcher",Self-taught,15,20,60,5,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Retail,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),Orange,Python,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Rarely,,Sometimes,,Most of the time,,Rarely,,Sometimes,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Random 
Forests,Recommender Systems,Segmentation",Sometimes,Sometimes,,,Often,,Often,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,Rarely,,Often,,,,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Often,,,Sometimes,,,,Often,,,,,,,Often,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,SAP licensing issues,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Malaysia,49,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Regression,Python,Government website,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to have,,,,"DataCamp,edX",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Google Search,"Arxiv,Blogs,Conferences,Friends network,Personal Projects,Trade book,YouTube Videos",Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,Somewhat useful,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,40,40,0,20,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,30,10,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Sometimes,,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Often,,Often,,Often,,,,,Sometimes,,,Most of the time,,,,Often,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Data 
Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,Often,Sometimes,,,Most of the time,Sometimes,,,,,,,,Often,,,,Sometimes,,,Rarely,Rarely,,Sometimes,,,Most of the time,,,,,25,25,15,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",Often,Sometimes,,,,Often,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Entirely external,Standalone Team,,finding source,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,80000,LKR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Egypt,31,Employed full-time,,,No,Yes,Data Miner,Perfectly,Employed by a company that performs advanced analytics,Python,Decision Trees,R,"Google Search,I collect my own data (e.g. 
web-scraping)",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Data Analyst,Data Miner,Predictive Modeler,Other",Self-taught,100,0,0,0,0,0,,Decision Trees - Random Forests,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Germany,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher",Self-taught,50,20,30,0,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data",,1GB,"Bayesian Techniques,CNNs,SVMs","C/C++,Jupyter notebooks,KNIME (free version),Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,Rarely,Often,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,Often,Most of the time,,Sometimes,,,Sometimes,,Often,,,,,,50,20,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Other",,,,,,,,,Sometimes,,,,,,,,,,,,,Often,51-75% of projects,More internal than external,Other,MNIST; CIFAR10; Imagenet; SCAPE; FAUST,"Correctly normalizing, preprocessing and calibrating the data (images and resulting 3D reconstructions)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,28000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,30,25,15,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,100GB,"CNNs,SVMs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Logistic Regression",Sometimes,,,Often,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,35,10,10,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,Often,Sometimes,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Only proprietary,Lack of structure and quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,260000,AED,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,24,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Sweden,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),Other","Company internal community,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,,Very useful,Not Useful,Not Useful,Very useful,,,Not Useful,"KDnuggets Blog,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,20,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,QlikView,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,,,,,,Often,,,,,,,Often,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Time Series Analysis",,,,,,,Often,Often,Often,,,,,,,,,,,,,,Often,,,,,,,Rarely,,,,30,20,20,20,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,,,,Often,,,,Often,,,,,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Sometimes,498000,SEK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Bayesian Methods,SQL,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Company internal community,Tutoring/mentoring",,,,Not Useful,,,,,,,,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Other",Other,60,0,0,0,0,40,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Other,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,,Other,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,Sometimes,Sometimes,,,,,Rarely,,,,Sometimes,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Often,,,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,Data Parity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Never,1624400,INR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,66,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"Partially Derivative Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Other,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important +Male,Taiwan,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Monte Carlo Methods,Python,Google Search,"Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack 
Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,0,0,60,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,Singapore,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,Very useful,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,20,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,Regression/Logistic Regression,"Python,SAS Enterprise Miner,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,Often,,,Most of the time,,Most of the time,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,Often,Often,,,,,,,Often,,,,10,30,10,10,40,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",,,,,Most of the time,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,55000,SDG,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Data Scientist,Programmer",Self-taught,50,15,30,0,5,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,,Ensemble Methods,"NoSQL,Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,Often,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests",,,,,,,Often,,,,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,,,,40,20,20,20,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data",,Often,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Never,190000,CNY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Spain,47,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Deep learning,SQL,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Official documentation",,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,,Less than a year,Other,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Survival Analysis,Unsupervised Learning",Decision Trees - Gradient Boosted Machines,Primary/elementary school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Egypt,48,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,Spark / MLlib,Decision Trees,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Newsletters",,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Engineer",University courses,20,20,10,40,10,0,Outlier detection (e.g. Fraud detection),,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,,10GB,"Bayesian Techniques,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Sometimes,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Random Forests,Segmentation,Simulation,Time Series Analysis",,Often,Often,,Often,,,,,,,,,,,,,,,,,,Often,,,Often,Often,,,Often,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues",Often,,,,,,,,Often,,Often,,Often,,,,Often,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,12000,USD,Has decreased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by non-profit or NGO,R,,R,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Other,500 to 999 employees,Stayed the same,Don't know,A career fair or on-campus recruiting event,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Don't know,,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,20,20,20,30,10,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Often,,,,,,Often,,,,,,,,51-75% of projects,Do not know,Other,,,,Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,122000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,Employed part-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Predictive Modeler,Programmer",Self-taught,50,0,30,20,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1MB,Bayesian Techniques,"Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Text Analytics",,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,10,30,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Often,,,,,Often,,,,,Most of the time,,Sometimes,,Sometimes,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,50000,PKR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Computer Vision,Neural Networks - CNNs,High school,Manufacturing,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Image data,Rarely,100GB,"CNNs,GANs","Amazon Web services,C/C++,Python,SQL",,Sometimes,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Cross-Validation,GANs",,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,40,30,10,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,I don't know,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Orange,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,Often,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,,,,Most of the time,,,,,Often,Often,Most of the time,,,,,,,Most of the time,,,,25,40,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of 
projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,65000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,,,Necessary,,Necessary,Nice to have,Necessary,,Necessary,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,25,0,55,0,0,"Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Very Important +Male,Taiwan,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,,,,,,,Arxiv,Very useful,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,I haven't started working yet",Self-taught,50,10,10,20,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation","Image data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Most of the time,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Most of the time,Sometimes,,Sometimes,,Most of the time,,,Sometimes,Sometimes,Sometimes,,,,50,10,20,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,Most of the time,,,,,Often,,,Often,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Email,Other",GoogleDrive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,0,TWD,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,40,25,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,Less than one year,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,Rarely,,,Often,,Sometimes,Often,,,Often,,Sometimes,,Often,,Sometimes,,,Sometimes,,Often,,,Often,,,,,,,,50,30,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,10-25% of projects,More internal than external,Business Department,,the size of the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"10,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,New Zealand,27,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Regression,Julia,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,O'Reilly Data Newsletter,1-2 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,A social science,,"Data Analyst,Data Miner,Predictive Modeler,Researcher",Work,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,Italy,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Not Useful,,,,,Somewhat useful,,,,,Somewhat useful,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,3 to 5 years,I haven't started working yet,University courses,45,5,0,50,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Julia,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist",Self-taught,60,20,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,,,,Often,,,Often,Often,,,,,Often,Often,Often,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science 
talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,Hungary,25,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,29,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,TensorFlow,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,3 to 5 years,"Business Analyst,Data Miner,Researcher,Statistician",Self-taught,50,0,10,30,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,RapidMiner (free version),SAS Base,Tableau,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,Sometimes,,,,Most of the time,Rarely,Most of the time,,Rarely,,,Rarely,,,,,,,Sometimes,Often,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Sometimes,Most of the time,Most of the time,,Sometimes,,,,Most of the time,,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,Often,,,,,,Most of the time,Rarely,,,,50,10,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,Sometimes,,,Often,Most of the time,,,Most of the 
time,,Most of the time,,Sometimes,Often,Often,,Often,,,Most of the time,Sometimes,Most of the time,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"89,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Portugal,47,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,,Other,Google Search,"Official documentation,Online courses,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,Python,"Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"GPU accelerated Workstation,Traditional Workstation",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Researcher,University courses,25,30,15,30,0,0,"Recommendation Engines,Unsupervised Learning",Ensemble Methods,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Brazil,41,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Unnecessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,Very Important,Very Important,Very Important,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,10,40,10,30,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Retail,100 to 499 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,,,Often,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,,Often,Often,,,,,,Sometimes,,Most of the time,,,,,Often,,,,,Most of the time,,,,,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,,Data is not consistent. Lots of changes in data over time.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,700000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,10 to 19 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10GB,"Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,RapidMiner (commercial version),SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,Rarely,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,,Sometimes,,Often,,,,,,,,Often,,Sometimes,,,,Sometimes,,,Often,,Sometimes,,,,,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Often,,Often,,,,,,Sometimes,,,,,Sometimes,,Often,,,Often,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Chile,40,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Poorly,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by non-profit or NGO,Employed by government,Self-employed",Python,Monte Carlo Methods,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Newsletters,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,Very useful,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation,Other",11 - 39 hours,Other,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Other,0,50,0,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,Very Important,,,, +Male,Italy,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,Google Cloud Compute,Anomaly Detection,,University/Non-profit research group websites,Podcasts,,,,,,,,,,,,,Very useful,,,,,,"Partially Derivative Podcast,The 
Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher",Self-taught,90,10,0,0,0,0,"Adversarial Learning,Machine Translation","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Sometimes,1GB,"Decision Trees,Ensemble Methods,GANs,RNNs","DataRobot,Google Cloud Compute,Java,TensorFlow",,,,,,Often,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"kNN and Other Clustering,Prescriptive Modeling,Recommender Systems",,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Often,Often,Often,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Mercurial,,100000,AFN,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Germany,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist",University courses,20,40,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A doctoral degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","Hadoop/Hive/Pig,R,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,,Rarely,,,,,,,Rarely,Often,,Sometimes,Sometimes,Most of the time,,Often,Most of the time,,,Often,,,Sometimes,Often,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,Rarely,Sometimes,Often,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Other",Other,,Git,Most of the time,120000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Hong Kong,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Data Analyst,University courses,50,0,20,30,0,0,"Computer Vision,Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Retail,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation,Text Analytics",,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Often,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Always,"400,000",HKD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,,,,,,,Blogs,,Very useful,,,,,,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,80,0,0,20,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data,Other",Always,10GB,Neural Networks,"Google Cloud Compute,Spark / MLlib",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks",,,,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,0,0,0,0,0,0,,"Dirty data,Other",,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,10,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,,No,I did not complete any formal education past high school,,,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important +Male,Denmark,33,Employed full-time,,,Yes,,Data Miner,,,NoSQL,Neural Nets,Python,Google Search,"Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Researcher",Work,25,25,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1TB,"Decision 
Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,NoSQL,Python,SQL",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Logistic Regression,Random Forests",Sometimes,,,,,Most of the time,,,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,60,20,15,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Sometimes,Often,,Rarely,Often,,,,,Most of the time,,,Sometimes,,,Most of the time,Sometimes,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Company internal community,Conferences,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,,Not Useful,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",20,20,20,35,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau,Other",,Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,Sometimes,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Sometimes,Rarely,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Most of the time,Sometimes,Most of the time,,,,,,Often,Often,Often,,Sometimes,Sometimes,,Often,Often,,,,50,15,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Most of the time,Most of the time,,,,,Sometimes,,,Often,,,,Most of the time,Often,Sometimes,,Often,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,PLN,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Operations Research Practitioner",University courses,50,30,0,20,0,0,"Adversarial Learning,Time Series","Bayesian Techniques,Evolutionary Approaches",A doctoral degree,Government,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,<1MB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,Spark / MLlib,SQL,Stan",,Rarely,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Often,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Sometimes,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Simulation,Time Series Analysis",,,Often,,,Most of the time,Often,,,,,,Sometimes,Often,,Most of the time,,Sometimes,,,,,Sometimes,,,,Most of the time,,,Often,,,,10,20,60,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super 
efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",Most of the time,,,,,Often,,,Sometimes,Most of the time,Sometimes,,Often,,Most of the time,,,,,,,,100% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Distributed file store ; iRods,Git,Sometimes,30000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Blogs,Conferences,Kaggle,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,"Computer Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",90,0,0,0,10,0,"Computer Vision,Speech Recognition","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important +Male,Taiwan,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,60,10,10,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Workstation + Cloud service","Text data,Other",Sometimes,1GB,"Bayesian Techniques,HMMs,Markov Logic Networks,Neural Networks,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,TensorFlow",Rarely,Rarely,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,HMMs,kNN 
and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Most of the time,Most of the time,,Most of the time,,Sometimes,,,,,Most of the time,Often,,,,Often,Most of the time,Often,,,Sometimes,Often,Often,,,Most of the time,Most of the time,Most of the time,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,karsterson;wikipidia;dictionary;,data preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,800000,TWD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,Switzerland,23,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Google Search,"College/University,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,,,Very useful,"Talking Machines Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,20,0,20,60,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,,,,,,,,,Very useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,0,0,90,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,52,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,Other,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Italy,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,,,,Very useful,Very useful,Very useful,Somewhat useful,Not Useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",University courses,10,5,30,50,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Government,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Orange,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Often,,,Often,Most of the time,,,,,,Most of the time,,,,,,,,,Rarely,,,Most of the time,,Rarely,,,Often,Often,Sometimes,Sometimes,,,Sometimes,Often,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,Sometimes,Most of the time,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,Often,,,,,,Most of the time,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,private datacenter,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,42,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,18,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,"Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100MB,"CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Mathematica,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Often,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,,,Sometimes,,,Most of the time,,,,,,,,,Sometimes,,,,Often,,,,,,Often,Often,,Most of the time,Most of the time,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,Sometimes,,,,Most of the time,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",,Git,Always,40000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Turkey,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,R,GitHub,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist",Work,40,20,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,QlikView,R,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,,,Sometimes,,,Often,,,,,,,,Often,Sometimes,Sometimes,,,,,Most of the time,Most of the time,,Often,,,Sometimes,Most of the time,,,,65,15,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers",Most of the time,Often,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Business Department,"AWS, SQL Server",,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,84000,TRY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Python,,,"GitHub,Google Search",Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,Less than a year,Researcher,Self-taught,60,20,5,11,4,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Academic,100 to 499 employees,Stayed the same,Don't know,Some other way,Not very important,Other,Basic laptop (Macbook),Text data,Never,,Neural Networks,"C/C++,Java",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,10,20,30,5,20,15,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,nil,knowledge base of machine learning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,100000,INR,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Switzerland,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other,Other",,Somewhat useful,,,Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",Work,5,25,70,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Stayed the same,3-5 years,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Don't know,1TB,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Other",,,,,Sometimes,,,,Often,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,,,,Often,,,Most of the time,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,,80,5,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty 
data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Often,Often,Sometimes,,,,,Sometimes,,Often,,,Most of the time,Sometimes,,76-99% of projects,More internal than external,Business Department,GoogleAnalytics,Understanding the data because sometimes even IT people do not know what the data field is.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,140000,CHF,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,Random Forests,R,Google Search,"Blogs,Kaggle,Official documentation,Online courses",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,100GB,"Decision Trees,Random Forests","Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,Often,,,,,,,,,,,,,,Often,,,,Most of the time,Often,,,Most of the time,,,,0,40,0,30,40,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,Often,,,,,,Often,Most of the time,,,Most of the time,Often,,,,,,,,Often,,26-50% of projects,More internal than external,Business Department,"Google, Geographic and Statistical Public Offices",Customer Behaviour Prediction,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,75000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Online courses,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,South Africa,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,Python,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,30,25,30,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs",High school,Academic,I don't 
know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks,RNNs","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,,,,,Sometimes,Sometimes,,,,Sometimes,Most of the time,Often,,,,,,,Sometimes,,Often,Often,,,,15,40,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",Often,,,,Often,Often,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Less than 10% of projects,More external than internal,Other,CMU; Leipzig,cleaning and extracting a meaningful representation to model,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,git,Git,Sometimes,430000,ZAR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,10,0,0,75,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Spark / MLlib,Other",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,"kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Simulation,Text Analytics",,,,,,,,,,,,,,Sometimes,,Often,,Often,Often,Often,,,,,,,Sometimes,,Often,,,,,20,50,20,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,,Often,,,,Often,,,,Often,Often,,Often,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,62000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,49,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Podcasts,Textbook",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,,,A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Never,10GB,"Bayesian Techniques,Neural Networks,Other",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,Often,,Sometimes,,Sometimes,,Sometimes,Often,,,,,,,,,Often,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Often,,,,,,,,,Often,,,,Often,,Sometimes,Most of the time,,26-50% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Git,Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",1-2 years,Necessary,,,,Necessary,,,Necessary,,,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Doctoral degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Not important,Not important,,,,,,,,,,,,, +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Government website,"Blogs,College/University,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",9,80,0,10,1,0,"Reinforcement learning,Supervised Machine 
Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Other,37,Employed part-time,,,Yes,,Data Miner,,"Employed by college or university,Employed by government",RapidMiner (commercial version),Cluster Analysis,C/C++/C#,Google Search,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner",University courses,30,10,20,20,0,20,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",No education,Academic,20 to 99 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,RapidMiner (free version),Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,,,Most of the 
time,Sometimes,Often,,,,,,,,Sometimes,,Most of the time,,Sometimes,,Sometimes,,,,,,Often,Most of the time,Sometimes,,,,25,25,15,10,25,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Social media sites,Other,Rarely,,,Other,5,,,,,,,,,,,,,,,,,, +Female,Australia,25,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,Not Useful,Very useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,Other,University courses,20,20,0,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not 
important +Male,Italy,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,30,10,20,25,15,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,Increased significantly,6-10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,RNNs,SVMs,Time Series 
Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Most of the time,,Often,,Often,,,Often,,Often,,,Most of the time,,Often,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Often,,,Often,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,"Kaggle datasets, uci repository",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,"Bitbucket,Git,Subversion",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Conferences,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",University courses,30,5,20,45,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,HMMs,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,Most of the time,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the 
time,,,,"Association Rules,Data Visualization,Natural Language Processing,Text Analytics",,Rarely,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,20,20,35,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",slack,Git,Sometimes,90,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,32,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,A health science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",0,33,0,33,0,34,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Germany,27,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,Not Useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician",University courses,50,5,0,10,35,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Academic,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,5,10,0,15,20,50,Enough to refine and innovate on the 
algorithm,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Rarely,Often,,,,,,Often,,Sometimes,,,10-25% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,"30,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,65,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,TensorFlow,Deep learning,Matlab,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,,,Manufacturing,,,,,,,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,100,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,22,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,C/C++,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Newsletters,Online courses,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,,"Online courses (coursera, udemy, edx, 
etc.)",0,90,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","IBM Watson / Waton Analytics,Python,R,TensorFlow",,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation",,,,Sometimes,,Most of the time,,Often,Most of the time,,,,,Often,,,,,Most of the time,Often,,,Often,,Often,Sometimes,,,,,,,,70,20,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Privacy issues",,,,Rarely,Rarely,,,,,,,,,,,,Rarely,,,,,,,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Female,Singapore,32,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by professional services/consulting firm,Tableau,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Hungary,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Anomaly Detection,R,Google Search,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,Less than a year,"Business Analyst,Data Scientist",Work,25,10,60,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic 
Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Rarely,,,,,,,,Rarely,Often,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,,,,,,,,Often,,,,,,,Often,,,,,,,Rarely,,,,60,10,15,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Often,Most of the time,,,Sometimes,,,,,,,Often,Sometimes,,,,Rarely,Often,,76-99% of projects,Entirely internal,Standalone Team,-,Cleaning it and transforming,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,400000,HUF,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Random Forests,Python,University/Non-profit research group websites,"Blogs,College/University,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,,Logistic Regression,A bachelor's degree,Other,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Other,Never,1GB,Regression/Logistic Regression,"Jupyter notebooks,Mathematica,Python,R,SAS Base,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,Rarely,,,,,Sometimes,,,,Rarely,,,,Rarely,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,,,Often,,,,40,10,0,35,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,Often,,,,,,Often,Often,,,,100% of projects,More internal than external,IT Department,NOAA database; Aemet Open Data,Getting useful data out of a dirty database,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,24000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Spain,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Regression,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,"Business Analyst,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",0,60,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A professional degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation",,,,,,,Most of the time,,,,,,,,,Often,,,,,,Often,Often,,,Often,Often,,,,,,,30,10,5,25,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for 
scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Most of the time,Often,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,,,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,,Enriquecerlo,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,40000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,"Not employed, but looking for work",,,,,,,,IBM SPSS Modeler,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Podcasts,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,,Very useful,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,Less than a year,Statistician,Kaggle competitions,40,0,0,20,40,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very 
Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Taiwan,51,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Decision Trees,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Australia,53,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,Other,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,"FastML Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,NA,90,0,0,10,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Random Forests",,,,,,,,Often,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,50,5,0,0,45,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Need to coordinate with IT,Privacy issues",,Often,,,,,,,,,,,,,Often,,Sometimes,,,,,,None,Entirely internal,Central Insights Team,Don't know ,Understanding what the data represents ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,150000,AUD,Other,8,,,,,,,,,,,,,,,,,, +Female,Japan,21,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Non-Kaggle online communities,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,Very useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Biology,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,50,20,20,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression",,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,20,5,8,7,60,0,Enough to tune the parameters properly,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Sometimes,,100% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,576000,JPY,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Ireland,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",5,50,20,0,25,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Pharmaceutical,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Most of the time,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,Sometimes,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,,Sometimes,,Often,Most of the time,,Often,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Often,,Most of the time,,Sometimes,,,,Often,,,,,25,25,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,,,,,Sometimes,Sometimes,,,,,,,,,,Often,Sometimes,,76-99% of projects,Approximately half internal 
and half external,Other,,Scientific data is highly biased towards positive results,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,50000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer",University courses,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service",Text data,Sometimes,100MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Google Cloud Compute,Java,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Rarely,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,,,,,Often,,Most of the time,,,,"Cross-Validation,Evolutionary Approaches,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,,,Rarely,,,,,,,,,Most of the time,Often,Often,,,,,,,Often,Most of the time,Often,,,,20,80,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,Sometimes,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Standalone Team,social,lack of data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,filesystem,"Bitbucket,Git",Most of the time,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",5,75,0,10,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Singapore,26,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Other,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,Data Analyst,Self-taught,80,0,0,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - 
Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Video data",Don't know,100GB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,RNNs","C/C++,Julia,Python,Stan,TensorFlow",,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,RNNs,Segmentation",,,Often,Most of the time,,Often,,,,,,Rarely,Often,Sometimes,,,,Rarely,,Most of the time,,,Rarely,,Most of the time,Sometimes,,,,,,,,30,30,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,Often,Sometimes,Rarely,,,Often,Sometimes,,Most of the time,Often,Often,,,,,Sometimes,,Most of the time,,76-99% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Bitbucket,Rarely,70000,SGD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Denmark,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Microsoft Azure Machine Learning,Decision Trees,SQL,,"Company internal community,Online courses,Personal Projects,YouTube Videos",,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Statistician,University courses,20,10,50,20,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A master's degree,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,Regression/Logistic Regression,"Microsoft Excel Data Mining,QlikView,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,Often,Often,,,,20,10,20,40,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,,,,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint,Other",Tableau,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed part-time,,,No,Yes,Other,Poorly,Self-employed,IBM SPSS Modeler,Deep learning,C/C++/C#,GitHub,"Arxiv,Online courses",Not Useful,,,,,,,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,Other,Self-taught,100,0,0,0,0,0,Adversarial Learning,Neural Networks - RNNs,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,New Zealand,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,C/C++,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,100MB,Other,"R,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Simulation",,,,,,,Often,,,,,,,Often,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Often,,,,,,,,Sometimes,,,,,,,,Often,,,,,,76-99% of projects,Entirely internal,Other,None,Data set security,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,France,48,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Online courses,Personal Projects,Trade book",,,,,Very useful,,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer,Other",Self-taught,20,10,30,20,0,20,"Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches",A master's degree,Academic,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Always,10MB,"Decision Trees,Evolutionary Approaches,Random Forests,Other","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,Often,,,,Often,Most of the time,Sometimes,,Sometimes,,,,Most of the time,,Rarely,,,Often,Sometimes,,,Sometimes,,,,,Sometimes,Often,,,,,10,20,0,10,5,55,Enough to refine and innovate on the algorithm,"Limitations of tools,Other",,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,51-75% of projects,Entirely internal,Other,students,be clear in my explanantions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",web ,"Git,Other",Rarely,48000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,Google Search,"College/University,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",,Programmer,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Greece,29,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Textbook",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,30,20,30,0,0,"Speech Recognition,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I prefer not to answer,Decreased slightly,Don't know,,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Don't know,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Other,Other",,,,,,,,,,,,,,,,Rarely,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,Most of the time,Most of the time,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",,,Often,,,Most of the time,Most of the time,Most of the time,Often,,,Sometimes,,Often,,Often,,Most of the time,Sometimes,Sometimes,Sometimes,,Most of the time,,,Often,Sometimes,Most of the time,Sometimes,,,,,20,40,10,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful 
datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,Often,,,Sometimes,Most of the time,,76-99% of projects,Approximately half internal and half external,Standalone Team,classification; regression; speech recognition,build robust learners for small labeled ratios,,"Email,Other",clouds,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,4000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Text Mining,SQL,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,Self-taught,40,20,40,0,0,0,Recommendation Engines,,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Always,10GB,,"QlikView,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,25,15,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,365000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Matlab,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,50,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data 
to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,,,Often,,,,,,,,,,,Often,Often,,,,,,,Often,,,,,,10,50,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Generic cloud file sharing software (Dropbox/Box/etc.),,75000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Java,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,Very useful,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,Self-taught,50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches",A doctoral degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,,1GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Simulation",,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,15,30,45,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Often,,,,,100% of projects,Do not know,Other,aflowlib,appropriate parsing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,scp,"Bitbucket,Git",Rarely,18000,GBP,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Pakistan,22,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Very useful,,,Very useful,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"CNNs,Random Forests,Regression/Logistic Regression","C/C++,KNIME (free version),MATLAB/Octave,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,Sometimes,,,,Most of the time,Most of the time,,,,,,,,Most of the time,Often,,,,Often,Sometimes,,,,,,,,,,,,,25,25,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations in the state of 
the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Most of the time,,Most of the time,Often,,,,,,Most of the time,Sometimes,Often,,Most of the time,,,Most of the time,Most of the time,Often,,10-25% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,,180000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Germany,32,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,France,27,Employed full-time,,,Yes,,Data Miner,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Deep learning,Python,Government website,"Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Programmer",University courses,30,10,20,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10GB,"Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise 
Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Segmentation",Sometimes,,,,,,Sometimes,Often,,,,,,,Rarely,Most of the time,,,,Sometimes,,,Most of the time,,,Often,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database",Often,,,Sometimes,,,,,Most of the time,,,,Sometimes,,,,,Sometimes,,,,,51-75% of projects,More internal than external,Central Insights Team,density population; zip codes coordinates,ensuring data quality when gathering data from different data sources,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Network LAN/ external drive,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,35000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,DataRobot,,,,"Non-Kaggle online communities,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,,Text data,,,,"Amazon Machine Learning,Amazon Web services,Impala,Python,Tableau",Often,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the 
time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Data Science results not used by business decision makers,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Perl,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Time Series",,High school,Financial,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Always,100GB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Often,,,,60,5,5,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,16000,EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Switzerland,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Insurance,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,,Often,,,Most of the time,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,Rarely,,Sometimes,,,Sometimes,Sometimes,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,,,,Sometimes,Rarely,,Sometimes,,,,,Often,,,Often,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"Blogs,College/University,Online courses,Textbook",,Very useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Australia,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Researcher,Other",University courses,10,0,0,80,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,France,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,SQL,Decision Trees,Java,University/Non-profit research group websites,"Blogs,Podcasts,Trade book,YouTube Videos",,Somewhat useful,,,,,,,,,,,Somewhat useful,,,Very useful,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or 
statistics,I don't write code to analyze data,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Other (please specify; separate by semi-colon)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,36,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Other,"Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Other,10,0,0,0,0,90,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Not important +Male,Belgium,27,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,60,0,35,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Orange,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,,,Often,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,SVMs,Text Analytics,Time Series Analysis",Rarely,,Rarely,,,Rarely,Often,Rarely,,,,,,,,Often,,Often,,,,,,,,,,Often,Often,Rarely,,,,25,25,5,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,,,Often,,,Often,,,Often,,Often,Often,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,,Sometimes,35000,EUR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,0,20,40,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,I prefer not to answer,Stayed the same,Don't know,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,Sometimes,Often,,,,,,,Often,,,Most of the time,,,,Rarely,,,,,Often,Most of the time,,,,,10,30,55,5,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,Dbpedia,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,72000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Telecommunications,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,Relational data,Never,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,30,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,"Geographical data, household/census data",Volume,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,200000,AUD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,India,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,,,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,30,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,"Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Deep learning,Python,GitHub,"Kaggle,Online 
courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,6 to 10 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",5,30,20,45,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,40,10,15,15,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,,,,51-75% of projects,Entirely internal,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Bitbucket,Never,55000,EUR,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Finland,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,,,,Very useful,,Very useful,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,90,0,0,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Java,Jupyter notebooks,Python,R,Spark / MLlib,Stan,TensorFlow,Unix shell / awk",,,,,Most of the time,,Sometimes,,Most of the time,,Sometimes,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,Most of the time,,Most of the time,,,,"Association Rules,Bayesian 
Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,Often,,Most of the time,Most of the time,Sometimes,Sometimes,Most of the time,,Sometimes,Rarely,Sometimes,Sometimes,Often,,Often,Most of the time,Often,Often,Often,Sometimes,,Often,Often,Often,Often,Most of the time,Most of the time,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,I prefer not to say,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Often,Often,,,Most of the time,,,,Most of the time,,,Sometimes,,,,Often,,100% of projects,More external than internal,Other,,access,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial,Subversion",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Engineer,Poorly,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,20,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,,Don't know,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Never,100MB,Decision Trees,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS JMP",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,,,Sometimes,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,,Never,42000,EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Online courses,Personal Projects,YouTube Videos",,Very useful,,,Very useful,,,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","GPU accelerated Workstation,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,Less than a year,"Programmer,Software Developer/Software Engineer,Other",Self-taught,60,30,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,India,26,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,20,30,10,30,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Romania,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,Spark / MLlib,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"edX,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important +,,NA,"Not employed, but looking for work",,,,,,,,,Bayesian Methods,,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,< 1 year,,,,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Other,No,Doctoral degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Amazon Machine Learning,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Other,10,10,50,20,1,9,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Ensemble Methods,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,Rarely,Rarely,,,,,Often,,Rarely,,Rarely,,Often,,Sometimes,,Sometimes,,,,Often,Rarely,Most of the time,,Rarely,,,Rarely,Rarely,Rarely,Rarely,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,Sometimes,,,,,,Rarely,,Often,,,Rarely,Rarely,Rarely,Most of the time,Sometimes,,,Rarely,,,Sometimes,Sometimes,,,,70,10,0,10,10,0,Enough to 
code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,I prefer not to say,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,Most of the time,,,,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,100% of projects,More internal than external,Standalone Team,WSJ,Clean up data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,100000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Switzerland,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,,,Very useful,,,Very useful,,Very useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Data Scientist,Programmer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or 
on-campus recruiting event,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,India,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Perl,,Python,,"Blogs,Newsletters,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,Unnecessary,,Nice to have,,Nice to have,,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,Somewhat important,,,,,,, +Male,India,41,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,United Kingdom,60,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Random Forests,Python,I collect my own data (e.g. 
web-scraping),"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),15+ years,,,,,,,,,,,Necessary,,,,Other,2 - 10 hours,Other,No,Some college/university study without earning a bachelor's degree,,,Other,Work,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,United States,38,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,University courses,50,10,0,40,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Poland,28,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A social science,1 to 2 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",50,25,0,0,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,South Africa,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,DataRobot,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Not Useful,Not Useful,Somewhat useful,,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,25,0,5,0,,"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,Random Forests,"Microsoft Excel Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,Often,,Often,,,,Often,,,,,Sometimes,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Segmentation,Time Series Analysis",Sometimes,,,,,,Often,Often,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,30,10,20,30,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,Sometimes,Often,Often,,,,Most of the time,,,,Often,,,100% of projects,More external than internal,Business Department,,Silos,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Spain,42,Employed full-time,,,No,Yes,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Orange,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,Coursera,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Professional degree,,Less than a year,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Female,India,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Other,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,I haven't started working yet",University courses,30,10,20,30,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Bayesian Methods,Haskell,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,15,15,0,70,0,0,"Computer Vision,Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Ensemble Methods","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,Often,Sometimes,,Often,,,,,Often,,,,Often,,Most of the time,Often,,,,,,,,,,,,,35,30,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,ImageNet;VK,Combining different datasets in one,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Never,1350000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Physics,,Data Scientist,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Belgium,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects",,Very useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Never,1TB,CNNs,"Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,Often,Most of the time,,,,,,,Sometimes,,,,,,Often,Often,,,,,Often,,,,,,,,40,10,0,10,10,30,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Other",Most of the time,,,,Sometimes,Sometimes,,,Most of the time,,Most of the time,,Often,,Most of the time,,Most of the time,,,,,Most of the time,10-25% of projects,More internal than external,Other,Some pretrained models,Collecting validated data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,47000,EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",35,30,0,0,35,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs",,,,Often,,,Often,Sometimes,,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,Often,,,,,Often,,,,,,20,40,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Often,,10-25% 
of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,47,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Other,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,R,"Google Search,University/Non-profit research group websites","College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,3-5 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Female,France,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Data Miner,University courses,0,25,5,70,0,0,,,"Some college/university study, no bachelor's degree",CRM/Marketing,10 to 19 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,Perl,QlikView,R,SAS Base,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Most of the time,,,,,Rarely,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Segmentation",,,Rarely,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,10,25,5,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Subversion",,34000,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Spain,41,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Never,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,Sometimes,Sometimes,,,Often,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,Sometimes,,Most of the time,Often,,Often,,Sometimes,Sometimes,,Often,Sometimes,Sometimes,,,,20,25,0,5,0,50,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,Often,,,Often,,Sometimes,Sometimes,Often,,Sometimes,Sometimes,,Often,Often,,10-25% of projects,More internal than external,Business Department,kaggle; UCI Machine Learning Repository,Working with datasets of hundred or thousand of GB size.,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",AWS S3,Git,Rarely,42000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Switzerland,29,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"College/University,Kaggle,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,10,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Don't know,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,65,10,0,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,Often,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,85000,CHF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,53,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,24,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by professional services/consulting firm,R,Neural Nets,R,GitHub,"Blogs,College/University,Conferences,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,Basic laptop (Macbook),40+,Other,Yes,Bachelor's degree,Engineering (non-computer focused),,Statistician,University courses,20,20,20,30,0,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,Somewhat important,,,,,Somewhat important,,,,,,,,, +Male,Italy,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Other,24,Employed part-time,,,Yes,,Programmer,Fine,Employed by college or university,Jupyter notebooks,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,,,"Neural Networks,Regression/Logistic Regression",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Cross-Validation,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Netherlands,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Neural Nets,Matlab,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,A social science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Netherlands,34,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by non-profit or NGO,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Predictive Modeler,Researcher,Statistician",University courses,5,5,35,50,5,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United Kingdom,33,Employed full-time,,,Yes,,Other,Poorly,Employed by government,TensorFlow,Text Mining,Python,"Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Other,Work,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Government,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Never,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,Sometimes,,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,,50,20,0,30,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack 
of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,Often,,,,,,,,,Most of the time,,,100% of projects,More internal than external,Standalone Team,"RefSeq, TCGA, GEO","Dealing with small sample sizes, clarifying the question to answer","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,"Dropbox, MoinMoin wiki","Bitbucket,Git",Rarely,40000,GBP,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Self-taught,20,30,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the 
time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",,,,,,Most of the time,Often,Most of the time,,,,,,Often,,Often,,,Most of the time,Sometimes,Sometimes,,Most of the time,Often,,,,,Most of the time,,,,,40,30,5,10,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",Rarely,,,,Often,,,,,,,Often,Often,,,,,,,,Often,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,15600,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,5,15,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary 
Approaches,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,"Basic laptop (Macbook),Other","Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,R,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs",,,,,,Often,Often,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,,,,,83,10,1,4,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,,Often,,,10-25% of projects,Entirely internal,Other,,,Other,"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,39.5,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Recommendation Engines,Neural Networks - CNNs,A doctoral degree,Other,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,1GB,CNNs,"Amazon Web services,C/C++,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,NoSQL,SQL,TensorFlow",,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,Rarely,,Most of the time,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,CNNs,Data Visualization,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Recommender Systems,RNNs,Text Analytics",Sometimes,,,Rarely,,,Often,,,,,,,,,Sometimes,,,Rarely,,,Rarely,,Sometimes,Often,,,,Sometimes,,,,,10,20,30,30,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Sometimes,,,,,Often,,Most of the time,Often,Most of the time,,,Rarely,Sometimes,,Most of the time,,,,,,,Less than 10% of projects,More internal than external,IT Department,kaggle,not enough information on the data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Email,,Git,Never,1200000,KES,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Taiwan,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Engineer,Machine Learning Engineer,Researcher",University courses,0,10,80,0,10,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,Most of the time,1GB,"Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,Sometimes,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,Rarely,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Sometimes,Often,Rarely,,Often,Often,,,Often,,,Often,Rarely,,,Often,,,Often,,Sometimes,,Often,Often,,Often,,Often,Sometimes,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Privacy 
issues",,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,,35000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,South Korea,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,Very useful,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,40+,Master's degree,Sort of (Explain more),Bachelor's degree,A social science,Less than a year,I haven't started working yet,Other,50,0,0,0,0,50,Recommendation Engines,Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Male,Malaysia,26,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,SQL,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Very useful,,,,,Not Useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",,Master's degree,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",University 
courses,0,0,0,80,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,R,GitHub,"Blogs,College/University,YouTube Videos",,Not Useful,Somewhat useful,,,,,,,,,,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,25,15,0,50,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Academic,10 to 19 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data",Sometimes,1GB,"Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Random Forests,Segmentation,SVMs",,,Sometimes,,,,,,,,,,,,,,,,,,,,Often,,,Rarely,,Often,,,,,,25,35,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science 
team",,,,,,,,,Often,,,Rarely,,,,Rarely,,,,,,,10-25% of projects,,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)","Email,Share Drive/SharePoint",,,Rarely,,,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Neural Nets,R,Google Search,"Online courses,Podcasts,YouTube Videos",,,,,,,,,,,Very useful,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Italy,36,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,10MB,"Neural Networks,Regression/Logistic Regression","C/C++,Java,Python,R,Spark / MLlib,SQL",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression,Neural Networks,Simulation",,,,,,Sometimes,Often,,,Sometimes,,,,,,Sometimes,,,,Often,,,,,,,Often,,,,,,,10,30,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,Often,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,20000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,India,30,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",SQL,Google Search,"YouTube Videos,Other",,,,,,,,,,,,,,,,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Predictive Modeler,Other",Work,0,20,60,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Belgium,20,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Self-employed,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,KDnuggets Blog,< 1 year,Nice to have,,Nice to have,,Necessary,,Nice to have,Nice to have,,,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Germany,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Text Mining,Python,I collect my own data (e.g. web-scraping),"College/University,Company internal community,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,,University courses,25,10,40,15,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","C/C++,Google Cloud Compute,KNIME (free version),Mathematica,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SQL,Tableau",,,,Rarely,,,,Rarely,,,,,,,,,,,Rarely,Rarely,,Sometimes,,Rarely,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Rarely,,,Rarely,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,Often,,,,Often,Often,Often,,,,Often,,,,Sometimes,,,,,Rarely,Rarely,Often,,,Sometimes,,,,Often,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,"Data Machina Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,0,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,Regression/Logistic Regression,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Prescriptive Modeling",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,30,30,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,76-99% of projects,More internal than external,IT Department,,,Other,Other,S3,"Bitbucket,Git",Sometimes,40000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,,,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Predictive Modeler,Programmer,Researcher",Self-taught,25,50,0,0,25,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,49,Employed full-time,,,No,Yes,Data Analyst,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",RapidMiner (free version),Time Series Analysis,Python,"Government website,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Traditional 
Workstation",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Pakistan,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Google Cloud Compute,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,10,15,40,5,NA,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,SVMs",,,,,,Often,Often,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,20,40,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,Sometimes,,,,,,,,,,,Often,Often,,26-50% of projects,Entirely internal,IT Department,"stocks, Cifar",Not enough data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",flash disks,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,20000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Malaysia,24,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,Not Useful,Somewhat useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,FastML Blog",< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Work,20,25,40,15,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Netherlands,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting 
firm,Python,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,I prefer not to answer,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Don't know,1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,Tableau,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,Rarely,,,,Most of the time,,,"Cross-Validation,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,Sometimes,,,,10,5,10,70,5,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations of tools,Privacy issues",,,,,,,,,,,Often,,Often,,,,Often,,,,,,76-99% of projects,Entirely internal,Other,NA,NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Box,Git,Sometimes,,INR,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Israel,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,,Python,,"Podcasts,Stack Overflow Q&A",,,,,,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Adversarial Learning,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",High school,Manufacturing,500 to 999 employees,,3-5 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100MB,Neural Networks,"Cloudera,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,Often,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,0,100,0,0,0,0,Enough to tune the parameters properly,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,Do not know,Other,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests","Hadoop/Hive/Pig,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau",,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,Sometimes,Often,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Recommender Systems,Text Analytics",,Often,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,Often,,,,,20,10,5,25,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,76-99% of projects,Entirely internal,IT Department,,"Multiple data sources. No clear understanding of expections.","Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Subversion",Rarely,17000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Germany,27,"Not employed, but looking for work",,,,,,,,Other,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,25,10,30,30,5,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Russia,25,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Other,"Kaggle,Online courses",,,,,,,Not Useful,,,,Very useful,,,,,,,,No Free Hunch Blog,< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,"Gradient Boosting,Logistic Regression",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important +Male,Russia,36,Employed full-time,,,No,Yes,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Enterprise Miner,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,30,5,0,15,0,"Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,,,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +Male,Spain,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",Very useful,Somewhat useful,Not Useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Other",Self-taught,50,0,50,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Not at all important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,Most of the time,Sometimes,,Sometimes,Sometimes,Most of the time,Sometimes,,Sometimes,Most of the time,Often,,,,30,40,5,10,5,10,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,,,,,,Often,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,"Wikipedia, Imagenet",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Conferences,Friends network,Kaggle,YouTube Videos",Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Natural Language Processing,Recommendation Engines",Neural Networks - CNNs,High school,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Always,10GB,"CNNs,Neural Networks","Java,Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Most of the time,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,Recommender Systems",Often,,,Most of the time,Sometimes,Most of the time,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,,,,,20,20,10,10,20,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others",,,,,Often,Often,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Rarely,400000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,C/C++/C#,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),,Github Portfolio,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,Time Series,Neural Networks - RNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Physics,3 to 5 years,I haven't started working yet,Self-taught,30,30,0,0,40,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United Kingdom,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,,University courses,0,20,30,50,0,0,"Natural Language Processing,Recommendation Engines",Bayesian Techniques,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text 
data,Relational data",Most of the time,100GB,,"C/C++,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,Most of the time,,,,,,,,Most of the time,Often,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,60,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,More external than internal,,,,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,IRR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Company internal community,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,,University courses,10,10,30,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random 
Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Often,Often,Sometimes,,,,,Sometimes,,Sometimes,,Rarely,,,,,Sometimes,,,Sometimes,,Rarely,,Sometimes,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,Most of the time,,,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Git",Rarely,1500000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,33,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Statistician,Other",Work,40,10,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning",,High school,Other,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,"Decision Trees,Random Forests","Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Tableau,Other,Other",,,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Often,,,,Sometimes,Sometimes,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,Simulation,Time Series Analysis",Sometimes,Rarely,,,,,Most of the time,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,30,20,0,20,30,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,Most of the time,,,,,,,,Sometimes,,76-99% of projects,Entirely internal,Other,None,data munging/map-reduce,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Other",internal web resource,Other,Never,35000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,DataRobot,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,Sometimes,,,,Most of the time,,,,Most of the time,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,Often,,,,Often,Often,,,,Often,,Often,,Often,,,Often,,Often,,Often,Often,,Often,,,Often,,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,920000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Textbook",,,Very useful,Very useful,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",9,35,30,25,1,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,,Other,"Jupyter notebooks,Python,R,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,Rarely,,,"Logistic Regression,Random Forests,Simulation",,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,Often,,,,,,,20,35,10,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Sometimes,,,,,,Sometimes,,,,,,Often,,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,France,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Programmer",University courses,20,10,10,50,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Other,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis,Other",,,,,,,Most of the time,Most of the time,Often,,,Sometimes,,,,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,Most of the time,,,40,15,10,30,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Often,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,Sometimes,,,100% of projects,Entirely internal,IT Department,,Fetching and 
preparing data in an efficient way,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,39000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Indonesia,33,Employed full-time,,,Yes,,Data Analyst,,Employed by a company that performs advanced analytics,MATLAB/Octave,Neural Nets,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,Somewhat useful,Somewhat useful,,Somewhat useful,Not Useful,,,,Not Useful,Very useful,,,Not Useful,,Not Useful,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,10 to 19 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Always,100GB,"Neural Networks,Regression/Logistic Regression",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,0,100,0,0,0,0,Enough to run the code / standard library,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Often,,,,,100% of projects,Entirely internal,Other,,,Other,Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000000,IDR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by a company that performs advanced analytics,R,Decision Trees,R,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician",Self-taught,0,50,40,10,0,0,Unsupervised Learning,Logistic Regression,High school,Academic,10 to 19 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told 
me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,60,0,10,0,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,26-50% of projects,More external than internal,Central Insights Team,,,Key-value store (e.g. Redis/Riak),Email,,Subversion,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,58,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Talking Machines Podcast,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Unsupervised Learning,"Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Not important +Male,France,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects",,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Analyst,Data Scientist,Researcher",Self-taught,60,20,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,Often,Most of the time,Often,Rarely,,,,,,,,Often,,,,,Often,,Sometimes,Most of the time,,,,,,,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Difficulties in deployment/scoring",Sometimes,Sometimes,,Often,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,55000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,37,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher,Statistician",University courses,10,10,10,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Non-profit,20 to 99 employees,Decreased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic 
Regression","MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,Most of the time,,Sometimes,,,Most of the time,,Sometimes,,,,Often,,,Sometimes,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,,,,,Often,,,,,Often,,,,,,Often,,,76-99% of projects,Entirely internal,Other,Department of Education Data; Statistics South Africa census; Household survey data; School Enrolment data,Data Inconsistencies,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Rarely,600000,ZAR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,Google Search,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Not Useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Spain,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,Google Search,"Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,25,10,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A 
bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,,Most of the time,Often,,,,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,,Often,,,,,Often,,,,,,50,35,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,Often,,Sometimes,,,Most of the time,Sometimes,,,,Often,,Less than 10% of projects,More internal than external,Other,,My Phd. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,14000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,NA,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,Other",,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,,Necessary,,Necessary,,,Nice to have,,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,,,Very Important,,,Very Important,,,,,Somewhat important,,,Somewhat important, +Male,Russia,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Not Useful,,,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,,< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,5,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Female,India,36,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,edX,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,A humanities discipline,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,30,"Not employed, but looking for work",,,,,,,,,Deep learning,,,"Blogs,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,Less than a year,Researcher,Self-taught,90,10,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Spain,28,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Other,Perfectly,Self-employed,Amazon Web services,Monte Carlo Methods,Python,,"Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,15,5,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R,Other",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Time Series Analysis",Sometimes,,Sometimes,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Often,,,,35,15,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,,Sometimes,,,Sometimes,,,,,,Often,,Often,,76-99% of projects,More external than internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,,,,3,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website",Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,DBA/Database Engineer",Work,0,20,60,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,20 to 99 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Text data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","IBM SPSS Statistics,Python,R,SAS Base",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,,,,,,,,,,,,,,"CNNs,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,35,35,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,360000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Sweden,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,Very useful,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important +Male,India,29,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,GitHub,"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",Very useful,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,25,5,10,50,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,,,,Very useful,Somewhat useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A doctoral degree,Technology,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,Neural Networks,"Google Cloud Compute,Java,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,Sometimes,,,,Sometimes,,,,,,"CNNs,Data Visualization,Logistic Regression,Natural Language Processing,Neural 
Networks",,,,Rarely,,,Often,,,,,,,,,Sometimes,,,Often,Rarely,,,,,,,,,,,,,,30,40,20,10,0,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,Often,,Often,Rarely,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,Git,Never,50000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Portugal,46,Employed full-time,,,No,Yes,Engineer,Fine,Employed by non-profit or NGO,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Master's degree,Yes,Master's degree,Physics,3 to 5 years,Engineer,University courses,30,15,0,50,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important +Female,United States,23,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Researcher,University courses,20,15,20,40,5,0,Time Series,"Bayesian Techniques,Logistic Regression",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,10GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization",Sometimes,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,5,15,10,0,Enough to tune the parameters properly,"Dirty data,I prefer not to say,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,Most of the time,,Sometimes,,,,,,,,Often,,,Most of the time,Often,,51-75% of projects,More internal than external,Standalone Team,Government public records; census,Cleaning it,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,72000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Greece,50,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by college or university,Other,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Tutoring/mentoring",,,Very useful,,,,,,,,,,,,,,Somewhat useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Engineer,Researcher",Self-taught,60,40,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Sometimes,1TB,"Bayesian Techniques,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,,,,,,,,,,,,Sometimes,Most of the time,,,,,Often,,Often,,Most of the time,,,,0,75,20,5,0,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,100% of projects,More external than internal,Standalone Team,Free available remote sensed images,Speed,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Local Server,"Git,Mercurial,Subversion",Most of the time,20000,EUR,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Denmark,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Sometimes,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Random Forests,RNNs,Simulation,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,Often,,,Often,,,,,,,Often,Most of the 
time,,,Often,,Often,,Often,,,Often,,,,40,40,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Git,Subversion",Sometimes,720,DKK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by college or university",IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Master's degree,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,25,25,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,28,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,,KDnuggets Blog,5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,,"Data Analyst,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,South Korea,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by 
professional services/consulting firm,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Friends network,Online courses",Very useful,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,0,20,80,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",,A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,Rarely,,,,Rarely,,,,,Rarely,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems",Often,Often,,,Often,Often,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,Often,,,,,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,Nothing,Preparing to use easily,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,42000000,KRW,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Monte Carlo Methods,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Other",Always,100GB,"CNNs,HMMs,Neural Networks,RNNs","Amazon Web services,C/C++,Java,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,Often,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,,Rarely,,,,,,,,,,Sometimes,,,,,Often,Most of the time,Sometimes,,,,Most of the time,Often,,,Often,Often,,,,25,25,30,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company 
politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,,,Often,Often,,,,Sometimes,Sometimes,Most of the time,Often,Often,,,Most of the time,Sometimes,,Sometimes,,Often,,Less than 10% of projects,More external than internal,Other,,Insufficient data available relevant to projects and domains of interest.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,NFS,"Bitbucket,Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Poland,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",,Other,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,30,40,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to 
new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Random Forests,Regression/Logistic Regression,RNNs","Google Cloud Compute,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Often,,,,"Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,RNNs,Text Analytics",,,,,,Most of the time,,,,,,Often,,Rarely,,Often,,,Sometimes,,,,Often,,Often,,,,Often,,,,,20,60,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,Less than 10% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",I don't typically share data,,"Git,Other",Sometimes,120000,PLN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Finland,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Physics,Less than a year,Researcher,Self-taught,60,0,0,5,35,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,Other,29,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,Python,Text Mining,Python,,"Blogs,Conferences,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler",University courses,5,5,80,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Python,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Segmentation,Text Analytics",Often,,,,,,Often,Often,Sometimes,,,,,,,Most of the time,,,Sometimes,Sometimes,,Often,,,,Sometimes,,,Sometimes,,,,,50,40,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Never,"17,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Russia,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,I don't plan on learning a new ML/DS method,Python,Other,"College/University,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Not Useful,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"5,000 to 9,999 employees",,,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,,,"Ensemble Methods,Random Forests,Other","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",,,,,,Most of the time,,,Often,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,Most of the time,,,,,Most of the time,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Most of the time,,,,,,,,,,,,,,Most of the time,,,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Rarely,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Software Developer/Software Engineer",University courses,20,50,0,10,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Other,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,Rarely,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,Rarely,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,RNNs",,,,Sometimes,,,Often,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,,,,Sometimes,,,,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run 
slowly","Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,Often,,,Often,,,,Sometimes,,Often,,,Most of the time,,26-50% of projects,Entirely internal,IT Department,Connl data sets; Wikipedia; Wikidata; ,Creating annotated data is expensive and tedious. There are not enough free annotated data sets.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,53000,EUR,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses",,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,,No,I did not complete any formal education past high school,,,"Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Female,Pakistan,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"GitHub,University/Non-profit research group websites","Arxiv,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,10,20,0,0,"Machine Translation,Natural Language 
Processing","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,More than 10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Traditional Workstation",Text data,Rarely,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,Sometimes,,,,,Often,,Sometimes,,,Most of the time,Sometimes,,,,,Most of the time,,,,,,,,,40,40,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,Sometimes,,,,,Often,,,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,,Text preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Rarely,40000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,25,45,30,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SAS Enterprise Miner,SQL",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,Sometimes,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction",Sometimes,,,,,,Often,Sometimes,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,30,20,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,,,Sometimes,,Sometimes,,Most of the time,,,,Sometimes,Often,Often,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Git,,40000,GBP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +A different identity,South Africa,38,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Employed by government",Tableau,Neural Nets,R,Government website,"Company internal community,Conferences,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,Not Useful,Very useful,,,,Somewhat useful,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,70,20,10,0,0,0,,,A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data,Other",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Spark / MLlib,SQL,Tableau,Other",,Sometimes,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,Sometimes,Rarely,,,Often,Sometimes,Often,,,,,,,,Sometimes,Often,,,Rarely,,,,Often,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive 
Bayes,Neural Networks,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,Often,,,Sometimes,Sometimes,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Often,,Sometimes,Often,Sometimes,,Often,Sometimes,,,,,Sometimes,Sometimes,,Most of the time,Rarely,Often,,Often,,76-99% of projects,More external than internal,Standalone Team,Any applicable that we can find,Building teams tuned to dataset ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint,Other",Physical hard drives,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,160000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,"Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,30,0,40,0,20,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",No education,Mix of fields,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","IBM SPSS Modeler,Python,R,SQL",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,,,,,,Company Developed Platform,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"GitHub,Google Search,Government website","Arxiv,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,"Data Stories Podcast,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,30,20,30,10,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Other,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's 
Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Most of the time,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,,Often,,Often,Often,,Often,Often,,,Often,,Often,,Often,,Often,Often,Often,,,Often,Often,Often,,,Often,Often,,,,,40,30,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database",,Often,Sometimes,,,,,,Sometimes,Sometimes,Often,,,,,,Sometimes,Sometimes,,,,,Less than 10% of projects,More internal than external,IT Department,ImageNet;Poem,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Company Developed Platform,Email",,Git,Sometimes,300000,CRC,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,University/Non-profit research group websites,"Non-Kaggle online communities,Online courses",,,,,,,,,Very useful,,Very useful,,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Insurance,500 to 999 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Never,100MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs,SVMs","Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,TensorFlow",,,,,,,,,,,,,,,Often,,Often,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,Often,Often,,,,Often,,,Often,Often,,,,,80,5,0,15,0,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,,,,Often,Most of the time,,,Most of the time,,,,,,,,Most of the time,,,,,,51-75% of projects,Entirely external,IT Department,,Read the scan document like accident report form,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Other,Never,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,New Zealand,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,R,GitHub,"Blogs,College/University,Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,30,10,10,10,30,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,New Zealand,38,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by college or university,Other,Deep learning,Java,I collect my own data (e.g. web-scraping),"College/University,Official documentation,Personal Projects",,,Not Useful,,,,,,,Not Useful,,Very useful,,,,,,,Other (Separate different answers with semicolon),5-10 years,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,90,0,0,10,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Not 
important,Not important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Male,Japan,37,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Link Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Management information systems,1 to 2 years,Business Analyst,Self-taught,10,50,0,40,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Israel,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Self-employed,TensorFlow,Association Rules,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information 
systems,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher,Statistician",Other,20,20,0,60,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Most of the time,10GB,"CNNs,Random Forests,Other","Python,R,TensorFlow,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,Most of the time,Sometimes,,"CNNs,Data Visualization,PCA and Dimensionality Reduction,Random Forests",,,,Most of the time,,,Rarely,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,20,20,55,5,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely external,Standalone Team,images,prefer not to say,Other,Other,prefer not to say,Bitbucket,Never,0,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Association Rules,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,35,35,20,5,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Often,,,Often,,,,,,Often,,,,,,Often,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,29,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,France,25,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects",,Very useful,,,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,,,,,,"Data Elixir Newsletter,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,University courses,5,70,0,0,5,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Belgium,36,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"DBA/Database Engineer,Other",Self-taught,90,10,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Google Cloud Compute,Jupyter notebooks,KNIME (free version),Minitab,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,Often,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,,,Often,,Often,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Often,,,,35,5,1,5,5,50,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining 
data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Sometimes,,Most of the time,Often,,Often,Often,,,Often,,Sometimes,,,,,,Most of the time,Sometimes,,51-75% of projects,Entirely internal,Other,,"Data access Data quality","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform,Other",Samba share,"Git,Subversion",Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Bayesian Methods,Python,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Data 
Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,40,20,20,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Russia,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,Python,Google Search,"Arxiv,Blogs,Online courses",Very useful,Very useful,,,,,,,,,Very useful,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Retail,I don't know,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision 
Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,,50,20,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Often,,,Often,,,,,,,,,,,Often,,,,26-50% of projects,More internal than external,Central Insights Team,,,,,,Git,Don't know,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,IBM SPSS Statistics,Random Forests,Python,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",5-10 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Laptop + 
Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",0 - 1 hour,PhD,Sort of (Explain more),Doctoral degree,Psychology,Less than a year,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,NA,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Not Useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,"Computer Vision,Unsupervised Learning","Gradient Boosting,Logistic Regression",A master's degree,Pharmaceutical,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,"N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Often,Often,,,Often,,,,Most of the time,,,,,,,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,80,5,5,10,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Most of the time,,10-25% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Shiny,Git,Sometimes,1440000,RUB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Kenya,35,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by government,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,GitHub,"Arxiv,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",,Experience from work in a company related to ML,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer",University courses,50,20,10,20,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,60,10,0,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,Other,"Amazon Web services,C/C++,NoSQL,Perl,Python,R,SQL,Tableau,Unix shell / awk,Other,Other,Other",,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,Sometimes,,,,,,,,,Often,,,Rarely,,,Most of the time,Most of the time,Most of the time,Most of the time,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,10,40,15,30,5,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Sometimes,,,Most of the time,Most of the time,,Most of the time,,,,Often,,,Often,,,,76-99% of projects,Approximately half internal and half external,Other,tagCalls; impressions data,cant say,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,300000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,20,35,25,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,,,Sometimes,Sometimes,,,,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,Sometimes,Often,Most of the time,Often,Sometimes,,,Sometimes,,Often,,Often,,,,Sometimes,Often,,Often,,,Most of the time,,Often,Sometimes,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Often,Often,,Often,Often,,,,,,Often,,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git",Rarely,700000,THB,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,31,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,"DBA/Database Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",15,50,10,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Australia,62,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos,Other",,,,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Very useful,"No Free Hunch Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL,Tableau,TensorFlow",,,,,,,,Rarely,,,,,Often,,,,,,,,,,Often,Often,Most of the time,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive 
Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Sometimes,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,Rarely,Often,,Often,,Often,Most of the time,Sometimes,Often,,Most of the time,Sometimes,,Sometimes,,Often,Most of the time,,,,,45,15,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Sometimes,,Often,,,,Most of the time,,Sometimes,,,Often,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,Many,Getting the business to fund projects. Opening teh business peoples minds to what is possible.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,180000,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Scala,Google Search,"Blogs,Friends network,Kaggle,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Self-taught,40,20,30,10,0,0,Recommendation Engines,Other (please specify; separate by semi-colon),,Technology,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Rarely,10GB,Decision Trees,"R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,"Association Rules,Collaborative Filtering,Naive Bayes",,Often,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,20,30,5,20,25,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,,Often,,,Often,,,,,Most of the time,,,,,,26-50% of projects,,,,,,,,,,1650000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Poland,25,"Not employed, but looking for work",,,,,,,,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Other",University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Very 
Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,R,Random Forests,Python,"GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",40+,Master's degree,No,Master's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,20,10,20,30,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Time Series","Bayesian Techniques,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,36,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Association Rules,R,Other,"Blogs,College/University,Conferences,Official 
documentation,Online courses,Textbook",,Somewhat useful,Very useful,,Somewhat useful,,,,,Very useful,Very useful,,,,Very useful,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler",University courses,60,0,0,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Telecommunications,"1,000 to 4,999 employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,Rarely,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",Rarely,,,,,Rarely,Often,Rarely,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,50,20,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,Rarely,,Often,Often,,,,,,,,,,Often,,Often,,,,Rarely,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,130000,PLN,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Neural Nets,Python,GitHub,"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,30,0,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Traditional Workstation,Other","Relational data,Other",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,,,Often,,Most of the time,,,,65,20,0,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of 
management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Most of the time,Often,,,,,,,,,,Often,Most of the time,,51-75% of projects,Entirely internal,IT Department,weather data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,31500,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Philippines,20,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Friends network,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,0,0,0,100,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not 
important,Somewhat important,Not important +Male,Russia,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Somewhat useful,,Not Useful,Somewhat useful,,Very useful,,Very useful,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Engineer,Researcher",University courses,20,0,0,80,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Retail,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,Most of the time,,Sometimes,,Sometimes,,,,,Often,,Often,,,Sometimes,,,,Most of the time,,,,10,40,10,20,20,0,"Enough to code it again from scratch, albeit 
it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,Business Department,,size of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,1320000,RUB,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Turkey,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,Very useful,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,35,10,5,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,Rarely,,,,,,,,,Sometimes,,,,Rarely,Rarely,Rarely,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,Rarely,Most of the time,,,Most of the time,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,,Often,Often,,,Most of the time,,,,50,5,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Often,,,,,,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,40000,TRY,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Support Vector Machines (SVM),Other,Google Search,"College/University,Trade book,Tutoring/mentoring,YouTube Videos",,,Not Useful,,,,,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,< 1 year,Necessary,,Necessary,Necessary,,,,,,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Management information systems,I don't write code to analyze data,DBA/Database Engineer,Self-taught,50,0,0,50,0,0,Other (please specify; separate by semi-colon),Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,Very Important,,, +Male,Australia,25,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Oracle Data Mining/ Oracle R Enterprise,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,Not Useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,University courses,40,10,30,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,"1,000 to 4,999 employees",Decreased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Java,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Often,,,,,,,,,,"Cross-Validation,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,SVMs,Text Analytics",,,,,,Most of the time,,,Often,,,,,Rarely,,Sometimes,,,,,,,Sometimes,,,,Often,Often,Often,,,,,60,10,10,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Often,,Sometimes,,,,Often,,Most of the time,,Rarely,,Sometimes,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",I don't typically share data,,,Sometimes,80000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,France,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Not Useful,,Very useful,,Very useful,,Very useful,,,,,"Jack's Import AI Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",University courses,25,0,50,10,15,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Other",Most of the time,10GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data 
Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation",,,,Often,Rarely,Often,Often,Rarely,Often,Sometimes,,Often,,Sometimes,,Sometimes,,,,Often,Rarely,,Sometimes,Rarely,Rarely,Rarely,Rarely,,,,,,,15,25,10,5,5,40,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,DeepFashion; ImageNet; WIDER Face,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Always,18000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,,,,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,Other,Basic laptop (Macbook),40+,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Sweden,39,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,NoSQL,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",0,40,60,0,0,0,Time Series,"Decision Trees - Random Forests,Ensemble Methods",A master's degree,Technology,500 to 999 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Most of the 
time,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,Often,,,,,,,,,,,,,,,,,,Often,,Often,Often,,,,20,10,40,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,Privacy issues",,,,,Often,Often,,,,,,,Sometimes,,,,Most of the time,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Ireland,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,NoSQL,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1GB,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,,,,Most of the time,,Often,Often,Sometimes,,Often,,Sometimes,,Sometimes,,Rarely,,Often,,,Often,,,,,,,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Rarely,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,none,none,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Sometimes,"85,000",EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Online courses,Tutoring/mentoring",,,Very useful,,,,,,,,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,10,10,20,55,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Text data,,100MB,"Decision Trees,SVMs","C/C++,Java,Jupyter notebooks,Mathematica,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL",,,,Sometimes,,,,,,,,,,,Often,,Often,,,Sometimes,,Rarely,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Simulation",,,,,,Most of the time,Most of the time,Often,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,,,,70,5,0,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Rarely,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,30000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,"Data Machina Newsletter,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,20,20,10,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Internet-based,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"CNNs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",Rarely,,,Sometimes,,Often,Often,Sometimes,Sometimes,,,,,Often,,Most of the 
time,,Sometimes,Most of the time,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,30,20,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Most of the time,,,Sometimes,,Sometimes,,Often,,,,Sometimes,,,51-75% of projects,More internal than external,IT Department,"Full Contact, Alexa, Diffbot",Heterogeneity,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Most of the time,"51,000",EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Researcher,Poorly,Employed by company that makes advanced analytic software,Julia,"Ensemble Methods (e.g. 
boosting, bagging)",Julia,Government website,"Arxiv,Blogs,College/University,Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,5,5,20,70,0,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1PB,"Bayesian Techniques,CNNs,Ensemble Methods,GANs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Julia,Jupyter notebooks",,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Recommender Systems,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,10,70,20,0,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,80000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Neural Networks - CNNs",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,France,37,Employed full-time,,,No,Yes,Computer Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,Necessary,,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,No education,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Other,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Unix shell / awk,Survival Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,30,40,25,0,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau",Sometimes,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Often,,Often,,,,Most of the time,Sometimes,,,,,,,Often,Sometimes,,,,,,,Sometimes,,,Often,,,,Often,,,,30,10,10,30,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,26-50% of projects,Entirely internal,Business Department,n/a,Size and lack of computational capacity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,80000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Cluster Analysis,Python,GitHub,"Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,15,5,0,0,Recommendation Engines,"Bayesian Techniques,Logistic Regression",High school,Internet-based,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Recommender Systems",Often,,,,Often,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,,,,,Sometimes,,,,,,,,,Often,,,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,2600000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,42,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,,,,,,,,,,Very useful,Very useful,,,,,3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Other,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,90,0,0,0,0,10,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +A different identity,Australia,40,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,,Other,,"Arxiv,Stack Overflow Q&A,Other",Very useful,,,,,,,,,,,,,Very useful,,,,,,1-2 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Other",2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,,,"Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning","Evolutionary Approaches,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important +Male,New Zealand,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced 
analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,Other","Arxiv,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Textbook",Very useful,,,,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,70,10,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,Often,,Sometimes,,,,Rarely,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,Often,Most of the time,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Sometimes,Sometimes,Often,,,,,,,,,,Sometimes,,,,,Rarely,,,Often,Sometimes,,Rarely,Sometimes,,,,60,30,8,1,1,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",,,Often,,,,,Often,,,Most of the time,,,,Sometimes,,Sometimes,Often,,,,,10-25% of projects,Entirely internal,Central Insights Team,Demographics ,Narrow interface of the data sources together with total incompetence of the Business Intelligence.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,120000,NZD,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,72,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Link Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,"FlowingData Blog,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician",Work,50,5,20,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",A professional degree,Academic,"5,000 to 9,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests","KNIME (commercial version),MATLAB/Octave,Python,R,SAS Enterprise 
Miner,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,Often,Sometimes,,,,,Most of the time,Often,Sometimes,,,,Sometimes,Most of the time,,Most of the time,,,Most of the time,,,,Sometimes,,,,65,15,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,Most of the time,Often,,,,,,Often,,Most of the time,,,,Most of the time,,76-99% of projects,Do not know,Other,,Lack of knowledge of provenience,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,,Sometimes,500000,DKK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Hungary,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction",Often,Sometimes,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),"Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,,HUF,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,44,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,No,Master's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,5,5,10,80,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - RNNs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Friends network,Kaggle,Online courses",Somewhat useful,Somewhat useful,,Very useful,,Very useful,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,0,90,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Internet-based,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Never,10GB,"CNNs,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Neural Networks,Random Forests",,,,Often,,,,,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,0,0,0,100,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,Sometimes,,,,,,Often,,,,,,Most of the time,Often,,76-99% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,120000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Russia,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Not Useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",45,40,10,5,0,0,,,A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Never,,CNNs,"Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Neural Networks,Segmentation",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,25,20,40,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Rarely,,,26-50% of projects,Entirely internal,IT Department,Imagenet,,Document-oriented (e.g. 
MongoDB/Elasticsearch),I don't typically share data,,Bitbucket,Sometimes,6000,,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,SQL,"GitHub,Google Search","Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,,,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",5,75,15,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,Sometimes,,,,Often,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,Sometimes,,,,,,,,30,10,15,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Often,Sometimes,,Most of the time,,,Often,Most of the time,,Most of the time,Most of the time,Often,Sometimes,Sometimes,Most of the time,,,,,Often,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,320000,CNY,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Official documentation,Online courses,Textbook,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,Very useful,,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Physics,Less than a year,I haven't started working yet,Self-taught,30,60,0,5,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,19,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Personal Projects",,,Not Useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,,"Machine Learning Engineer,Predictive Modeler,Programmer",Work,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst",University courses,50,0,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,Canada,48,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Matlab,Google Search,"College/University,Friends network,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,,,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",3-5 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Markov Logic Networks,Other (please specify; separate by semi-colon)",A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,70,0,0,30,0,0,Computer Vision,"Decision Trees - Random Forests,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,1-2 years,,Necessary,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,40,60,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",70,20,0,0,0,10,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",Poland,44,Employed full-time,,,No,Yes,Statistician,Fine,Employed by government,R,Regression,R,Google Search,"Blogs,College/University,Friends network,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Not Useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,A humanities discipline,I don't write code to 
analyze data,"Data Analyst,Statistician",Work,49,5,5,40,1,0,Survival Analysis,Logistic Regression,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Switzerland,29,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Julia,Time Series Analysis,Python,Google Search,"College/University,Company internal community,Kaggle,Official documentation,Personal Projects",,,Very useful,Very useful,,,Very useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,5,35,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,Sometimes,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics",,,,Rarely,Sometimes,Most of the time,Sometimes,Sometimes,Often,,,Often,,Sometimes,,Sometimes,,Rarely,Most of the time,Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,Most of the time,,,,,20,60,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Privacy issues",,,,,Rarely,,,,,,,Rarely,,,,,Rarely,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,Employed part-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Time Series Analysis,Python,GitHub,"College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,Self-taught,100,0,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"IBM SPSS Statistics,Minitab,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Time Series Analysis",,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,50,20,10,10,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,Idk,Very big data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,5000000,IDR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,37,"Not employed, but looking for work",,,,,,,,TensorFlow,Link Analysis,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,"Data Elixir Newsletter,Linear Digressions Podcast,Partially Derivative Podcast",1-2 years,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Workstation + Cloud service,2 - 10 hours,Github Portfolio,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Male,United Kingdom,28,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Not Useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Biology,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Kenya,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,10,5,15,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Financial,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100GB,"Bayesian Techniques,Decision Trees","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SAP BusinessObjects Predictive Analytics,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Often,,,,,Most of the time,,,,Often,,Sometimes,,Often,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Simulation,Text Analytics",,,Often,,,,Most of the time,Often,,,,,,Sometimes,,,,Often,,,,,,,,,Often,,Often,,,,,20,40,5,15,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,,Most of the time,,,,,,Often,,,,Often,Most of the time,,,,,,10-25% of projects,More internal than external,Standalone Team,Government provided data,Usually already 
summarised. Not in its true raw form,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,South Korea,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,DataRobot,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,0,10,30,0,Computer Vision,Gradient Boosting,No education,Academic,Fewer than 10 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Image data,Sometimes,1GB,CNNs,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,20,5,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Share Drive/SharePoint,,Git,Sometimes,1000,KRW,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Singapore,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",0 - 1 hour,,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Iran,30,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Company internal community,Kaggle",Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,60,10,20,5,5,0,"Speech Recognition,Unsupervised Learning,Other (please specify; separate by semi-colon)","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Other",Most of the time,100GB,"CNNs,GANs,HMMs,Neural Networks,RNNs","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Perl,Python,Unix shell / awk",,,,,,,,Rarely,,,,,,,,,Rarely,,,,Often,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics",,,,Most of the time,,Often,Sometimes,,,,Most of the time,,Most of the time,Often,,Sometimes,,,,Most of the time,Most of the time,,,,Most of the time,Often,,,Sometimes,,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,Often,,,,,,Most of the time,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Most of the time,18000,GBP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,Not Useful,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,I haven't started working yet,University courses,50,10,20,15,5,0,Unsupervised Learning,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods","IBM SPSS Modeler,Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Rarely,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,Often,,,Most of the time,,,,,,,,Often,,Sometimes,,,,,Most of the time,,Often,,,Most of the time,,,,,,,,40,40,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others",,,,Often,Often,Often,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,iris,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Never,45000,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,KNIME (commercial version),Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Friends network,Newsletters,Official documentation,Personal Projects",Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,"Computer Vision,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",High school,Mix of fields,20 to 99 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,Sometimes,,,,Sometimes,,,,,Sometimes,Rarely,,Sometimes,Most of the time,Most of the time,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Sometimes,,Most of the time,,Often,Most of the time,Sometimes,,,,,,,,,,,Sometimes,Sometimes,Often,,Most of the time,,,,,,,,,,,30,40,10,20,0,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,86000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United Kingdom,38,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Friends network,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,"KDnuggets Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Electrical Engineering,,"Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Israel,34,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Online courses,Personal Projects,Other",,,,,Very useful,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,50,0,10,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Important,Other,Laptop or Workstation and private datacenters,Other,Rarely,1GB,"Random Forests,Other","Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,Sometimes,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,25,10,10,25,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",,240000,ILS,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Greece,33,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,20,20,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,100 to 499 employees,,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",,,,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,,Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Not Useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,Data Scientist,Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,Fewer than 10 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,1TB,Other,"Amazon Web services,Jupyter notebooks,NoSQL,Python,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Random Forests,Text Analytics",Sometimes,,,,,Often,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,40,20,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,Most of the time,Often,,,,,,,,,,Sometimes,,Most of the time,Often,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,90000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Python,Deep learning,Stata,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Miner,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",70,10,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Government,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Always,1GB,"Bayesian Techniques,Other","Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,10,30,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",,Often,,,Often,,,,Most of the time,,,Often,Sometimes,,Often,,,,,,,,76-99% of projects,More external than internal,Other,price; sales; 
tax; financial statements ,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,3000000,AMD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Not Useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Not Useful,Somewhat useful,,,,,"Jack's Import AI Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Other,Relational data,Rarely,100MB,"Decision Trees,HMMs,Random Forests","C/C++,IBM SPSS Statistics,IBM Watson / Waton Analytics,Julia,Jupyter notebooks,NoSQL,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,Rarely,Rarely,,,Rarely,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,,,,,,Rarely,,Sometimes,,,,"A/B Testing,Data Visualization,Decision Trees,HMMs,Logistic Regression,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",Rarely,,,,,,Sometimes,Sometimes,,,,,Often,,,Sometimes,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Most of the 
time,,,,75,10,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization",Most of the time,,,Most of the time,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,2000000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,,"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Internet-based,20 to 99 employees,Increased significantly,Less than one year,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,Cloudera,Python,R,SQL",,Most of the time,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",Often,Sometimes,Rarely,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Often,,Often,,Most of the time,,,,Sometimes,Most of the time,Most of the time,Rarely,,,,,,,Sometimes,,,,50,5,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Rarely,Sometimes,,,,,,,,,,,Rarely,,Rarely,,Rarely,,,76-99% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),,,Bitbucket,,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Pakistan,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,QlikView,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Researcher",University courses,30,10,20,30,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,10 to 19 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,,Gradient Boosted Machines,"Jupyter notebooks,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,Most of the time,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics",,,,,,Most of the time,Most of the 
time,,,,,Most of the time,,Most of the time,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Sometimes,,,,,40,25,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,,Often,Sometimes,,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,National institute of statics (INE),Cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,40000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Iran,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Mathematica,Neural Nets,Python,Google Search,"College/University,Online courses",,,Very useful,,,,,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Data Scientist,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Speech Recognition","Logistic Regression,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat 
important +Male,Belgium,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Other,Work,10,5,35,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A professional degree,Academic,500 to 999 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"A/B Testing,Cross-Validation,Logistic Regression,Random Forests",Often,,,,,Often,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,10,35,0,15,40,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Sometimes,,,Sometimes,,,,,,,,,Often,,Often,Most of the time,,51-75% of projects,Do not know,Standalone Team,information R-package; Hillstrom MineThatData,Getting the data,Column-oriented relational (e.g. 
KDB/MariaDB),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,27000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Machine Learning Engineer",Self-taught,50,30,0,0,20,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,,,,,,Sometimes,,Often,,,Often,Often,,,Often,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,Most of the time,Rarely,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Business Department,,Incompleteness of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,477600,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Other",Self-taught,20,10,60,0,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,10TB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Perl,Python,R,SQL,TensorFlow,Unix shell / awk,Other,Other",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Most of the time,Most of the time,Often,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random 
Forests,Segmentation",,,,Most of the time,,,Often,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,Often,,,Often,,,,,,,,20,20,40,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Sometimes,Often,,,Most of the time,,Often,Often,,,,Sometimes,Sometimes,,Often,Most of the time,,,100% of projects,Entirely internal,IT Department,SSDM,Lack of clear formulation of the problems,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Phabricator,Git,Never,50000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Perfectly,Employed by company that makes advanced analytic software,Python,Deep learning,R,"GitHub,Google Search","Blogs,Conferences,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,,Somewhat useful,,,,,,,Very useful,,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Machine Translation,Speech Recognition,Survival Analysis","Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",High school,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,NoSQL,Python,R,Tableau,TensorFlow",Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,Often,Most of the time,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Often,,,Most of the time,Often,,,,0,40,20,25,15,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Most of the time,,,Often,,,,,,Most of the time,,,,51-75% of projects,More external 
than internal,IT Department,We mostly work on Client data.,fewer Projects from clients.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,25200,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ireland,26,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,University courses,0,0,0,80,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important 
+Male,Pakistan,26,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,10,10,10,0,60,Speech Recognition,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important +Male,Iran,22,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Friends network,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Researcher,University courses,35,10,30,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,10 to 19 employees,,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Laptop or Workstation and private datacenters,Other,Rarely,1GB,"Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,Sometimes,,,,Rarely,,Often,Most of the time,,,,20,30,0,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Often,Often,,,Sometimes,,,,,,Most of the time,,,,26-50% of projects,More external than internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Rarely,,,,6,,,,,,,,,,,,,,,,,, +Female,France,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Statistician",University courses,15,0,15,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","NoSQL,Python,QlikView,R,SAS Base,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,Often,Most of the time,,,,,Rarely,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,,,Often,,Often,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,Most of the time,Often,Often,Often,Sometimes,,,,60,15,0,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database",Often,,,,,,,,,,,,Often,,Often,,,Often,,,,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,40000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Germany,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,45,20,10,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Often,Sometimes,Sometimes,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,,,70,20,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of 
tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,Most of the time,,,,Sometimes,,,,Sometimes,Often,,,,,Sometimes,,,,51-75% of projects,More internal than external,Business Department,External Scores,We don't know how it was developed so it's hard to tell if it fits our needs,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"44,000",EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,France,35,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,SQL,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Blogs,Friends network,Newsletters",Very useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,70,15,10,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,,,,,5,60,20,10,5,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,40000,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Online courses,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,,,,,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Image data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Unix shell / awk,Other,Other,Other",Rarely,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,Rarely,,Often,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,Rarely,Often,Sometimes,Sometimes,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,Often,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Often,Sometimes,Sometimes,,Most of the time,Sometimes,,Sometimes,,Often,,,Often,,Often,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,Sometimes,,,,,,,Often,Often,,,,,,100% of projects,Entirely internal,Standalone Team,Government Open Data; Meteorogical Data Sets; Tourism Information Data sets; TV Channels Guide,Quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Company Developed Platform,,Git,Always,80000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,Yes,Other,Fine,Employed by non-profit or NGO,Tableau,,Python,"Google Search,I collect my own data (e.g. web-scraping)",Other,,,,,,,,,,,,,,,,,,,"DataTau News Aggregator,Siraj Raval YouTube Channel",< 1 year,,Nice to have,Necessary,,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),,Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,35,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,Most of the time,Sometimes,Most of the time,,,,,,Often,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Rarely,Sometimes,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Rarely,Sometimes,Sometimes,Sometimes,Most of 
the time,Often,Most of the time,,,,Most of the time,,Often,,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,,Often,,Often,Sometimes,Often,,,,20,40,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,Sometimes,,,Often,,,,,Sometimes,,,,,,Often,,Often,,,,,26-50% of projects,Entirely internal,Business Department,Data provided by clients,Size of data and parameter tuning of models,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,880000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Business Analyst,,,TensorFlow,,,,Personal Projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5,10,5,30,50,0,Adversarial Learning,,,Academic,,,,,,,,,,,,Amazon Machine Learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,36,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by college or university,MATLAB/Octave,Neural Nets,Matlab,Google Search,"Arxiv,College/University,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",Not Useful,,,,Somewhat useful,,,,,,,Not Useful,,,,,,Very useful,Data Stories Podcast,1-2 years,Necessary,,,,Necessary,,,,,,,,,,Traditional Workstation,0 - 1 hour,Online Courses and 
Certifications,Yes,Professional degree,,Less than a year,"Computer Scientist,Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,20,20,20,20,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,,Very Important +Male,Norway,36,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping)",Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,I never declared a major,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,,,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,,,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,SQL",Rarely,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,0,20,20,40,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,,,,,,,,,,,,,Sometimes,,,Often,Sometimes,,51-75% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,1100000,NOK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Germany,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Self-employed",Julia,Proprietary Algorithms,R,Other,"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Data Scientist,Work,60,0,20,10,10,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,10GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Python,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow",,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Often,Often,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Random Forests,Segmentation,Text Analytics",,,,,,Often,Often,Often,Sometimes,,,,,,Rarely,,,,,,,,Sometimes,,,Often,,,Rarely,,,,,70,10,10,0,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Sometimes,Often,Sometimes,,Often,,,,,,Sometimes,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Never,220000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,New Zealand,42,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,R,Neural Nets,R,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A health science,6 to 10 years,"Business Analyst,Data Analyst,Operations Research Practitioner",Self-taught,90,8,0,0,2,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Java,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",Often,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Rarely,Sometimes,Sometimes,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,,,,Sometimes,Often,,,,65,10,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,,,,,,Most of the time,,,,,,,Often,,,,,,,26-50% of projects,Entirely 
internal,Other,"Weather, 3rd party mapping data ",Query time,Other,Other,"S3, private intranet ",Git,Sometimes,110000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Israel,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Online courses,Podcasts",,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,20,20,10,0,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",Rarely,Often,,,,,,,Often,,,,,,Rarely,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Most of the time,,,,Sometimes,Often,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,Often,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Often,,,,Sometimes,,Sometimes,,,,25,20,30,20,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Sometimes,,,,,Sometimes,,,,,,,,Often,Sometimes,,,Sometimes,,26-50% of projects,More internal than external,Other,Demographic data,Getting and cleaning the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other",Other,S3,Git,Rarely,133000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Friends network,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,Somewhat useful,,Not Useful,,,,,Somewhat useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",Work,30,20,20,30,0,0,,Logistic Regression,High school,Other,"10,000 or more employees",Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,<1MB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,R,SAS Base,SQL,TIBCO Spotfire",,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,,,Often,,,,,Sometimes,,,,,"Association Rules,Data Visualization,Logistic Regression,Segmentation,Simulation",,Often,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Sometimes,Rarely,,,,,,,10,10,30,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Sometimes,Often,,,Often,,,,Sometimes,,,,,,Most of the time,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1100000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,India,38,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos,Other",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Other,University courses,25,35,0,25,15,0,Other (please specify; separate by semi-colon),"Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,Sometimes,1GB,Neural Networks,"C/C++,Java,MATLAB/Octave,SQL",,,,Often,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Naive Bayes,Neural Networks,RNNs",,Sometimes,,,,Most of the time,,Sometimes,,Often,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,30,40,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations of tools,Other",,,,,,,,,Often,,,,Most of the time,,,,,,,,,Most of the time,Less than 10% of projects,More external than internal,Standalone Team,"ATVS (Biometrics data), Satellite images ",#NAME?,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,1200000,INR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Not Useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM SPSS Statistics,Impala,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,,,,Rarely,,,,Often,,,Rarely,,Sometimes,,,Often,,,,,,,,,,Often,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",,,,,,Often,Most of the time,Rarely,,,,Sometimes,,Most of the 
time,,Sometimes,,,Sometimes,,Most of the time,,,,,Often,,Rarely,Sometimes,,,,,50,15,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Rarely,,Often,,,,,,,,,,Often,,,,,Sometimes,,,76-99% of projects,More internal than external,Other,N/A,Cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Most of the time,45000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Non-Kaggle online communities,YouTube Videos",,Very useful,,,Not Useful,,,,Very useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Sometimes,,,,Sometimes,,,Rarely,,,,,,Sometimes,,Often,,,,,,,Often,,,,50,25,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,,,,,,,,,,,,,Often,Often,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,120000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Kenya,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician",Self-taught,50,10,20,15,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important +Male,Argentina,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Very useful,,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,,"DataTau News Aggregator,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,I never declared a major,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",0,40,50,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Technology,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Always,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,NoSQL,Python,R,SQL",,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Often,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,40,10,10,30,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,Sometimes,,,,,Most of the time,,,,Often,,,,Most of the time,Often,Often,,100% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Rarely,40000,ARS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Not Useful,Not Useful,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Not Useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Management information systems,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Canada,42,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Trade book",,Somewhat useful,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,,"DataTau News Aggregator,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,Researcher,University courses,20,10,20,40,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,Rarely,Often,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,Often,,,,,Most of the time,,,,,,Most of the time,,,Often,,,,30,20,0,30,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,Often,,,,Most of the time,,Sometimes,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,190000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Romania,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,40,5,35,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,Neural Networks,"Amazon Web services,Java,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Data Visualization,Neural Networks",,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,5,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git,Subversion",Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"GitHub,Google Search","Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,Very useful,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Other,10,45,0,45,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,Don't know,Some other way,Somewhat important,Other,Traditional Workstation,Text data,Don't know,,,"C/C++,Java,MATLAB/Octave,Python,SQL",,,,Often,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,100,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Privacy issues,,,,,,,,,,,,,,,,,Rarely,,,,,,76-99% of projects,Do not know,Standalone Team,,,,Email,,Git,Rarely,7400,MAD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects",,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,Spain,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,15,10,25,20,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow",,,,,,,,,Most of the time,,,,,Often,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,Most of the time,Sometimes,,Most of the time,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,Segmentation,Time Series Analysis",Often,,Often,,,Most of the time,Most of the time,,Sometimes,,,Most of the time,,,Often,Often,,,,Often,,,,,,Often,,,,Sometimes,,,,60,0.5,10,19.5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty 
data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data,Other",,Sometimes,,Most of the time,Often,,,,Sometimes,,,,,Often,,,,,Often,,Rarely,Most of the time,51-75% of projects,More internal than external,Business Department,"demographics, gps locations, ",privacy and legal issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,55000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Norway,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Self-taught,50,10,30,10,0,0,Survival Analysis,Logistic Regression,A doctoral degree,Other,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,Regression/Logistic Regression,"Python,QlikView,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Segmentation,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone 
non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Sometimes,560000,NOK,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Spain,41,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,Very useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,O'Reilly Data Newsletter,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,Recommendation Engines,"Bayesian Techniques,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,India,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist",Work,20,10,70,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SAS Enterprise Miner,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,Sometimes,,,Often,,Often,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,,Often,Sometimes,,,,,,Sometimes,Often,Sometimes,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,,Often,,,,30,20,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack 
of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Often,,,Often,Sometimes,,Most of the time,,,Often,,,,,,Often,Most of the time,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1900000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Denmark,39,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,TensorFlow,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",High school,Internet-based,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,10GB,"Bayesian Techniques,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Python,R,TensorFlow",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,HMMs,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Rarely,,Sometimes,,,,Most of the time,Sometimes,,,,,Sometimes,,,,,,Sometimes,Often,,,Sometimes,Sometimes,,,,,Often,Most of the time,,,,20,10,45,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,Often,Most of the time,,,,,Most of the time,,Sometimes,,,,Most of the time,,,76-99% of projects,Do not know,Central Insights Team,None,Authentication ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,120000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Czech Republic,32,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Online courses,Textbook,YouTube Videos",,Very useful,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,Data Analyst,University courses,40,NA,30,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Julia,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Stan,Other",,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,Rarely,,,,,,Rarely,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",,,Rarely,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,Rarely,,,,,Rarely,,,,,,,Often,,,,25,10,10,25,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full 
database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,Sometimes,,Often,,,,,Often,,Most of the time,,,Often,Often,,,,100% of projects,Entirely internal,Other,Bloomberg;Thomson Reuters,confidentiality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Rarely,"820,000",CZK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Decision Trees,R,Google Search,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,Not Useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,More than 10 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,10,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,100GB,"Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Collaborative Filtering,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,SVMs",,,,,Rarely,,,,,,,,,Sometimes,,,,,Rarely,,Often,,,Sometimes,,,,Often,,,,,,80,10,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,Often,,,,,,,,Sometimes,,,,Often,,Less than 10% of projects,More internal than external,IT Department,,I/O and network bottlenecks,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,83000,EUR,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Neural Nets,Python,"Google Search,Government website","Blogs,Conferences,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,Researcher,Work,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,Regression/Logistic Regression,"IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,Often,Sometimes,,,,,,Often,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the 
time,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,Often,,,Often,,,,60,20,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Often,,76-99% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,"42,000",GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +A different identity,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SAS Base,Survival Analysis,SAS,I collect my own data (e.g. web-scraping),"College/University,YouTube Videos",,,Very useful,,,,,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Pharmaceutical,"10,000 or more employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,100MB,Decision Trees,"Microsoft Excel Data Mining,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,Cross-Validation,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,5,15,NA,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,Text files;Raw 
data;Excel,Graphs,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,455000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,40,0,35,20,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,HMMs,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and 
Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Sometimes,,Rarely,Most of the time,Most of the time,Rarely,,,,,Rarely,Sometimes,Often,Often,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,40,15,5,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Often,,,,,,,,,,,,Sometimes,Rarely,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Subversion,Rarely,55500,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,R,Deep learning,R,,Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist",Work,0,0,70,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Other,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1TB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SAS Base,SAS Enterprise 
Miner,Tableau",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Often,Often,,,,,,Sometimes,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,,,,,Often,,,,,,,Sometimes,,Most of the time,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Often,Sometimes,,,,55,30,5,5,5,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Very useful,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,6-10 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","Google Cloud Compute,NoSQL,Python,Spark / MLlib,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,Often,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,,,,Often,Often,,,,,,,,,Often,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,45,35,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,Often,Most of the time,,,,,,Often,,,,,,Often,,,Sometimes,Often,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"36,000",EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Poland,41,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,,"College/University,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,,,,,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Data Analyst,University courses,30,10,50,10,0,0,Outlier detection (e.g. Fraud detection),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10MB,"Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,"Data Visualization,Neural Networks",,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,10,5,5,40,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Mathematica,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,Less than a year,"Business Analyst,Data Analyst,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Academic,,,,,Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Other,Most of the time,100MB,"Evolutionary Approaches,Regression/Logistic Regression","Mathematica,Microsoft Excel Data Mining,R,RapidMiner (free version),TensorFlow,Other",,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,Most of the time,,,,40,20,10,15,15,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues",,,Sometimes,,Often,Sometimes,,,,Most of the time,Often,,Often,,Often,,Most of the time,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Engineer,Researcher",Self-taught,50,20,25,0,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Often,,,,Sometimes,Often,,Often,,,,,Often,,Most of the time,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,Sometimes,,,Sometimes,Sometimes,,,,Often,,,Often,,,Sometimes,Most of the time,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,58000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,I collect my own data (e.g. web-scraping),"College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,,1 to 2 years,Engineer,University courses,50,30,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,,Not important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Association Rules,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",70,10,0,10,10,0,"Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Very Important,,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,Hungary,22,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,,University courses,10,50,15,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Very useful,,< 1 year,,,,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"DataCamp,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,France,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,"Jack's Import AI Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,10,10,15,50,15,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,,,,,,,Often,,,,,Often,Often,Often,,,,,,,Sometimes,Often,,,,,50,25,0,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,Often,,,,,,,,,Most of the time,,100% of 
projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,18000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Japan,67,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Trade book",,,,,,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,Data Scientist,Work,30,0,60,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",High school,Mix of fields,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,Often,,,,,,,,,,Rarely,,,,,,,"Bayesian Techniques,Cross-Validation,Lift Analysis,Logistic Regression,Naive Bayes,Segmentation,Simulation,Time Series Analysis",,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,Often,,Sometimes,,,,,,,,Often,Sometimes,,,Sometimes,,,,60,10,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,,,,,,,Often,,,,,,,,Often,,,10-25% of projects,Do not 
know,Standalone Team,None,Noise in data,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,,Most of the time,5000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,NoSQL,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Operations Research Practitioner,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,20,5,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Rarely,,,,Sometimes,,Most of the time,Sometimes,,,,,,Often,,Sometimes,,,Often,Often,Often,Most of the time,Often,,,Sometimes,,,Often,Sometimes,,,,20,50,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,Most of the time,,,Most of the time,,,,,,Often,Most of the time,,,,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Rarely,660000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,Python,Monte Carlo Methods,R,"Google Search,Government website","Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Engineer,Self-taught,35,55,10,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Government,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,Bayesian Techniques,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Time Series Analysis",,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,33,33,18,16,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization",,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Sweden,42,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Kaggle,Personal Projects,Textbook",Somewhat useful,,,,,,Very useful,,,,,Very useful,,,Very useful,,,,Data Elixir Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",95,0,0,0,5,0,Time Series,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Ukraine,25,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",3,92,0,0,0,5,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,Germany,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst,Other",Work,30,40,20,0,10,0,,,A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Regression/Logistic Regression,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,QlikView,R,SQL,Tableau",,Rarely,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,"Collaborative Filtering,Logistic Regression,Segmentation",,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,70,0,0,0,30,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,Often,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,65000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Switzerland,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,,The Data Skeptic Podcast,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,20,15,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,Not Useful,Very useful,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,FastML Blog",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,70,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important +Male,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Random Forests,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Work,25,15,45,10,5,0,"Natural Language Processing,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",Sometimes,,Rarely,,,Often,Most of the time,,,,,,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,Sometimes,,,,50,5,5,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",,Often,Sometimes,,Most of the time,,,Often,,,,,,,Often,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,"Hand-entered, 
typos","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,450000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Never,100MB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series 
Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,,,,,,Most of the time,,Most of the time,,,,Often,Most of the time,,Often,,,,Most of the time,,,Sometimes,,,,15,0,0,25,60,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Sometimes,Often,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,,Share Drive/SharePoint,,,Rarely,85000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Engineer,Researcher",University courses,0,20,30,50,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,KNIME (free version),NoSQL,Python,QlikView,R,SQL,Tableau",,Most of the time,,Rarely,,,,,Rarely,,,,,,Often,,,,Often,,,,,,,,Rarely,,,,Often,Rarely,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,,Often,,,Most of the time,Most of the time,,,,,,,,,,,,,,Often,,Often,Often,,,,Often,,Often,,,,50,30,5,10,5,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Often,Rarely,,76-99% of projects,More internal than external,Central Insights Team,cliente data; adserver data,"attribution model, budget optimization","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Other",Commercial Data Platform,,Git,Sometimes,50000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,27,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,10,10,0,70,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Female,Turkey,27,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,R,Deep learning,Python,Google Search,"College/University,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,0,70,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Rarely,10GB,"Bayesian Techniques,HMMs,Neural Networks,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,Other,Other",,,,Sometimes,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,Often,Most of the time,,"HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,Often,Sometimes,,,,Often,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,20,40,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,Most of the time,,,,,,Sometimes,Sometimes,,,,Often,,10-25% of projects,Entirely internal,Other,"MNIST, Reuters, Standford text dataset",too big to store,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,40000,TRY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Turkey,21,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Male,Other,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important +A different identity,Hungary,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,DataRobot,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)",College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,,,,,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation",,Rarely,Sometimes,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,Most of the time,Sometimes,,,Often,,,,Most of the time,,,,,,,15,15,40,10,20,0,,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,Sometimes,,Often,,,,Often,,,,Sometimes,,,,,,,,Most of the time,,26-50% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Other",Github,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,26,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Deep learning,Matlab,GitHub,"College/University,Friends network",,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,Yes,I prefer not to answer,A social science,,Data Analyst,University courses,NA,NA,NA,NA,NA,NA,Recommendation Engines,Hidden Markov Models HMMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Russia,27,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Regression,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle",,Somewhat useful,,,,,Very useful,,,,,,,,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,60,20,0,0,20,0,"Recommendation Engines,Unsupervised Learning",,A master's degree,Technology,10 to 19 employees,,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1TB,,"Cloudera,Java,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"A/B Testing,Data Visualization,Simulation",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Often,,,,,,,40,20,20,20,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,Often,,Often,,Most of the time,,,Most of the time,,,,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,2500000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Rule Induction,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Stack Overflow Q&A",Very useful,,Very useful,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Sometimes,10MB,Neural Networks,"MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,Sometimes,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,,35,35,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,36000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Italy,47,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,C/C++,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Data Scientist,Researcher,Statistician",Self-taught,80,10,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Rarely,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Random Forests,Regression/Logistic Regression","C/C++,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Sometimes,Sometimes,Often,Often,,,Most of the time,,Often,,,,,Most of the time,,Sometimes,,,,,,,,,,,20,40,0,30,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources",,Sometimes,,,,Often,,,,Most of the time,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"45,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,Kaggle competitions,0,40,0,20,40,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Netherlands,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Insurance,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Text Analytics",Sometimes,,,,,,Often,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,Sometimes,,,,,10,20,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,Often,,,,,,Often,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Sometimes,90000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Other,17,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,,,Necessary,,Necessary,,Necessary,,,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,,,,Somewhat important,,,Somewhat important,Very Important +Male,Australia,33,Employed part-time,,,Yes,,Other,Poorly,Employed by college or 
university,Cloudera,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Other",University courses,40,0,10,50,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,100MB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Rarely,Rarely,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Rarely,,Sometimes,,,,Sometimes,,Often,,,Rarely,Sometimes,Sometimes,,Often,Rarely,,,,,Sometimes,Rarely,,,,20,20,0,20,40,0,"Enough to code it 
again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Rarely,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,76-99% of projects,More external than internal,Standalone Team,,dirty,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",Git,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,85000,AUD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Israel,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100GB,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,SVMs,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,40,60,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,Most of the time,,,,Most of the time,,Most of the time,,,,,,Often,,,,,,,Most of the time,,26-50% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Newsletters,Textbook,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Java,Jupyter notebooks,NoSQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Natural Language Processing,Text Analytics",,,,,,Often,,Sometimes,Sometimes,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,50,20,20,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input",Sometimes,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,common crawl; dbpedia,inconsistency of results by human analysts,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Other,"Windows shares, WWW servers",Git,Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Engineer,Researcher",Self-taught,50,30,0,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",,Mix of fields,I don't know,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Rarely,10GB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,Most of the time,,Often,,,,,,,,Often,,,,,,Often,Often,,,,Most of the time,Sometimes,,,,,,,,35,35,5,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with 
IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,Sometimes,,Most of the time,,,Most of the time,,,,,Sometimes,Often,,Often,,Often,Often,Often,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",,"36,000",EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Poland,29,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Engineer,Machine Learning Engineer,Programmer",Self-taught,40,30,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Retail,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,"N/A, I did not receive any formal education",Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks","Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,Rarely,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Other",Rarely,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Often,,Often,,,Sometimes,Rarely,Sometimes,,Sometimes,,,Often,,Rarely,Often,,Rarely,,,35,20,30,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,Testong and monitoring,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Never,2000000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,31,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Other,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,0,0,15,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,Indonesia,22,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Text Mining,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,Data Stories Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Laptop or Workstation and local IT supported servers,0 - 1 hour,Github Portfolio,Yes,Bachelor's degree,Management information systems,Less than a year,"Software Developer/Software Engineer,I haven't started working yet",University courses,20,0,0,60,20,0,"Speech Recognition,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - GANs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,,,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher",University courses,30,20,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Financial,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,Rarely,,,Most of the time,,,Sometimes,,Most of the time,,Most of the time,Often,,,,,Sometimes,Most of the time,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate 
findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,Sometimes,Often,,,,,Most of the time,,Often,,Often,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,54000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Deep learning,Scala,I collect my own data (e.g. web-scraping),"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Scientist,University courses,20,0,20,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1GB,"Bayesian Techniques,Ensemble Methods,HMMs,SVMs","Amazon Web services,Python,Spark / MLlib",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,PCA and Dimensionality Reduction,Recommender 
Systems,Segmentation,SVMs,Time Series Analysis",,,,,Often,Most of the time,Most of the time,,Often,,,,Often,Often,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Often,,Sometimes,,,,15,15,10,10,25,25,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Often,Often,,,,,,,,,,Often,,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,44000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Online courses,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Most of the time,,,,,,,,Often,,,Often,,,,Often,,,Most of the time,,Often,Sometimes,Often,,,,40,15,15,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining 
responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,Often,Most of the time,Often,,Most of the time,,,Often,,,Often,,,,,,Sometimes,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Git,Other",Never,52000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,Very useful,,,,,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,50,20,10,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Pharmaceutical,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,1MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SAS Base,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,Often,,Often,,,,,Most of the time,,,,,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,,,,Often,Most of the time,Often,Sometimes,,,,,,,Sometimes,,,,,Sometimes,Most of the time,Often,,,Often,Most of the time,,,Sometimes,,,,50,10,0,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,Often,Rarely,,,,,Sometimes,,,Sometimes,,,Most of the time,,,Sometimes,Sometimes,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,78000,INR,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Hungary,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Java,Deep learning,Scala,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 
hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Online courses,Personal Projects",,,Very useful,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,20,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Government,20 to 99 employees,Decreased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT 
supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,SQL,Tableau,TIBCO Spotfire",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,Often,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,Often,,,Often,,,,,,,Most of the time,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization",Sometimes,Sometimes,,,Often,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Sometimes,30000,USD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Female,Malaysia,27,Employed part-time,,,Yes,,Programmer,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,C/C++/C#,Google Search,"Blogs,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer",Self-taught,100,0,0,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,Most of the time,10MB,"CNNs,Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Sometimes,Sometimes,Sometimes,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Most of the time,Often,,Rarely,,,,,,,,Sometimes,,Often,Often,Often,,,,,,,Rarely,Sometimes,,,,20,60,5,10,5,0,Enough to tune the parameters properly,"Data Science results not used by business 
decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Most of the time,,,Often,,Often,,,Often,,,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,UCI,"Guidance is missing, it is hit and trial.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,10000,MYR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,"Data Stories Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),,Experience from work in a company related to ML,No,Doctoral degree,Physics,1 to 2 years,"Data Analyst,Engineer,Researcher",Kaggle competitions,25,25,0,0,50,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Somewhat important,Very 
Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,41,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,R,Social Network Analysis,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",5-10 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Computer Science,3 to 5 years,Researcher,Self-taught,70,0,20,10,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,France,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,Rarely,,,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,Sometimes,Often,,,Most of the time,,Often,Sometimes,,,Often,,Often,,Most of the time,,Most of the time,Sometimes,Often,Most of the time,,Most of the time,,,,,,Most of the time,,,,,5,25,10,10,50,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Other",,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,Often,26-50% of 
projects,Approximately half internal and half external,Standalone Team,biological database: Uniprot; KEGG; Wikipathways; Genecards; Pubmed,"compressing statistical results in a biologically significant way, and statistics/ML are not up to it already. Also, accessing data allowing to interpret some statistical results (bibliography, which is not available, not free, or non existent)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Git,Sometimes,36000,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Nigeria,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,Data Analyst,Self-taught,40,20,30,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,Sometimes,10GB,Neural Networks,"Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"Neural Networks,Simulation",,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,20,30,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,Most of the time,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,Somewhat useful,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician",University courses,15,0,25,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,25,25,0,30,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Sometimes,Often,Rarely,,Often,,,,Often,,,,,,Often,,Often,,,Sometimes,Often,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,"45,000",EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics,Self-employed",Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Not Useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,35,5,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,,,,"Oracle Data Mining/ Oracle R Enterprise,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Logistic Regression",,,,,,Most of the time,Sometimes,Often,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,30,40,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,Often,,,,,,Sometimes,Often,Often,,Less than 10% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Poland,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,GitHub,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",Self-taught,50,20,20,0,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Hospitality/Entertainment/Sports,,,,,Not very important,Build and/or run a machine learning service that operationally improves 
your product or workflows,Workstation + Cloud service,"Image data,Text data",Most of the time,100MB,Neural Networks,"Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,RapidMiner (free version),SAS Enterprise Miner,TensorFlow",,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,Often,,Rarely,,,,Rarely,,,,,,,Most of the time,,,,,,"A/B Testing,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Often,,,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,10,50,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",,60000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Online courses,Stack Overflow Q&A,Textbook",Not Useful,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,PhD,No,Master's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Newsletters,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,,,,"Data Machina Newsletter,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",25,5,70,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Gradient Boosted Machines,Random Forests","Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib",,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,"Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,,,,,,,,,,Sometimes,,,,,Often,,Often,,,,,,,,,,,35,20,30,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,Git,Rarely,45000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,DataRobot,,Python,GitHub,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,"Computer Vision,Survival Analysis,Time Series,Unsupervised Learning",Neural Networks - GANs,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"Data Stories Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,20,5,15,0,0,"Computer Vision,Reinforcement learning","Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation",,,,Most of the time,,Often,Often,,,,,,,,,Often,,,,Most of the time,Sometimes,,,,Often,Sometimes,Often,,,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,51-75% of projects,Entirely external,Other,None,Preparing data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Russia,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Time Series Analysis,Stata,University/Non-profit research group websites,"College/University,Kaggle,Podcasts,Textbook",,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,A social science,,"Data Analyst,Data Miner,Researcher",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Ukraine,0,Employed full-time,,,No,Yes,Programmer,,Employed by government,Python,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Data Stories Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Professional degree,,1 to 2 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,30,30,10,0,30,0,"Computer Vision,Machine Translation,Natural Language Processing,Time Series",Logistic Regression,,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",35,35,15,5,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10MB,"CNNs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,"A/B Testing,CNNs,Cross-Validation,Natural Language Processing,Neural Networks,RNNs",Often,,,Often,,Most of the time,,,,,,,,,,,,,Most of the time,Often,,,,,Most of the time,,,,,,,,,40,20,30,9,1,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,,,,,Often,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,wiki database,data collection and data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,650000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Decision Trees,Python,Other,"College/University,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Evolutionary Approaches,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,,Python,Other,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family 
member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,,,,,,,,Sometimes,,,,Often,Often,,Often,,,Sometimes,,,,Often,,,,30,50,NA,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,100% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,,Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,18,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Other,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,20,0,0,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,18,"Not employed, but looking for work",,,,,,,,TensorFlow,Text Mining,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,40,40,0,10,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,"Jack's Import AI Newsletter,Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,A health science,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,70,5,0,15,0,Adversarial Learning,"Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Germany,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Textbook,Tutoring/mentoring",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,,Very useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,1 to 2 years,Researcher,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1GB,Gradient Boosted Machines,"Amazon Machine Learning,Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,Other",Rarely,Often,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,,,,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation",,,Often,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,Most of the time,,Often,,Often,,,Often,,Often,Sometimes,,,Sometimes,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Need to coordinate with IT,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Often,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,75000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,37,Employed full-time,,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Bayesian Methods,Scala,Other,"YouTube Videos,Other",,,,,,,,,,,,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",Self-taught,50,20,20,0,0,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Neural Networks - GANs,Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,100MB,"Decision Trees,Neural Networks","MATLAB/Octave,Python,SQL,Other,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,Sometimes,Most of the time,,"Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Simulation,Time Series Analysis",,,,,,,Sometimes,Often,,,,,,Sometimes,,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,,20,10,30,10,20,10,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,client data,unclean data and lack 
of understanding of the vertical's domain knowledge,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Box,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Most of the time,65000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,73,Retired,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,R,Proprietary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,Somewhat useful,,,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Engineer,Other",Self-taught,40,30,30,0,0,0,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,R,Survival Analysis,Python,University/Non-profit research group websites,"College/University,Conferences,Online courses,YouTube Videos",,,Very useful,,Very useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,,Self-taught,40,0,40,20,0,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,High school,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random 
Forests,SVMs","Java,Tableau",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,"Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks",,,,,,,,Often,Often,,,,,Often,,,,Often,,Often,,,,,,,,,,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations of tools,Privacy issues,Scaling data science solution up to full database",,,,Sometimes,Sometimes,,,,,,,,Often,,,,Often,Often,,,,,26-50% of projects,More internal than external,Standalone Team,UCI ML datasets for research,Gaining insights from data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Weka,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,,Very useful,Very useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,France,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Friends network,Stack Overflow Q&A",,,,,,Somewhat useful,,,,,,,,Very useful,,,,,DataTau News Aggregator,1-2 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,PhD,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,Indonesia,41,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,R,Decision Trees,R,University/Non-profit research group websites,"College/University,Friends network,Official documentation,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Doctoral degree,Engineering 
(non-computer focused),I don't write code to analyze data,Researcher,University courses,0,10,0,90,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,South Africa,43,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Statistician",University courses,10,20,30,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Telecommunications,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,10GB,"CNNs,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Web 
services,Cloudera,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Rarely,,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Rarely,,,,Most of the time,,,,,,"Gradient Boosted Machines,Lift Analysis,Natural Language Processing,Neural Networks,Recommender Systems,RNNs",,,,,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,Most of the time,,,,Often,Most of the time,,,,,,,,,40,40,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,,,,Often,,,,Most of the time,,,Sometimes,,10-25% of projects,Entirely internal,Business Department,,Volume and velocity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,892000,ZAR,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Switzerland,31,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher",University courses,30,0,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,Stan,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,Sometimes,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Most of the time,,,Often,Most of the time,,Sometimes,,,,,,,Sometimes,,Sometimes,,,Sometimes,,Rarely,,,,Often,,,Sometimes,,,,5,70,0,5,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,,,,,,Sometimes,,,,,,Often,Sometimes,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,,"Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software 
(Dropbox/Box/etc.),Git,Subversion",Rarely,80000,CHF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,Other,Self-taught,60,5,15,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,Fewer than 10 employees,,,"A friend, family member, or former colleague told me",Not very important,Other,Other,Relational data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Often,,,Most of the time,Most of the time,Most of the time,Often,,,,,Most of the time,,Most of the time,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,,45,10,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,,"poor phenotyping of patients. 
Missing data, batch effects, poor experimental design (see batch effects)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,27000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts",,,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Always,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,Rarely,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Text Analytics,Time Series Analysis",Rarely,,,,,,Most of the time,Often,,,,,,,,Often,,Often,Most of the time,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,45,10,20,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,Most of the time,,,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,Rarely,,,,Rarely,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,36000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Denmark,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,,,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,"Researcher,Other",University courses,0,0,20,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Never,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Often,,,,Often,,,,Often,,Rarely,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the 
time,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,Often,Often,Often,,,,Most of the time,,,Often,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,Sometimes,,Often,,,,Sometimes,,100% of projects,Entirely external,Other,Prefer not to say,Transform from 3NF to a form that can be used for analysis,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,"110,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Tutoring/mentoring,Other",,,,,,,Very useful,,,,Very useful,,,,,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,,,Necessary,,,Necessary,,,Necessary,,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,,,,,,,,,,,,,,, +Female,Ireland,48,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Regression,R,University/Non-profit research group websites,"College/University,Friends network",,,Somewhat useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Other,University courses,0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,20,70,10,0,0,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,,,,Rarely,,,,,,,,Often,,,Most of the time,,,None,Entirely internal,Business Department,,Access,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,Never,74000,EUR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,23,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,Very useful,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,"Data Stories Podcast,Linear Digressions Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,,Less than a year,Other,University courses,50,30,0,20,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Philippines,25,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,Less than a year,,Self-taught,90,10,0,0,0,0,,,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Italy,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Other,Self-taught,35,10,20,35,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Academic,100 to 499 employees,Increased slightly,Don't know,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,60,15,0,7,18,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,,,,,Sometimes,Sometimes,,76-99% of projects,More internal than external,Other,"Open Street Map, Wikipedia, Open data",Finding an interesting research question,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Rarely,,,Other,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Researcher",University courses,25,10,40,25,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"CNNs,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Statistics,Jupyter notebooks,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow,Unix shell / awk",,Often,,Often,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,Most of the 
time,Sometimes,Most of the time,,,,,Rarely,Most of the time,,,Often,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,Sometimes,Most of the time,Often,Sometimes,Most of the time,,,Often,,,Most of the time,Rarely,,,,25,40,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues",,Often,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,100% of projects,More internal than external,Other,Too many to list,Data availability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Most of the time,70000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Italy,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,R,Decision Trees,SQL,Google Search,"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,25,25,50,0,0,0,,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,Don't know,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,Decision Trees,"Java,KNIME (free version),Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,Often,,,,Rarely,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,50,5,5,40,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Most of the time,,,26-50% of projects,,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Germany,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites,Other","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Researcher,University courses,90,0,10,0,0,0,"Reinforcement learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches",A master's degree,Other,,,,,Somewhat important,,Basic laptop (Macbook),Relational data,Never,,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Other","Java,Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Simulation,Time Series Analysis",,,Most of the time,,,Often,Most of the time,,Often,Most of the time,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,10,40,0,50,0,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,Do not know,IT Department,,,,,,,,,EUR,,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,37,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,30,0,30,10,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data",Never,100MB,"CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,Most of the time,,Most of the time,,Rarely,Sometimes,Sometimes,,,,,,Sometimes,,,Rarely,Most of the time,,,Sometimes,,,,,,Rarely,,,,,30,40,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Often,Most of the time,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Other,19,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs",High school,Academic,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,,,Most of the time,Often,,,,,,,,,Most of the time,,,Most of the time,Often,,,,,,,,,Often,,,,,35,45,5,10,5,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,,Sometimes,Most of the time,,,,,,Often,,,Sometimes,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,Dirty data and imprecise labeling,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Always,30000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Taiwan,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Russia,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Official documentation,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,Other,University courses,0,10,50,40,0,0,"Computer Vision,Machine Translation,Speech Recognition","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Never,100GB,"Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Data Visualization,Neural Networks,RNNs",,,,,,,Often,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,20,35,30,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Sometimes,,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,,100% of projects,More external than internal,IT Department,Tedlium; free spoken digit dataset,"Noisy, spontaneous speech","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,720000,RUB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Online courses,Personal Projects",,,Not Useful,,,,,,,,Very useful,Very useful,,,,,,,,< 1 year,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",Russia,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,C/C++,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Podcasts,Textbook",Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,20,20,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,"N/A, I did not receive any formal education",Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,,,Sometimes,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,Rarely,,Often,Often,,,"Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,Often,,Sometimes,,,,,Often,Sometimes,Often,,,,,,,,,,,,,40,25,20,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of funds to buy useful datasets from external sources,Privacy issues",,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Other,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Bitbucket,Mercurial",Most of the time,45000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Portugal,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,30,20,0,20,10,"Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,SQL",,,,,,,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,Often,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Recommender Systems,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,Often,,,,55,10,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team,Privacy issues",,,Sometimes,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,76-99% of 
projects,Entirely internal,IT Department,,,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,"20,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Belgium,24,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,25,15,10,50,0,0,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Pharmaceutical,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Rarely,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,Tableau,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Simulation,Text Analytics",,,Rarely,,,Often,Most of the 
time,,,,,,,,,Often,,,,Often,Sometimes,,,Sometimes,,,Sometimes,,Often,,,,,10,10,5,15,5,55,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,Often,,,,Often,Often,,51-75% of projects,More external than internal,Other,PubMed,Getting the right source data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"24,000",EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Italy,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Personal Projects,Trade book,YouTube Videos",Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,,Very useful,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Engineer,Researcher",Self-taught,70,4,25,0,1,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Government,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,Often,,Rarely,Often,Most of the time,,,Sometimes,Rarely,,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Rarely,Often,,,,Sometimes,Rarely,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,,,,,Often,Often,Sometimes,,Sometimes,,Sometimes,Most of the time,,Most of the time,,,Most of the 
time,,Sometimes,Often,Often,,,,50,15,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,Often,,,,,,Sometimes,,,,,,Most of the time,,,76-99% of projects,Approximately half internal and half external,Standalone Team,prefer not to say,Database formats & technicalities,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Power BI,Generic non-cloud file sharing software (Email/Shared Server/etc.),,108000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,University/Non-profit research group websites,"College/University,Newsletters,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,Somewhat useful,Somewhat useful,KDnuggets Blog,5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High 
school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important +Male,France,39,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Very useful,Very useful,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,Statistician",University courses,5,10,20,40,5,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,16-20,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat 
important +Female,India,31,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,KNIME (free version),Genetic & Evolutionary Algorithms,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,Necessary,Unnecessary,Nice to have,Necessary,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important +Male,Russia,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",0 - 1 hour,Kaggle Competitions,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Reinforcement learning,,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Java,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Unsupervised Learning,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,R,SQL",,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Logistic Regression,Naive Bayes,Text Analytics",Often,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,Often,,,,,0,0,0,0,0,100,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,Often,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)","Company Developed Platform,Other",kafka,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Other,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,0,30,20,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,,GitHub,"Blogs,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,50,0,0,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - 
Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,Yes,,Data Miner,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Telecommunications,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Always,10TB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,Random Forests,RNNs,SVMs","DataRobot,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (commercial version),KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,Often,,,Rarely,,,,,,,,Often,Sometimes,Sometimes,,Rarely,Sometimes,,,Most of the time,,Sometimes,,,,Most of the time,Sometimes,Sometimes,Most of the time,Most of the time,,,,,,Often,Most of the time,,,Often,Most of the time,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,,Often,Often,Often,Often,Most of the time,Most of the time,,,,,,Often,,,,Often,Most of the time,Most of the time,,,Most of the time,Most of the time,,Often,,Often,Most of the time,,,,,15,40,40,2,3,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science 
to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,Often,Most of the time,Most of the time,,Often,Sometimes,,,Often,Sometimes,,Most of the time,Most of the time,,Often,Most of the time,,,,51-75% of projects,Entirely internal,Standalone Team,"Breast Cancer Data; Parkinson Data; Balance Scale Data, Self-Created Data with Web-Crawler(I collect); Dry Cleaning Customer's Data","My Company's Data. Data include text values, missing values and dirty values. In addition, data size is very big and I write python and python slow with sql server. In Turkey, I say we must to change DB Systems to big data systems but anyone has enough skills for this including me.","Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",Cloud Storage or CSV files,"Bitbucket,Git,Subversion",Rarely,57000,TRY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",University courses,60,5,20,10,5,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Don't know,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Association Rules,Data Visualization,Natural Language Processing,Text Analytics",,Most of the time,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,50,10,5,15,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,Most of the time,,Often,,,,Most of the time,,Often,Often,Most of the time,,100% of 
projects,Entirely external,Business Department,,"Data Privacy,infra acessing,processing data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,,,Nice to have,Nice to have,,Nice to have,,,,,,"DataCamp,Udacity,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,45,25,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important +Male,Germany,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Online courses,YouTube Videos",Very useful,,,,,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,0,0,0,50,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Pharmaceutical,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,,"Amazon Web services,Java,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,Rarely,,,,,,,"Decision Trees,Logistic Regression,RNNs",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,40,10,50,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",Most of the time,,,,Often,,,Often,,,,,,Often,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,,100000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs,Other","Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Segmentation,SVMs",,Often,,,Often,Often,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Often,,Often,,Sometimes,,,,,,Often,,Often,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,I prefer not to say,Lack of funds to buy useful datasets from external 
sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,Often,,,Often,Often,,,,,,,,,Most of the time,,,10-25% of projects,More internal than external,IT Department,None,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,2000000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Social Network Analysis,R,"Google Search,I collect my own data (e.g. 
web-scraping),Other","Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",25,20,0,0,5,50,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,Unix shell / awk",Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,,,,,,Most of the time,,,Most 
of the time,,Often,Often,Often,,,,45,15,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Sometimes,Most of the time,Often,,,Most of the time,,,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,100% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,165000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,,,Stan,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Personal Projects,Stack Overflow Q&A,Other",,Very useful,,,,Somewhat useful,,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,20,0,25,5,25,25,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Decision Trees,Logistic Regression,Segmentation,Text Analytics",,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Sometimes,,,Sometimes,,,,,93,2,1,2,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,,Often,Sometimes,Sometimes,,,,Often,Sometimes,,,76-99% of projects,More external than internal,Business Department,,"Lack of understanding of intention 
behind warehoused data structures. Integrating new, legacy and external data sources.",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,250000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,78,22,0,0,0,Time Series,,A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Basic laptop (Macbook),Text data,,,Other,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,20,30,5,45,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Rarely,450000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,R,Google Search,"College/University,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,Taiwan,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Trade book,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,500 to 999 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,Python,R,SQL",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,Often,,,,Often,Often,Often,,,,,,Sometimes,,Often,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,70,10,10,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of 
the time,Sometimes,,,,,,,,,,,Most of the time,,,Most of the time,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Never,1200000,TWD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,5,0,15,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Technology,20 to 99 employees,Increased significantly,6-10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees","Amazon Web services,Jupyter notebooks,Python,SQL",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Naive Bayes,Simulation",Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,40,30,5,0,5,20,Enough to refine and innovate on the algorithm,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Other",,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,Less than 10% of projects,Entirely internal,IT Department,,Understanding and integrating the specific domain knowledge that come with the data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,47000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,,3 to 5 years,Researcher,Self-taught,50,50,0,0,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Military/Security,"10,000 or more employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,TensorFlow",Sometimes,Often,,,,,,,,,,,,,Sometimes,Often,Most of the time,,,,Sometimes,,Sometimes,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Sometimes,Often,Sometimes,,,,,Sometimes,,Sometimes,Often,Sometimes,Often,Often,Sometimes,,Often,Often,Often,,,Often,,Often,Often,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Sometimes,Often,Often,,Sometimes,Most of the time,Often,Most of the time,,,Often,,,Sometimes,,,Most of the time,Most of the 
time,,26-50% of projects,Approximately half internal and half external,Central Insights Team,twitter;facebook;loyds;internet,Regulations governing the use of all source data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Never,"175,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,Traditional Workstation,"Image data,Text data",,1GB,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Most of the 
time,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,Other,"I don't typically share data,Share Drive/SharePoint",,Other,Always,725000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Hungary,70,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,80,0,20,0,0,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Never,10GB,"Decision Trees,Random Forests","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Often,,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,,Sometimes,Most of the time,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,Often,,Most of the time,Most of the 
time,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,30000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,80,Retired,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer",Self-taught,30,20,10,20,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Operations Research Practitioner,Fine,Employed by non-profit or NGO,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,R,GitHub,"Blogs,Kaggle,Newsletters,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Other,Self-taught,100,0,0,0,0,0,"Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important +Male,Other,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,C/C++,Deep 
learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Other,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs",,,,Sometimes,,Most of the time,Often,Most of the time,Most of the time,,,,,,,Often,,,Often,Often,,,Most of the time,,Sometimes,,,,,,,,,20,30,50,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Privacy issues",,,,,,,,,Often,,,,,,,,Often,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,No,We use internal data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Never,4200000,LKR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician",University courses,20,10,50,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Julia,Jupyter notebooks,Python,R,Tableau",,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Most of the time,,,Often,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,,,,,30,30,10,20,10,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Rarely,,10-25% of projects,Entirely external,Other,,,,Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased 
between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Finland,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Trade book",,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,35,20,35,10,0,0,,,A master's degree,Mix of fields,Fewer than 10 employees,Increased significantly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,MATLAB/Octave,NoSQL,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Rarely,Sometimes,,,Sometimes,Most of the time,Often,Sometimes,Rarely,,Sometimes,,Often,,Often,,Sometimes,,,Often,,Sometimes,,,Often,,,,Sometimes,,,,20,15,10,30,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential 
impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,Most of the time,,,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Denmark,22,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by a company that performs advanced analytics,C/C++,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,,,,,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Physics,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,65,0,15,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not 
important,Not important +Male,Italy,49,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,Very useful,,"KDnuggets Blog,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,,,High school,Technology,"5,000 to 9,999 employees",Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",,,,"R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,1,0,0,2,2,95,,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Other,,"Finding Commitment, Funding and Time.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,,,,Very useful,"DataTau News Aggregator,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Researcher,University courses,15,5,60,20,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A professional degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Very important,Other,"Laptop or Workstation and local IT supported servers,Other",Relational data,,1TB,Gradient Boosted Machines,"C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,Simulation",,,,,,Sometimes,Most of the time,Most of the time,Most of the time,,,,,,,,,,,Often,,,,,,,Often,,,,,,,35,15,5,30,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,,,,100% of projects,More internal than external,Other,,,,"Email,Other",Data is stored centrally,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,20400,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,40,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Russia,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Random Forests,Python,Google Search,"Official documentation,Online courses,Personal Projects",,,,,,,,,,Very useful,Very useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,Singapore,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Google Cloud Compute,Deep learning,Python,GitHub,"Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,85,5,10,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Government,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Other,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Often,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,,Sometimes,Often,,Often,Often,Often,Often,,Often,Often,Sometimes,Sometimes,,Sometimes,,,Rarely,Sometimes,Rarely,,Often,Rarely,Often,Rarely,Rarely,Rarely,Sometimes,Often,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need 
to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,Often,,Sometimes,Often,Sometimes,,,,,Most of the time,,Most of the time,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,80000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Germany,29,Employed full-time,,,Yes,,Data Scientist,,,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,10,0,70,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,Most of the 
time,,Often,,Often,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,40,5,5,25,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Often,,Most of the time,,,Sometimes,,,Most of the time,,,,,Sometimes,Often,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email,Other","Google Sheets, Metabase",Git,Never,"60,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Cloudera,Deep learning,Java,University/Non-profit research group websites,"College/University,Company internal community,Conferences",,,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,10,10,20,60,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,10GB,"Decision Trees,Neural Networks,SVMs","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib",,,,Rarely,,,Often,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Simulation",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,10,10,5,70,0,5,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations of tools",,,,Often,,,,,,,,,Sometimes,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,7000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,"GitHub,I collect my own data (e.g. web-scraping)","College/University,Company internal community,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,Engineer,Self-taught,30,50,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,R,RapidMiner (free version)",,,,Rarely,,,,,,,Sometimes,Often,,,,,,,,,Sometimes,,,Most of the time,Sometimes,Often,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,Sometimes,,,,Most of the time,Most of the 
time,Often,,,,,,Often,,Often,,,,Often,Sometimes,,Sometimes,,,Often,Sometimes,Sometimes,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data",,Sometimes,Often,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,50000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Ireland,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,edX,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,70,30,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,United Kingdom,22,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by college or university,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Kaggle competitions,40,10,0,0,50,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Rarely,Sometimes,,Rarely,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,Rarely,Rarely,,Rarely,,,Sometimes,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,Often,Most of the time,,,,,Often,,Often,,,Often,Most of the time,Most of the time,,Often,,,,,Often,Often,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone 
non-technical,Inability to integrate findings into organization's decision-making process,,,,,,,,Often,,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,Cancer datasets;,Cleaning the data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Git",Sometimes,15000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,3 to 5 years,Researcher,Kaggle competitions,50,0,0,0,50,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United Kingdom,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Support Vector Machines (SVM),SQL,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Textbook",,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,1 to 2 years,"Business Analyst,Researcher",Self-taught,70,0,0,0,30,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,Rarely,,,Sometimes,Often,Rarely,,,,,,,,Often,,Sometimes,,,Rarely,,Rarely,,,Often,,,,Often,,,,30,20,0,20,30,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,Often,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,30000,GBP,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Not Useful,,,,,,,,Somewhat useful,,,,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Master's degree,No,Master's degree,Mathematics or statistics,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Indonesia,34,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Julia,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Official documentation,Personal Projects,Textbook",Very useful,,,,,,,,,Somewhat useful,,Very useful,,,Somewhat useful,,,,"FlowingData Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Statistician",Self-taught,80,10,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks",A doctoral degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks","Amazon Machine Learning,Impala,Mathematica,Python,Stan,Tableau,TensorFlow",Sometimes,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,Often,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Segmentation,Simulation,Text Analytics",Often,,Most of the time,,,,Most of the time,,Most of the time,Often,,Sometimes,,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,Often,,Often,,,,,35,30,10,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of 
funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Most of the time,Most of the time,,,Most of the time,Often,,,,Most of the time,,,,Sometimes,,Most of the time,Often,,51-75% of projects,More internal than external,Standalone Team,demographic data; social media data; geospatial data,connecting different databases with different data formats to a usable single database,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,20000000,IDR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,Yes,,Business Analyst,,,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,,University courses,30,10,10,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression",SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,Often,,,,Rarely,,,,,,Often,Most of the time,,,Often,,,,50,15,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Most of the time,,,,,,Sometimes,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,40000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Not Useful,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,60,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Singapore,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,Very useful,Somewhat 
useful,Very useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,40,30,20,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,<1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,KNIME (free version),Microsoft Azure Machine Learning,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Tableau,TIBCO Spotfire",,Rarely,,,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,Sometimes,,,,,,,,,Most of the time,,Often,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",,Often,,,,Often,Often,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,Often,,Sometimes,Often,,,,,,Sometimes,,,,40,10,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Sometimes,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,,,100% of projects,Entirely internal,Other,,Getting it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,,,170000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Greece,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Trade book,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,,Not Useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,10,0,60,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,A bachelor's degree,Mix of fields,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Sometimes,,Most of the time,,Often,,,,Often,,Often,,,,,Sometimes,Most of the time,Often,,Often,,Often,,,,,Often,,,,10,50,10,10,10,10,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,No,Yes,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,Very useful,,"Data Machina Newsletter,No Free Hunch Blog",< 1 year,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,,No,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,,,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not 
important +Male,Other,23,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Not Useful,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,Engineer,Self-taught,30,40,0,20,10,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,Don't know,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Don't know,1GB,,"Amazon Web services,Google Cloud Compute,Java,NoSQL,Python,SQL",,Most of the time,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Segmentation,SVMs",Often,Rarely,Rarely,,,Most of the time,Most of the time,Often,Often,Rarely,,Often,,Most of the time,,Often,,Rarely,,,Most of the time,Sometimes,Often,,,Often,,Sometimes,,,,,,40,20,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,,,,,,,Often,,,,,,Often,,,76-99% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,70000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Netherlands,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,SQL,I collect my own data (e.g. 
web-scraping),"Company internal community,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer",University courses,30,20,30,20,0,0,Recommendation Engines,Logistic Regression,High school,Internet-based,100 to 499 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10TB,"Decision Trees,Regression/Logistic Regression","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,NoSQL,R,SQL,Tableau,Other",,,,,Often,,,Most of the time,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,Most of the time,,,"A/B Testing,Data Visualization,Recommender Systems",Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,50,10,5,5,30,0,Enough to explain the algorithm to someone non-technical,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,,,,,,,,,,,Sometimes,,,Often,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,"Super big data set, time consuming.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Sometimes,71500,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Official documentation,Stack Overflow Q&A",,,Very useful,,,,,,,Very useful,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,45,5,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Neural Networks - CNNs",High school,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,Never,10MB,"Evolutionary Approaches,Neural Networks","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"Evolutionary Approaches,Neural Networks",,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,15,20,40,20,5,0,"Enough to code it again from scratch, albeit it may run slowly",Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Rarely,,,,100% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Rarely,"14,400",EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,Ukraine,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"College/University,Company internal community,Kaggle,Online courses",,,Very useful,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,6 to 10 years,"Data Analyst,Researcher",University courses,10,25,50,10,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Other,,1GB,"Gradient Boosted Machines,Random Forests","KNIME (free version),Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Rarely,Rarely,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,Often,,,,,,,Often,,Often,,,,,,,,,,,50,10,0,30,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,,,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,19200,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),40+,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Software Developer/Software Engineer,Other",Other,0,0,0,0,0,100,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,25,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with 
semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,University courses,5,0,0,95,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,CRM/Marketing,20 to 99 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",Most of the time,Often,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,40,5,10,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data,Other",,,,Often,Most of the time,,,,Often,,,,,,,Sometimes,,,,,Most of the time,Most of the time,100% of projects,Entirely internal,Central Insights Team,Non,Collecting sufficent data of the correct quality.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,"45,000",GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Kaggle,Online courses",Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased significantly,1-2 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk,Other",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,,,,Often,Most of the time,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text 
Analytics,Time Series Analysis",Rarely,,,Rarely,Rarely,Most of the time,Often,Rarely,Most of the time,,Rarely,Most of the time,,Often,,Often,,Sometimes,Sometimes,Sometimes,Often,,Often,Rarely,,,,Rarely,Often,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,55000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Portugal,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Neural Nets,R,I collect my own data (e.g. web-scraping),"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Very useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Statistician",Self-taught,60,15,15,10,0,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",High school,Retail,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Always,1EB,"Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Java,Microsoft Excel Data Mining,Minitab,R",,Rarely,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,Rarely,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Random Forests,RNNs,Simulation,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Often,Rarely,,,,,,Rarely,,Rarely,,Most of the 
time,,,Sometimes,,,,30,25,20,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,Most of the time,,Often,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,Often,,Most of the time,,,26-50% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,38500,EUR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,MATLAB/Octave,Other,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Official documentation",,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Researcher,Statistician",University courses,20,0,0,80,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Regression/Logistic Regression,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Most of the time,,Often,Often,Sometimes,,,Sometimes,Often,,Often,,Often,,Often,Most of the time,,Sometimes,,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,Most of the time,,,,10,0,0,0,0,90,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,Lots....,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,,Self-taught,80,20,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,CRM/Marketing,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Java,Spark / MLlib,TensorFlow",,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"kNN and Other Clustering,Neural Networks,Recommender Systems,Segmentation",,,,,,,,,,,,,,Often,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,20,50,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Bitbucket,,900000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Java,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Conferences,Friends network,Kaggle,Newsletters,Online courses,YouTube Videos",Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,10,30,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Other",Rarely,100MB,"CNNs,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Sometimes,,,,,Rarely,Most of the time,,Often,,,,Most 
of the time,Most of the time,,Sometimes,Most of the time,,Most of the time,,Often,,Often,,,,50,30,0,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,Most of the time,,Most of the time,,Most of the time,Often,Most of the time,,,,Most of the time,Most of the time,,Sometimes,Sometimes,Most of the time,Most of the time,,,100% of projects,More internal than external,Standalone Team,"MNIST, CIFAR10/100, SUN, Food-100, OpenImages, OpenSurfaces, other public datasets","partitioning for cross-validation, effective caching, sampling","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,840000,RUB,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Hungary,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Personal Projects",Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,DBA/Database Engineer",University courses,25,5,35,15,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,Often,,,,Most of the time,,,Sometimes,Sometimes,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Sometimes,,Sometimes,,,,Rarely,Sometimes,,Often,,,Often,,Rarely,,Often,,,,40,25,10,15,10,0,"Enough to code it 
again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Most of the time,Often,,,,,,,,,,,Sometimes,,,Often,,,51-75% of projects,Entirely internal,Other,kaggle; uci,It's often dirty.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,3360000,HUF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Social Network Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,15,5,20,50,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Rarely,,,,,,,Rarely,,,,,,Rarely,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Sometimes,Sometimes,,,,Most of the time,,,Often,Rarely,,Often,,Often,,Often,,Sometimes,Often,,Often,,Often,,,Often,,Often,Sometimes,,,,,10,30,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,Most of the time,,,Most of the time,,,Often,Most of the time,,,,,Most 
of the time,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,None,Data is lost at random,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,65000,GBP,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,Python,GitHub,"Arxiv,College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A",Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Engineer,Programmer",University courses,20,0,0,80,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Sometimes,10GB,"Ensemble Methods,Neural Networks,Random Forests","IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,"Collaborative Filtering,Ensemble Methods,PCA and Dimensionality Reduction,Recommender Systems",,,,,Often,,,,Often,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,70,10,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial 
support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Often,,,,,Often,Often,,,,,Often,,,,,Often,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,30000,TWD,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,Hungary,42,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",24,75,0,0,1,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Don't know,,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,KNIME (free version),Orange,R,TIBCO Spotfire",,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Text Analytics",,,,,,,Most of the time,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,0,0,0,50,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Most of the time,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Subversion,Most of the time,218700,HUF,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Ukraine,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,35,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,,3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,30,60,0,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,France,25,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects",,,Somewhat useful,,Somewhat useful,,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",Self-taught,30,10,15,15,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,India,35,Employed full-time,,,No,Yes,Data Miner,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Nice to have,,Necessary,,Nice to have,Necessary,Necessary,,,Necessary,,,,,Traditional Workstation,,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Miner,Programmer,Software Developer/Software Engineer",Other,30,60,0,0,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Female,India,NA,"Not employed, but looking for work",,,,,,,,Amazon Web services,Social Network Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,30,10,40,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Female,Italy,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by college or university",Amazon Machine Learning,Deep learning,Python,,"Arxiv,Blogs,College/University,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very 
useful,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,40,10,0,10,Speech Recognition,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A professional degree,Technology,Fewer than 10 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Other,Rarely,,"Neural Networks,Other","C/C++,Java,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,HMMs,Natural Language Processing,Neural Networks,Segmentation,Simulation,Text Analytics",,,,,,Often,Often,,,,,,Often,,,,,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,Most of the time,,,,,10,30,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,,Often,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,"Wall Street Journal, TIMIT",,Other,I don't typically share data,,"Bitbucket,Git",Most of the time,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Czech Republic,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Genetic & Evolutionary 
Algorithms,Python,"GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Personal Projects,Textbook",,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,,,Somewhat useful,,,,"Data Stories Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher",Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Video data,Text data,Relational data,Other",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests","Amazon Web services,C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,Unix shell / awk",,Most of the time,,Most of the time,,,,,,,,,,,Often,,Rarely,,,,,,,,,,Most of the time,,,Rarely,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,,Often,Often,,Often,,,,,,,,Often,Often,Often,,,,,,,,,Often,Often,,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data 
useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,,,,Often,,Often,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,Often,,,10-25% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other","Commercial Data Platform,Company Developed Platform,Other",MarkLogic,"Bitbucket,Git,Subversion",Always,140000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Kenya,36,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by non-profit or NGO,Python,Deep learning,Python,Google Search,"Blogs,Newsletters,Online courses,Podcasts,Textbook,YouTube Videos",,Very useful,,,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral 
degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Finland,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,70,0,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,,,"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,,Self-taught,50,30,0,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),,,,,"Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,,,,Often,,,Often,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to tune the parameters properly,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,New Zealand,42,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,R,Random Forests,R,Government website,"College/University,Kaggle,Non-Kaggle online communities,Online courses,Textbook",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities 
discipline,Less than a year,Other,University courses,25,30,0,45,0,0,,,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1MB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,55,0,0,25,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools",,Sometimes,Rarely,,Most of the time,,,,,,,,Often,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,75000,NZD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Neural Nets,R,"Google Search,Government website,University/Non-profit research group websites","Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Biology,3 to 5 years,Data Analyst,Self-taught,80,0,0,20,0,0,Survival Analysis,"Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Germany,69,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,R,Neural Nets,,I collect my own data (e.g. web-scraping),"Friends network,Official documentation,Personal Projects,Textbook,YouTube Videos",,,,,,Very useful,,,,Very useful,,Very useful,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,I never declared a major,More than 10 years,"Researcher,Statistician",Self-taught,60,0,35,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I prefer not to answer,CRM/Marketing,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Regression/Logistic Regression,"IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,,Sometimes,,,,,,Often,Sometimes,Most of the time,,,,,Sometimes,,,,,,,,,,,,,40,30,0,0,30,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Scaling data science solution up to full database",,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,Less than 10% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,EUR,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Turkey,38,Employed full-time,,,No,Yes,Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Electrical Engineering,I don't write code to analyze data,Engineer,University courses,60,10,0,30,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Male,Turkey,28,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Business Analyst,Perfectly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Workstation + Cloud service,2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Business Analyst,Data Analyst,Researcher",Self-taught,80,10,10,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",,Other,"Friends network,Kaggle,Personal Projects,Textbook",,,,,,Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Scientist,Predictive Modeler",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,R,RapidMiner (commercial version),SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Often,,,,,Rarely,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,Sometimes,,Sometimes,,Most of the time,Most of the time,Often,Often,,,,,Most of the time,,Often,,Often,,,Most of the time,,Often,,,Most of the time,,Most of the time,,,,,,60,20,0,10,10,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Often,,Often,,,,Most of the time,,,Sometimes,,,,100% of projects,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Spain,55,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Scientist,Self-taught,90,0,10,0,0,0,Computer Vision,Decision Trees - Gradient Boosted Machines,,Government,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Most of the time,1TB,CNNs,"Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,10,30,40,0,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,"Uci,opendata,...",Brain,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,45,5,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Relational data",Most of the time,100GB,"Random Forests,Regression/Logistic Regression","C/C++,Mathematica,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Rarely,,Sometimes,,,,"Data Visualization,Random Forests,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,20,25,30,20,5,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,Other,Public data from scientific satellites,Waiting for the data to be available,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,41500,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,R,Bayesian Methods,R,,"Blogs,Kaggle,Non-Kaggle online communities,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Data Miner",Self-taught,50,30,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SAS Base,SAS Enterprise Miner,SQL,Other",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,,,Sometimes,Often,,Most of the time,,,Most of the time,,,Most of the time,Often,,,Sometimes,Often,,,Often,,,,,,Sometimes,,,,,30,30,20,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Often,,Sometimes,,,,,,,,,,,,,Sometimes,Often,Often,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Germany,25,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Cluster Analysis,Python,GitHub,"Company internal community,Personal Projects,YouTube Videos",,,,Very useful,,,,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,20,30,0,10,10,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",No education,Technology,20 to 99 employees,Increased slightly,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,100MB,Regression/Logistic Regression,"Python,R,RapidMiner (free version),SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Often,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,30,20,10,30,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",,Most of the time,,,,,,,Often,,Often,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,to generate proper insights form 2 GB data sert,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,25000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,"Arxiv,Blogs,Conferences,Non-Kaggle online communities,Personal Projects,Tutoring/mentoring",Very useful,Very useful,,,Very useful,,,,Very useful,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Machine Learning Engineer,Self-taught,80,0,0,20,0,0,Recommendation Engines,"Bayesian Techniques,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Neural Networks,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Natural Language Processing,Neural Networks",Most of the time,,Sometimes,,Often,,,,,,,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,,,30,15,20,10,25,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues",,,,,,,,,Often,,Often,Sometimes,Often,,,,Often,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Google 
Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,Not Useful,,,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Work,15,5,45,35,0,0,"Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Rarely,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,,Often,Often,Most of the time,,Rarely,,,Sometimes,,Often,,Often,,Rarely,Rarely,,Often,,Rarely,,,Often,Rarely,Rarely,Often,Often,,,,35,15,10,25,15,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Rarely,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,Most of the time,,,,Often,Often,,,Often,,10-25% of projects,More internal than external,Standalone Team,National address file;road traffic authority;census,Size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,120000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Spain,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Cloudera,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Telecommunications,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Cloudera,Jupyter notebooks,Python,TensorFlow",,Often,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues",Often,,,,Often,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,26-50% of projects,More external than internal,Central Insights Team,Socio-ecnomic,Desestructured,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Bitbucket,Git",,60000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Very useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,PhD,No,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,10,0,0,80,10,0,Computer Vision,"Bayesian Techniques,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Other",Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",25,45,0,0,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Cluster Analysis,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst",University courses,20,30,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,,"Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SAS Base,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,Prescriptive 
Modeling,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,40,0,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,90000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Neural Nets,Python,Google Search,"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician",Work,0,0,100,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,IBM Cognos,"Ensemble Methods (e.g. 
boosting, bagging)",Python,,"Blogs,Company internal community,Conferences,Kaggle",,Somewhat useful,,Very useful,Not Useful,,Somewhat useful,,,,,,,,,,,,"DataTau News Aggregator,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,20,10,"Outlier detection (e.g. Fraud detection),Survival Analysis",,A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,,Text data,Rarely,,,"IBM Cognos,SQL,Tableau",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Prescriptive Modeling",,Rarely,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,30,30,10,30,0,0,Enough to run the code / standard library,"Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Often,Often,,,,Most of the time,,76-99% of projects,More external than internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),"Email,Share Drive/SharePoint,Other",,Other,,,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Australia,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,Less than a year,"Researcher,Statistician",University courses,15,75,8,2,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A bachelor's degree,Academic,I don't know,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Regression/Logistic Regression,Other","C/C++,MATLAB/Octave,Minitab,R",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Often,,Sometimes,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,10,60,10,10,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data,Other",,,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,Sometimes,,,Most of the time,Most of the time,10-25% of projects,Approximately half internal and half external,Standalone Team,yahoo,data are not labeled. missing data. 
,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint,Other",github,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,24000,AUD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Work,20,0,50,20,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,I don't know,Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,<1MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"Cross-Validation,Decision Trees,Logistic 
Regression,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,Often,Often,,,,40,30,5,20,5,0,Enough to refine and innovate on the algorithm,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,1200000,XOF,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Other",Self-taught,30,0,50,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Ensemble Methods,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Azure Machine Learning,R,SQL",,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression",,,,,,,Most of the time,,Rarely,,,,,Often,,Often,,,,,,,,,,,,,,,,,,50,25,0,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant 
domain expert input,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,Often,,,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,2500000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,DBA/Database Engineer",Work,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",,Technology,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Neural Networks","Java,Microsoft Excel Data Mining,NoSQL,R,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,Often,,,,,,Often,,,,,,,,,,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Decision Trees,Natural Language Processing,Neural Networks",,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,20,30,20,0,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Company Developed Platform",,Git,Most of the time,3760000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Germany,28,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Kaggle,Personal Projects,Textbook,Other",Very useful,,Somewhat useful,,,,Very useful,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important +Male,India,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,,Somewhat useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,25,50,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,IBM Cognos,Julia,KNIME (commercial version),Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",,Sometimes,,,,,,,,Rarely,,,,,,Rarely,,Often,,,,,Often,Most of the time,Sometimes,,Sometimes,Sometimes,,,Most of the time,Sometimes,Most of the time,Often,Often,,,,,,,,,,Sometimes,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality 
Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Often,Sometimes,,,Sometimes,Most of the time,Often,,,,,,Sometimes,,Often,,Sometimes,Most of the time,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,Most of the time,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Privacy issues",Most of the time,,Often,Sometimes,Often,,,,,Often,Often,,Sometimes,,,,Sometimes,,,,,,26-50% of projects,Entirely internal,IT Department,project / customer data,quality and consistency of the data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,2500000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Ireland,49,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Other,Sort of (Explain more),Master's degree,A social science,1 to 2 years,Other,University courses,20,30,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,Poland,32,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,Java,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,YouTube Videos",Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,"FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Programmer",University courses,15,20,20,40,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Government,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Julia,Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics",,,Rarely,,,Sometimes,,,,,,,,Sometimes,,,,Rarely,Most of the time,Sometimes,,,Sometimes,,,Sometimes,,,Often,,,,,30,30,30,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,PAN antiplagiarism competition corpus; nltk newsgroups,getting dirty text coming from pdf,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,84000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Rule Induction,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Kaggle,Personal Projects",Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,1 to 2 years,"Programmer,Researcher",Self-taught,50,20,0,0,20,10,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A professional degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,"Bayesian Techniques,Neural Networks,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"Cross-Validation,Natural Language Processing,RNNs,Text Analytics",,,,,,Often,,,,,,,,,,,,,Often,,,,,,Often,,,,Sometimes,,,,,20,40,30,0,0,10,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Rarely,,Sometimes,,,,,,Rarely,,,,Often,,,,,,Most of the time,,,Less than 10% of projects,Entirely external,Other,squad;marco;mspc;quora question pairs; conll,Evaluation scripts need to be setup for some datasets.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,900000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,,Self-taught,10,0,50,0,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Insurance,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Always,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,Sometimes,Rarely,Most of the time,,,,,Often,Rarely,,,Often,,,Most of the time,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,Often,Most of the time,,Often,,Sometimes,Often,,Most of the time,,,Most of the time,Most of the time,,Sometimes,Often,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Often,Most of the time,Sometimes,,Sometimes,Sometimes,,,,,,Often,,Often,,Most of the time,Often,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,200000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Time Series Analysis,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Data Analyst,Engineer,Programmer",University courses,30,20,0,50,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Other,"10,000 or more employees",Decreased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Never,10GB,"Decision Trees,Neural 
Networks,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",,,,,Often,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,Sometimes,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Most of the time,1039000,INR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Egypt,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Online courses,Personal Projects",,Very useful,Very useful,,,,,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,Yes,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important +Male,France,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Haskell,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,90,5,5,0,0,0,"Recommendation Engines,Time Series","Bayesian Techniques,Logistic Regression",,Internet-based,500 to 999 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",,,,"Amazon Web services,C/C++,NoSQL,SQL,Unix shell / awk",,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Segmentation,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,90,0,5,5,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,Often,,,Sometimes,,,,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",I don't typically share data,,Git,Never,"55,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Italy,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,15,10,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10MB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,Often,,,,,,,Often,Often,,,,50,5,15,25,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Unavailability of/difficult access to data",Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,51-75% of 
projects,Entirely internal,Business Department,,api configuration,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,25000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Manufacturing,10 to 19 employees,Increased slightly,Less than one year,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Never,10GB,Regression/Logistic Regression,"Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression",,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,60,10,0,10,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Often,,,,,,Most of the time,,Often,,,,,,Often,,,,10-25% of projects,Approximately half 
internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Never,500000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Personal Projects",,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Machine Learning Engineer",University courses,30,10,30,20,5,5,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,Telecommunications,"1,000 to 4,999 employees",Decreased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Sometimes,1GB,"Decision Trees,Random Forests","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (commercial version),Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,Often,,,Often,,Often,,,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Often,,,,,,Often,,Often,,,,,Often,,Often,,,,,,,,,,,40,20,10,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art 
in machine learning,Need to coordinate with IT",,Sometimes,,,Often,,,,,,,Often,,,Often,,,,,,,,26-50% of projects,Entirely external,IT Department,no,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Always,8000,CNY,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Software Developer/Software Engineer,University courses,25,25,10,40,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,35,"Not employed, but looking for work",,,,,,,,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,,,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Fine arts or performing arts,Less than a year,Other,University courses,27,10,0,60,3,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,Nigeria,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,Less than a year,Other,Self-taught,50,40,10,0,0,0,Survival Analysis,Decision Trees - Random Forests,,Financial,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Random Forests,"Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,Most of the time,,,,,,,,,Often,,,Often,,,,,,,"Cross-Validation,Decision Trees,Random Forests,Text Analytics",,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,30,10,10,30,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,Often,,,,,Often,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,3000000,NGN,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression",,Financial,10 to 19 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Segmentation",Often,,,,,Often,Most of the time,,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,,,,,,,,10,10,10,40,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,Most of the time,,51-75% of projects,More internal than external,Other,"Credit bureau data, UK companies house",Small dataset of our historical customer base,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,63000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Spain,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Deep learning,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Tutoring/mentoring",,Somewhat useful,,,,,,,,,,,,,,,Somewhat useful,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Business Analyst,Self-taught,80,10,10,0,0,0,"Speech Recognition,Survival Analysis",Logistic Regression,,CRM/Marketing,10 to 19 employees,Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100MB,Neural Networks,Google Cloud Compute,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,10-25% of projects,More internal than external,Business Department,,,Graph (e.g. 
GraphBase/Neo4j),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Israel,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Engineer",University courses,50,5,40,5,0,0,"Natural Language Processing,Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,100MB,Regression/Logistic Regression,"NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Text Analytics",,,,,,Sometimes,Often,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,75,2,5,8,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization",Sometimes,Sometimes,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Rarely,240000,ILS,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Official documentation,Personal Projects,Textbook",Very useful,Very useful,,,,,,,,Very useful,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Physics,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Other,"5,000 to 9,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Relational data,Other",Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,Rarely,,,,Often,,,,Rarely,,,,,,Rarely,,,,Rarely,Often,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural 
Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,Time Series Analysis",,,,Often,,Often,Often,Often,Often,,Often,Often,,Often,,Often,,,,Often,Often,,Often,,Often,Often,Most of the time,,,Often,,,,40,20,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,Integration of Data sources.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Own tools,"Git,Other",Rarely,-1,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing,Recommendation Engines","Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,Other,22,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Other,"GitHub,University/Non-profit research group 
websites","College/University,Company internal community,Online courses,Stack Overflow Q&A,YouTube Videos",,,Not Useful,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Coursera,Udacity",,2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,8,10,0,2,"Machine Translation,Natural Language Processing,Speech Recognition,Other (please specify; separate by semi-colon)","Hidden Markov Models HMMs,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Textbook",Very useful,,,,,,Very useful,,,,,,,,Somewhat useful,,,,,3-5 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Italy,31,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Other,Other,R,"GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",University courses,15,15,30,40,0,0,"Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Jupyter notebooks,Python,R,TensorFlow",,Often,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",Most of the time,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,Rarely,,,,,,,Most of the time,,,,40,10,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,Rarely,,,,,,Most of the time,Often,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,55000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",Self-taught,40,15,5,35,5,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural 
Language Processing,Text Analytics",,Most of the time,,,,,,Often,,,,,,Often,,Often,,Often,Most of the time,,,,,,,,,,Most of the time,,,,,55,25,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",,,,,Most of the time,,,Rarely,,,,,,,,Rarely,,,,,,,Less than 10% of projects,Entirely internal,IT Department,no,to build model with good score because of small and dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Never,450000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Not Useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Often,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,Rarely,Most of the time,,Often,,,,,,,,Sometimes,Often,,,Rarely,Sometimes,,Often,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,Sometimes,Most of the time,,Often,,,,Sometimes,,Sometimes,,Often,,,Often,Often,,,Often,Sometimes,Sometimes,Often,,,Often,Often,,,,50,30,15,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Often,,Often,Most of the time,Sometimes,,Often,Often,,,,Often,Often,Most of the time,,,,Often,,,,Less than 10% of projects,Do not know,Standalone Team,differs by client,differs by client,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,differs by client,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,50200,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Pakistan,41,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Tableau,Deep learning,Python,Google Search,"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),No education,Academic,500 to 999 employees,Increased slightly,Don't know,Some other way,Important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,<1MB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,40,20,10,20,10,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,More external than internal,Other,,missing data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"100,000",PKR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Spain,40,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Master's degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important +Female,South Africa,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,45,0,5,50,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,10 to 19 employees,Increased significantly,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Don't know,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Most of the time,Often,Often,,,,,Sometimes,,Rarely,,,,,Most of the time,,Often,,,,,,,Often,,,,35,14,1,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of 
the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Rarely,,,,Often,,Often,Most of the time,,Often,Sometimes,,Often,,Most of the time,,Often,,Most of the time,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,255600,ZAR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Australia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler",Kaggle competitions,15,15,15,25,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell 
/ awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Most of the time,,,Most of the time,,,,"A/B Testing,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,40,10,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,,,,,,,,Often,,,Often,,,,,,,,26-50% of projects,More internal than external,Standalone Team,Sports results,Data Cleaning,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,100000,AUD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,,40,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,,,,"Blogs,College/University,Kaggle,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Master's degree,I never declared a major,,"Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing 
page,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,Not Useful,,Very useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Very Important,Very Important,Somewhat 
important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Belgium,80,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Not Useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,75,0,25,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Financial,20 to 99 employees,Increased slightly,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Other,Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Recommender Systems,Simulation,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of 
the time,,,Sometimes,,,Most of the time,Sometimes,,Most of the time,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,Sometimes,Often,Sometimes,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,"Bitbucket,Git",Rarely,85000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ireland,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Google Cloud Compute,Genetic & Evolutionary Algorithms,Scala,GitHub,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Telecommunications,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Rarely,Most of the time,,,,Rarely,,,Most of the time,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,Most of the time,,,,Most of the time,,Often,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,Often,,,,50,20,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,Most of the time,,,Often,,Most of the time,Sometimes,,,,,Most of the time,,Often,,Often,,10-25% of projects,More internal than external,IT Department,kaggle competitions,collect them and make them available in production,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak),Other","I don't typically share data,Share Drive/SharePoint",,Other,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Russia,37,Employed full-time,,,No,Yes,Researcher,Fine,Employed by professional services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,A humanities discipline,3 to 5 years,"Business Analyst,Data Analyst,Researcher",Self-taught,85,0,0,0,15,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Other,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,KNIME (commercial version),Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Textbook",,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Other,Self-taught,80,0,0,0,10,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Telecommunications,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,,,"Microsoft Azure Machine Learning,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Time Series Analysis",,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,,,,,,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Other,R,University/Non-profit research group 
websites,"Arxiv,Blogs,College/University,Conferences,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Trade book",Very useful,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,"DataTau News Aggregator,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Programmer,Other",University courses,20,10,7,60,3,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Italy,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Online courses,Podcasts,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,Very useful,"Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Poland,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Company internal community,Friends network,Personal Projects",,,,Very useful,,Very useful,,,,,,Very useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,10,40,10,20,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,Often,Often,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems",,,Sometimes,,,Most of the time,Sometimes,,Often,,,Often,,,,Often,,Sometimes,,Sometimes,,,Most of the time,Sometimes,,,,,,,,,,45,25,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,Sometimes,,Most of the time,Sometimes,,Often,Most of the time,Rarely,Most of the 
time,,,,,,,Sometimes,,Often,,,Less than 10% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Rarely,6500,PLN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Singapore,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,20,0,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Other,,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Often,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Random Forests,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,Sometimes,,,,,Sometimes,,,,Often,,,,,,,Most of the time,,,,30,30,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,Often,,,,,Most of the time,,,,Often,,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,384000,CZK,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,R,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Machine Learning Engineer,Predictive Modeler,Programmer",Other,0,40,60,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Internet-based,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,Often,Often,Most of the time,,,,,,,,Sometimes,,,Sometimes,,,,Most of the time,,,,,,Often,,,,,40,25,5,5,25,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear 
question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,Often,,,Less than 10% of projects,Entirely external,Other,Social Media Data,Available in a very raw format,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1400000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,0,20,0,10,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks","Amazon Web services,IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk,Other",,Often,,,,,,,,,,,Rarely,,Often,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,Most of the time,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random 
Forests,RNNs,SVMs,Text Analytics",,,,,,Rarely,,Sometimes,,,,Often,,Sometimes,,,,,Often,Most of the time,,,Often,,Often,,,Most of the time,Most of the time,,,,,50,22,10,8,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,Often,,,Often,Often,Most of the time,Rarely,,,Rarely,,Often,,,Most of the time,Often,,100% of projects,Entirely internal,IT Department,DBpedia,"unorganized, Lack of business experts to explain what the data means, Sometimes business analyst will not be knowing whether the data is being captured or not.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Email",,Git,Sometimes,170000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,39,"Independent contractor, freelancer, or self-employed",,,No,Yes,Statistician,Fine,Employed by professional services/consulting firm,SAS Base,Regression,R,GitHub,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,I don't write code to analyze data,"Business Analyst,Statistician",Self-taught,100,0,0,0,0,0,Time Series,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Poland,27,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,R,Random 
Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,50,0,0,15,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,,"Neural Networks,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,,Rarely,Rarely,Most of the time,,,,,,,,,Often,,,,Sometimes,Rarely,Rarely,,Rarely,,Rarely,,Rarely,Sometimes,,,,,20,10,20,40,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Rarely,Sometimes,,Rarely,Often,,,Most of the time,,Often,,,,,,Most of the time,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,Data accuracy ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,130000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Russia,70,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,SAP BusinessObjects Predictive Analytics,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,Other","Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,30,0,20,30,20,0,Time Series,"Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Internet-based,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Other,,100MB,"Evolutionary Approaches,HMMs,Regression/Logistic Regression","IBM SPSS Statistics,Mathematica,Microsoft Excel Data Mining",,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,Logistic Regression,Simulation,Time Series Analysis,Other,Other,Other",,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,Often,Most of the time,10,30,0,0,0,60,Enough to run the code / standard library,"Dirty data,I prefer not to say,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,Most of the time,,,Often,,,,,,,,,,Often,Often,,Less than 10% of projects,Entirely external,Standalone Team,__‰Û_Ì¢_å_‰Û_Ì݉Û_ÌωÛ_Ì¢‰Û_ÌÏ_åµ _åÇ_åÁ____‰Û_ÌÏ_åµ _____å _Ì_ _Ì___‰Û_偉Û_å_Ì__Ì_,_̪_____Ìö_Ì_‰Û_å_Ìö_Ì_ ‰Û_å_åÁ___å_‰Û_åÊ_Ì__Ì_ _Ì_ _åÇ__‰Û_偉Û_Ì¢‰Û_Ìã_ÌÛ _å_ __‰Û_Ì¢_å_‰Û_Ì݉Û_ÌωÛ_Ì¢‰Û_ÌÏ__ _åÇ_åÁ____‰Û_ÌÏ__ ‰Û_偉Û_Ì¢_åÁ_Ìö ___åµ_____ÌÁ_____å__åµ__,Other,Email,,Generic non-cloud file sharing software (Email/Shared 
Server/etc.),Rarely,14000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,,"Data Elixir Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,50,0,10,20,20,0,"Natural Language Processing,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R,Unix shell / awk",,Most of the time,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Random Forests,SVMs,Time Series Analysis",Most of the time,,Sometimes,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,Often,,Most of the time,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific 
analysis and decision-making,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Sometimes,,Often,Sometimes,,,,,,,,Often,,,Sometimes,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,,56000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",University courses,30,20,20,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Most of the time,,,,,Most of the 
time,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,Often,,Most of the time,Most of the time,,,,Most of the time,,Often,,,,30,30,10,10,20,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,,,,,,,,,,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,"50,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation",,,,,,,Very useful,,,Very useful,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",65,10,0,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Oracle Data 
Mining/ Oracle R Enterprise,Python,R,SAP BusinessObjects Predictive Analytics,SQL",Rarely,,,,,,,Rarely,,,,,,,,,Rarely,,,,,Sometimes,Most of the time,,,,,Often,,,Sometimes,,Sometimes,,,,Often,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Often,Sometimes,,,,Rarely,,Rarely,,Rarely,,Often,Often,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,50,25,10,10,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,RUB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,67,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Conferences,Kaggle,Online courses,YouTube Videos",Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,Computer Vision,"Evolutionary Approaches,Neural Networks - CNNs",A master's degree,Other,,,,,Very important,Build prototypes to explore applying machine learning to new 
areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Rarely,10GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,SQL,TensorFlow",,,,Often,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,,,,Rarely,,,,Often,,,,,,"CNNs,Neural Networks,Segmentation",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,45,15,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",,,,Often,Often,,,,Often,,Sometimes,Sometimes,Sometimes,,,Often,,,,,,,100% of projects,More internal than external,Standalone Team,,Scale and signal quality in data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Git,Sometimes,540000,DKK,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Poland,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,,Very useful,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,70,25,5,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Less than one year,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Relational data,Other",Sometimes,10GB,"Decision Trees,Evolutionary Approaches,GANs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,R,RapidMiner (free version),TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,Rarely,,Sometimes,,Often,,Most of the time,,Most of the time,,Often,,,,,,,,,,,Rarely,,Rarely,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",Sometimes,Sometimes,Often,,,Sometimes,Most of the 
time,Often,,,,,Sometimes,Often,,Often,,Often,,Often,Often,,Sometimes,,,,Most of the time,Sometimes,,,,,,20,15,10,30,20,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,Sometimes,Often,Sometimes,,Often,Sometimes,,Most of the time,,Often,,,,,Often,,Most of the time,Often,,51-75% of projects,Approximately half internal and half external,Other,ML repository,Mine and visualize multidimensional dependencies,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Mercurial,Subversion",Sometimes,40000,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Greece,37,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"FastML Blog,FlowingData Blog,No Free Hunch Blog",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer",University courses,10,30,0,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,41,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,"Blogs,College/University,Online courses,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Engineer,Other",Self-taught,50,45,0,5,0,0,,,Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Laptop or Workstation and private datacenters,Text data,Never,100MB,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,20,40,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools",,,Most of the time,,Often,,,,,,,,Often,,,,,,,,,,26-50% of projects,Entirely internal,Business Department,,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,Never,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities",,,Somewhat useful,,,,Somewhat useful,,Very useful,,,,,,,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,0,25,10,65,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,QlikView,R,SQL",,Often,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Most of the 
time,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Segmentation",,,Often,,,Often,Most of the time,Sometimes,,,,,,,,Often,,Sometimes,,,,,Rarely,Rarely,,Often,,,,,,,,10,30,0,40,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Often,,,,,,,Often,Sometimes,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Greece,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,PhD,Yes,Master's degree,Management information systems,3 to 5 years,Data Analyst,University courses,0,40,10,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Very Important +Male,United Kingdom,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Company internal community,Online courses,Stack Overflow Q&A",,,,Very useful,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,Often,,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Sometimes,Sometimes,,,Rarely,,,,,,Often,,,,"A/B Testing,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,,,,,,,Rarely,Rarely,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,Sometimes,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Rarely,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,110000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by government,Spark / MLlib,Neural Nets,R,Google Search,"Blogs,Kaggle,Newsletters,YouTube Videos",,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",University courses,10,20,35,35,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,"Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,Rarely,,Rarely,,,Sometimes,,,Sometimes,,,,,50,5,0,35,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a 
data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,Often,,Sometimes,,,,,Sometimes,,,Often,Often,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",slack,Bitbucket,Rarely,17000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,Very useful,,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,Udacity",GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,,University courses,0,30,0,40,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Unix shell / awk,Other",Rarely,Often,,,Often,,,,Often,,,,,Often,,,Most of the time,,,,,,,,,,Often,Rarely,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Sometimes,Most of the time,,,Often,,,,Most of the time,Most of the time,,Often,,Most of the time,Often,Most of the time,,,,40,20,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy 
issues",Sometimes,Sometimes,,,Often,,,,Often,,,,,,,,Sometimes,,,,,,51-75% of projects,More internal than external,Standalone Team,Open data ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",Rarely,"50,000",EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,20,20,20,10,30,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A professional degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Always,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,Rarely,,Rarely,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Often,,,Most of the time,,Often,,Often,,,Most of the time,Often,,Often,Often,,,Most of the time,,,,15,25,10,30,20,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Business Department,weather;internet images;events calender,data veracity,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,40000,EUR,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by college or university,DataRobot,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,Researcher,Self-taught,70,10,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs",High school,Academic,20 to 99 employees,Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,,1GB,"Decision Trees,HMMs,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,HMMs,Segmentation",,Sometimes,,,,,Most of the time,Often,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,20,20,0,40,20,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,Often,Often,,,Often,Sometimes,,,,,,,,,,76-99% of projects,More external than internal,IT Department,TCGA; GEO;,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,15000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Anomaly Detection,Python,Google Search,"Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,50,20,0,10,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,Sometimes,,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,,,Most of the time,,Most of the time,,Most of the time,Often,,Most of the time,,,,Sometimes,Most of the time,,Sometimes,,,,30,30,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,,,,,,Often,Often,,,,,,Often,,,Often,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,20000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Proprietary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst",University courses,0,0,50,0,50,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Gradient Boosted Machines,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Gradient Boosted Machines,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,33,33,0,0,34,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,maxmind,What?,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,200000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,Very useful,,,Very useful,,,Very useful,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Bachelor's degree,Computer Science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Ireland,40,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Sweden,55,"Not employed, but looking for work",,,,,,,,SQL,Link Analysis,C/C++/C#,GitHub,"Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,5-10 years,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,"edX,Other","Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,A social science,More than 10 years,"Programmer,Other",University courses,5,20,0,75,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)",,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat 
important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Other","Arxiv,Blogs,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,3 to 5 years,"Researcher,Other",University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Perl,Python,R,SQL",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Sometimes,Rarely,Often,,,Most of the time,Most of the 
time,Often,,,,,,,Sometimes,Often,,,,Sometimes,Sometimes,,Often,,,,,,,,,,,45,20,10,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,Often,Sometimes,,,,,,Often,Often,,,,,Sometimes,Sometimes,,Most of the time,,,100% of projects,More internal than external,Standalone Team,public transit data; census data,"The fact that data is often collected before we figure out what problems we need answered. As a result, we sometimes have to resort to answering what questions we can ask of the data, as opposed to get data to answer the questions of primary interest.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"160,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,"Business Analyst,Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Israel,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Other,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences",Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher",University courses,33,0,33,34,0,0,"Machine 
Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Other",Text data,Never,10GB,"Ensemble Methods,Neural Networks,RNNs","C/C++,IBM Cognos,Java,Jupyter notebooks,Python",,,,Often,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Natural Language Processing,Neural Networks,RNNs,SVMs",,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,Sometimes,,,,,,40,30,0,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,Often,,51-75% of projects,Entirely internal,Other,,,,Email,,Git,Sometimes,30000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,24,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Not Useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Statistician",University courses,20,15,15,40,5,5,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Don't know,1GB,"Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib",,,,,,,,,Rarely,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,,,Rarely,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",,,,,Often,Most of the time,Most of the time,Often,,,,,,Often,,Most of the time,,,,,Most of the time,,Often,Often,,Often,,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the 
organization,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,Often,,,,,,,,,,,,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,government data ; economical data ; ,identifying and avoiding bias,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,South Africa,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Haskell,GitHub,"Arxiv,Blogs,Friends network,Official documentation,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,30,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Rarely,100MB,,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter 
notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other,Other",Sometimes,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,Often,,,,Sometimes,,Most of the time,Most of the time,Most of the time,,"kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,Often,,Often,,,Most of the time,,Often,,,Sometimes,,,,Sometimes,Most of the time,Sometimes,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Sometimes,,Often,Often,,,Often,,,,,Often,,Often,Often,Sometimes,,Most of the time,Sometimes,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Most of the time,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Scientist,Predictive Modeler",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,Often,,Most of the time,,,,Often,,,Sometimes,,,Sometimes,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,Sometimes,,Often,Often,Often,Sometimes,,,,,Often,Often,Most of the time,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Often,Sometimes,,,,25,25,15,10,25,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to 
say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United Kingdom,24,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Other",2 - 10 hours,Other,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,50,5,0,45,0,0,Natural Language Processing,"Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Not important +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced 
analytics,Cloudera,Uplift Modeling,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Machine Learning Engineer,Kaggle competitions,10,40,15,5,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Stan,TensorFlow",Rarely,,,,Often,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,Often,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the 
time,,Most of the time,,Most of the time,,Often,Most of the time,Often,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,Often,Often,,,,10,30,30,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,,,Often,Most of the time,Most of the time,,100% of projects,More internal than external,Standalone Team,government data;open data;twitter data;forum data;,Find a challenge =D,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,55000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,,,Python,,"Arxiv,Blogs,Newsletters",Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,,Kaggle competitions,30,20,0,40,10,0,Other (please specify; separate by semi-colon),,A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,,Never,10GB,"CNNs,Neural Networks,RNNs","MATLAB/Octave,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks,RNNs",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,0,10,0,0,90,0,Enough to refine and innovate on the 
algorithm,"Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,,,,,Often,,,,,Often,,,,,,None,Entirely internal,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,240000,USD,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Python,Text Mining,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,Other,University courses,10,15,10,50,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,"Association 
Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics",,Sometimes,Sometimes,,,Often,,Often,Sometimes,Most of the time,,,,,,Often,,Sometimes,Sometimes,Sometimes,,,Often,,,,Often,,Sometimes,,,,,15,20,10,5,20,30,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,,,Sometimes,,,,Sometimes,Often,,,,,,Sometimes,Sometimes,,,,,,26-50% of projects,Approximately half internal and half external,Other,None.,Accommodating errors that were either difficult to predict or artifacts of either compression or poorly had derivation.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Other,Rarely,"5,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,42,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",R,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau",Rarely,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,,,Often,Often,,,,65,15,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is 
small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,Sometimes,,,Often,,,,,,,Often,,,,Often,Often,,76-99% of projects,More internal than external,IT Department,,"It's a relational database, but has over 5000 tables and no relationships defined in the database.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,42000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Japan,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Random Forests,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,Researcher,Work,25,50,20,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,Mix of fields,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic 
Regression","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Ensemble Methods,Random Forests",,,,Sometimes,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,30,20,20,10,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Subversion,Sometimes,5000000,JPY,Other,7,,,,,,,,,,,,,,,,,, +Female,Portugal,32,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,SAS Base,"Ensemble Methods (e.g. boosting, bagging)",Stata,,Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,1 to 2 years,Researcher,Self-taught,5,5,0,90,0,0,Unsupervised Learning,Logistic Regression,A doctoral degree,Academic,100 to 499 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Markov Logic Networks,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,0,90,5,5,0,0,Enough to tune the parameters properly,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Never,,,,9,,,,,,,,,,,,,,,,,, +Male,Canada,40,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Perfectly,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Personal Projects,Textbook",Very useful,,,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),10-15 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,PhD,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,60,40,0,0,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Ukraine,17,"Not employed, but looking for work",,,,,,,,Mathematica,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Other","Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,< 1 year,,,,,,,,,,,,,,"Coursera,DataCamp",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,50,50,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hungary,21,Employed part-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Other,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher",University courses,5,5,30,50,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,"1,000 to 4,999 employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Don't know,10TB,"Bayesian Techniques,Decision Trees","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Often,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",,,Sometimes,,,,Often,Sometimes,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,Most of the time,,,,,20,40,0,40,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,Most of the time,,,Most of the time,,,,76-99% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Don't know,87,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Ireland,47,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,Less than a year,Other,University courses,10,10,0,60,20,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,49,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,R,"I collect my own data (e.g. 
web-scraping),Other","Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Engineer,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs",A master's degree,Technology,"10,000 or more employees",Stayed the same,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,Often,,,Most of the time,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,Often,Often,,Often,,,,,Most of the time,,,,Most of the time,,Often,Most of the time,,,,40,25,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the 
time,,,,,Most of the time,,,,,,,Most of the time,,51-75% of projects,Entirely internal,Business Department,,"Lack of data dictionary and relational structure Need to create data lakes Most data is in cubes or data warehouses and requires data pulls to bring into toolsets","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Rarely,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Portugal,35,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Tutoring/mentoring",,,,,,Very useful,Very useful,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",University courses,30,40,0,10,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Rarely,1TB,Regression/Logistic Regression,"Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,Often,,,,,,,,Rarely,Often,Rarely,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,PCA and Dimensionality Reduction",,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,60,10,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,49,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Bayesian Methods,R,"Government website,University/Non-profit research group websites","Blogs,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,Statistician,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Academic,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,R",,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,Often,,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,5,20,0,5,20,50,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Do not know,Other,"Government survey data; census data ",Time.,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Most of the time,110000,GBP,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,India,26,"Not employed, but looking for work",,,,,,,,IBM SPSS Modeler,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,France,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Master's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,Germany,34,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Kaggle Competitions,No,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Researcher,University courses,40,10,10,10,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Online courses,Podcasts,Textbook",,,,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,6 to 10 years,"Data Analyst,Data Scientist,Statistician",University courses,20,10,30,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",High school,Internet-based,20 to 99 employees,Decreased significantly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your 
product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression","Impala,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Simulation",Rarely,,Often,,,Often,Often,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,20,40,35,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,67000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,Not Useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,25,10,50,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,100MB,"CNNs,Ensemble Methods,SVMs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,,,,,,,,,,,,Often,,Most of the time,,,,,,,Often,Most of the time,,,,,20,45,15,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Most of the time,Sometimes,,,Often,Sometimes,,Most of the time,,,Often,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",Sometimes,950000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,SVMs,"Amazon Web services,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Rarely,,,Rarely,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Sometimes,Often,,,,,,,Often,,Often,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,Sometimes,Often,Often,Sometimes,,,,25,25,10,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,,,Sometimes,,,,,,,,,Often,Often,Often,,Sometimes,,51-75% of projects,Approximately half internal and half external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,40000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Researcher",Self-taught,30,30,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Most of the time,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",Self-taught,90,10,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,Sometimes,,,Often,,,,,,,Often,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Most of the time,,Often,Often,Sometimes,,,,20,40,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,Often,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,3000000,INR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Unix shell / awk,Bayesian Methods,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer",University courses,0,60,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,500 to 999 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,10MB,,"Amazon Web services,C/C++,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,Spark / MLlib,SQL",,Most of the time,,Sometimes,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,Often,,,,,,Often,,,,,,,,,,Often,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,40,0,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,26-50% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,,1000000,INR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,Kaggle competitions,40,0,0,0,60,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,Rarely,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,Most of the time,,Most of the time,,,Most of the time,,,,Often,,,,50,10,20,20,0,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",Most of the time,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Never,"42,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Nigeria,40,"Not employed, but looking for work",,,,,,,,Java,Social Network Analysis,Java,I collect my own data (e.g. 
web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,The Analytics Dispatch Newsletter",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,40+,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,10,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,,Very Important,Very Important,Very Important,Very Important +Male,Portugal,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Deep learning,R,GitHub,"Kaggle,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business 
decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Mathematica,MATLAB/Octave,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,Often,,,,,,Rarely,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Often,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,25,25,15,20,15,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others",,,,,Sometimes,Often,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,18500,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data",Always,100GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,,Sometimes,Most of the time,,Often,,Often,,,,,,Sometimes,,Often,,Sometimes,Often,Often,Often,,Often,,Often,Often,,Often,Most of the time,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",Most of the time,,Often,,Sometimes,,,,,Most of the time,,,,,,,Most of the time,,,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,ImageNet; COCO; Rimes,Privacy,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Never,40000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Ukraine,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,Google Search,"College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A professional degree,Telecommunications,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,Other,"Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,20,60,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,do not know,understand what does it mean from the business point of view,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Other,Never,"14,400",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Other,R,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Conferences,Friends network,Stack Overflow Q&A",,,,,Somewhat useful,Very useful,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,6 to 10 years,Researcher,Self-taught,70,20,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Academic,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Not at all important,Other,Basic laptop (Macbook),"Image data,Text data",,<1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,,,,,,Often,,Often,,,,,Sometimes,,Rarely,,,,,,,Most of the time,,,,45,10,10,25,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input",,,,,,Sometimes,,,,,Often,,,,,,,,,,,,100% of projects,Entirely internal,Other,,Collecting it!,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,40500,GBP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Amazon Web services,Bayesian Methods,R,Google Search,"Blogs,College/University,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,Not Useful,Not Useful,,,,,Somewhat useful,Not Useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,Other,Self-taught,70,5,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,,,,Most of the time,Rarely,,,,,,Sometimes,,Often,,Sometimes,Often,Rarely,Often,,,,,,,,Often,,,,,20,10,0,10,20,40,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,Often,,,,Often,,100% of projects,More internal than external,Other,Newswire; Social Media,Cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"160,000",USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Russia,20,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Deep learning,Python,Other,"Conferences,Kaggle,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",University courses,10,0,0,85,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Retail,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Recommender Systems,Text Analytics",Rarely,,,,,Often,Sometimes,Often,,,,Often,,,,,,,Most of the time,,,,,Sometimes,,,,,Most of the time,,,,,35,50,5,0,10,0,"Enough to code it again from scratch, albeit it may run 
slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Bitbucket,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Female,Spain,24,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Monte Carlo Methods,C/C++/C#,I collect my own data (e.g. web-scraping),"College/University,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,Very useful,,Very useful,,Somewhat useful,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",15,30,20,30,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data,Relational data",Sometimes,10GB,"CNNs,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","C/C++,Google Cloud Compute,IBM SPSS Statistics,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Minitab,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,Sometimes,,,,Often,,,Most of the time,,Most of the time,,,Sometimes,Often,,,Rarely,,Rarely,Rarely,,,,Most of the time,,Often,,,,Rarely,,,,,Sometimes,,,,Often,,Most of the time,,,,"CNNs,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Simulation",,,,Often,,,,,,,,,,,,Most of the time,,,,Most of the time,,,Often,,,,Sometimes,,,,,,,10,50,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Limitations of tools",,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),,,Git,Sometimes,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Turkey,27,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",10,50,10,30,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1MB,"Bayesian Techniques,Decision Trees","Amazon Web services,Python,R,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,24000,TRY,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Hungary,20,Employed part-time,,,Yes,,Researcher,Fine,Employed by government,TensorFlow,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Company internal community,Kaggle,Personal Projects",Very useful,Somewhat useful,,Very useful,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Researcher,Work,40,10,40,0,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Gradient Boosted Machines,A master's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Often,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,Rarely,Sometimes,,Often,Often,,Often,,,,,Rarely,,Often,,,Rarely,Sometimes,Rarely,,,,,,,Sometimes,,Often,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,Rarely,,,,,Often,,,,Most of the time,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,1600000,HUF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,30,50,0,0,Recommendation Engines,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",,100GB,Neural Networks,"C/C++,Microsoft Excel Data Mining,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,0,10,10,50,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Often,,,Often,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Bloomberg,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Other,Rarely,750000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Italy,41,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Researcher,Statistician",Self-taught,20,10,50,20,0,0,"Time Series,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1TB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Neural Networks,SVMs","C/C++,Hadoop/Hive/Pig,Mathematica,MATLAB/Octave,Perl,Python,R,TensorFlow",,,,Rarely,,,,,Often,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Often,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,SVMs,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,Often,,Often,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,50,30,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Often,Often,,Often,Often,Often,,,,,,,,,,,,,Less than 10% of projects,More 
internal than external,IT Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Email",,"Bitbucket,Subversion",,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Newsletters,Non-Kaggle online communities,Online courses",,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,25,10,10,5,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,QlikView,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the 
time,Most of the time,,,,Most of the time,,,,,,,,Often,Most of the time,,Most of the time,,,,,,,Often,,,,60,8,7,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,,,Sometimes,Often,,,,Often,,Sometimes,,,,,,,,,,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Rarely,,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Online courses,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",5,40,20,30,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Most of the time,10MB,"Bayesian Techniques,SVMs","Python,R,RapidMiner (free version),Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,,"Association Rules,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,Often,,,,Often,,Most of the time,,,,,,Often,,Often,,Often,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,55,10,20,15,0,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,Most of the time,,,,,Often,,Less than 10% of projects,Entirely internal,Other,wikipedia,It is slow,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,"960,000",PKR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,32,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Non-Kaggle online communities,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,,,Very useful,,,Very useful,,,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,QlikView,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis,Other,Other",Often,,Often,Sometimes,,Sometimes,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,Sometimes,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Often,Sometimes,,Often,Often,Rarely,,90,1,1,1,7,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Often,,,,,,,,,,,,Most of the time,Most of the time,,76-99% of 
projects,More internal than external,Standalone Team,"Afri-pop, OSM, nightlights data, Financial inclusion access points",Not updated regularly ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,1100000,ZAR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,France,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Self-employed",Julia,Monte Carlo Methods,Python,"GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,30,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Telecommunications,,,,,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Logistic 
Regression,Neural Networks",,,Sometimes,Most of the time,,,,,,,,Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,60,20,5,10,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,Most of the time,Often,,,,,Often,Most of the time,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"84,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Stan,Factor Analysis,Other,Google Search,"Official documentation,Textbook",,,,,,,,,,Somewhat useful,,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,40,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,500 to 999 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,10MB,Regression/Logistic Regression,"R,SQL,Unix shell / 
awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Often,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Sometimes,,,,70,1,0,10,19,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues",,Often,,,Most of the time,,,Often,,,,,,,,,Most of the time,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,public surveys; censuses,inconsistency of different sources ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,git,Git,Rarely,1300000,RUB,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Egypt,22,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Natural Language Processing",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,27,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Bayesian Methods,R,,"Blogs,Newsletters,Official documentation,Online courses,Textbook",,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,35,25,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,,Often,Often,,,,,Rarely,,Sometimes,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis,Other",Sometimes,Rarely,,,,Often,Often,Often,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,30,15,5,30,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,,Sometimes,,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,,26-50% of projects,Entirely internal,Business Department,,data comes from different teams; lack of documentation; some data sources are hard to combine;,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,New Zealand,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Neural Nets,SAS,Other,"Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,30,20,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks",High school,Other,500 to 999 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Other,Relational data,Sometimes,1TB,"Bayesian Techniques,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,Sometimes,Most of the time,,,,80,5,1,10,4,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,Most of the time,Most of the time,Sometimes,,76-99% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,115000,NZD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,50,25,10,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,500 to 999 employees,Increased slightly,Less than one year,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,100MB,"Regression/Logistic Regression,Other","Jupyter notebooks,NoSQL,Python,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Other",Sometimes,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,,72,3,0,10,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small 
and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Australia,52,Employed part-time,,,No,Yes,Other,Perfectly,Employed by government,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,Less than a year,Other,Kaggle competitions,0,60,0,30,10,0,,Logistic Regression,"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,40,0,0,20,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,100GB,CNNs,"C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,,Often,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Segmentation",,,,Most of the time,,Most of the 
time,Often,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,15,35,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belarus,49,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by government,Self-employed",R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,20,0,40,30,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Often,Often,Often,,Sometimes,Rarely,Sometimes,,Sometimes,Sometimes,,Sometimes,,Most of the time,,Most of the time,,,,,Often,Often,Sometimes,,,,50,10,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a 
clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Most of the time,,,,,,,Sometimes,,,,,Rarely,,,Often,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),Python,GitHub,"College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Master's degree,No,Master's degree,Computer Science,,"Computer Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Speech Recognition,Time Series","Hidden Markov Models HMMs,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Female,Germany,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Online courses,Personal Projects,Podcasts",,,,Somewhat useful,,,,,,,Very useful,Very useful,Somewhat useful,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,,Self-taught,20,50,20,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests","Jupyter notebooks,Python,R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,,,Often,,Sometimes,,,,,,,Sometimes,,Often,,,,,,Sometimes,Sometimes,,,,30,30,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,,,,,,Often,,,Often,Often,,Often,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,20,0,0,20,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United Kingdom,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep 
learning,Python,I collect my own data (e.g. web-scraping),"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Engineer,Researcher",Self-taught,40,30,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Sometimes,1TB,"Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,,,Most of the time,,,,20,30,10,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,Sometimes,,,,,,,,,Often,,,26-50% of projects,Entirely internal,,,the size of the files,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Computer Scientist,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",80,10,5,0,5,0,"Computer Vision,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Image data,Rarely,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Often,,,,Sometimes,,,,,Rarely,,Sometimes,,,,,Sometimes,,,,,,50,10,0,20,20,0,"Enough to code it again 
from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,Sometimes,,51-75% of projects,More external than internal,Standalone Team,,,,Email,,Git,Never,1680000,RUB,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Singapore,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Not important +Male,Indonesia,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,"Ensemble Methods (e.g. 
boosting, bagging)",SQL,GitHub,"Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",Self-taught,30,20,25,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,Sometimes,,Often,,,,,Often,Sometimes,,,Most of the time,,,,,,,,,,"Association Rules,Decision Trees,Ensemble Methods,Random Forests",,Sometimes,,,,,,Often,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,Sometimes,,,,,,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Bitbucket,Sometimes,9000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,20,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Python,Deep learning,Python,GitHub,"Blogs,College/University,Conferences,Kaggle",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,Emergent/Future Newsletter (Algorithmia),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Other,Self-taught,70,0,25,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,I 
don't know,Increased slightly,Don't know,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Neural Networks,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,,Often,,,,,,40,40,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources",,,,,Most of the time,,,,,Often,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Australia,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Jupyter notebooks,Monte Carlo Methods,Python,Google Search,"Arxiv,Kaggle,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,Researcher,Self-taught,20,40,10,5,0,25,Computer Vision,Bayesian Techniques,High school,Other,"10,000 or more employees",Decreased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Other,Laptop or Workstation and local IT supported servers,Image data,Rarely,<1MB,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,5,5,0,80,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Never,150000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Malaysia,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,Other,15,5,5,0,0,75,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Telecommunications,100 to 499 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,1GB,,"Jupyter notebooks,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,"Data Visualization,Decision Trees,Random Forests",,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,40,5,5,10,40,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",,,,Often,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"104,000",MYR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,35,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,,,,Government,500 to 999 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,,Regression/Logistic Regression,"MATLAB/Octave,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,30,0,0,40,30,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Most of the time,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,40,,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,63,Retired,,,No,Yes,Other,Perfectly,Employed by non-profit or NGO,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,"Coursera,DataCamp,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,I don't plan on learning a new ML/DS method,Python,Government website,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,,,,,Necessary,Nice to have,,Nice to have,Nice to have,,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Computer Vision,Logistic Regression,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,,Somewhat important,Very Important,,,,Very Important,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,,R,Google Search,"College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Very useful,,,Very useful,,Very useful,,,Not 
Useful,Not Useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,90,0,0,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Russia,21,Employed part-time,,,No,Yes,Data Scientist,Poorly,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Friends network,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,Very useful,Somewhat useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,Not Useful,Not Useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,60,5,15,0,0,"Adversarial Learning,Reinforcement learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +A different identity,India,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,,Python,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,,Work,20,30,20,10,0,20,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Don't know,10MB,"Decision Trees,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems",,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Do not know,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,22,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),R,Google Search,"Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,10,10,0,Unsupervised Learning,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,48,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Google Cloud Compute,Bayesian Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Very useful,Very useful,Very useful,Very useful,,,,,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,20,10,0,70,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Other",Never,1TB,"CNNs,Decision Trees,Random Forests","C/C++,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Data Visualization,Random Forests",,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues",,Often,,,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"FlowingData Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,30,10,20,30,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Simulation,Time Series Analysis",Often,,,,Often,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Sometimes,,Most of the time,,,,Sometimes,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,,,Most of the time,,,,20,25,35,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,Often,,,Most of the time,Sometimes,,,Most of the time,,,,Sometimes,Rarely,,,,,,,,,76-99% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,75000,GBP,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Adversarial Learning,Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Python,SQL",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,Often,Most of the time,,,,20,30,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",,Often,,,,,,,,,,,,,,Most of the time,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Subversion",,700000,RUB,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +,India,33,Employed full-time,,,No,Yes,Data Analyst,,,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,,,,,,,,,,,Necessary,Necessary,Necessary,,Other,2 - 10 hours,Online Courses and Certifications,No,,Computer Science,Less than a year,Other,Self-taught,30,40,10,10,10,0,,Support Vector Machines (SVMs),A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,,,,,,,,,,,,,,,,Very Important +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic 
Regression","Amazon Web services,Hadoop/Hive/Pig,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,HMMs,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics",Often,,,,,,Often,,,,,,Often,,,,,,Often,Often,,,,Often,,,,,Often,,,,,30,20,40,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,resumes; job postings,non-uniform standardization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,175000,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,6 to 10 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,Random Forests,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,0,50,0,50,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Official documentation,Online courses,Textbook,YouTube Videos",Very useful,,,,,,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,0,30,10,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the 
time,,,,,,,,Often,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests,Segmentation,SVMs",,Sometimes,,,,Most of the time,Most of the time,Often,,,,Often,,,,,,,,,,,Often,,,Often,,Sometimes,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,Often,Most of the time,,,,,,,,Often,,,,Most of the time,,26-50% of projects,More internal than external,Standalone Team,none,Invalid / noise data.,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Singapore,25,"Not employed, but looking for work",,,,,,,,DataRobot,Text Mining,,Government website,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Computer Vision,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - 
RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,Very useful,,Somewhat useful,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Miner,Programmer,Researcher,Other",University courses,10,15,15,25,0,35,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,Fewer than 10 employees,Increased slightly,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,,Other,"Amazon Web services,Google Cloud Compute,NoSQL,Python,TensorFlow",,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"Logistic Regression,Neural Networks,Recommender Systems,Simulation",,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,Most of the time,,,,,,,50,20,20,10,0,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations in the state of the art in machine learning",,,,,,Rarely,,,,,,Often,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Bitbucket,Rarely,38500,GBP,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"FlowingData Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",15+ years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","GPU accelerated Workstation,Workstation + Cloud service",11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Other",University courses,20,20,10,10,40,0,"Computer Vision,Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Greece,27,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Other,Text Mining,R,,"Online courses,Podcasts,Stack Overflow Q&A",,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Management information systems,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,Logistic Regression,A master's degree,Internet-based,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,<1MB,,"Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,0,20,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,Often,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,20000,EUR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",Self-taught,40,5,35,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Other,Other,"Text data,Other",,1GB,Evolutionary Approaches,"Julia,MATLAB/Octave,Python",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Rarely,,,,Often,,,Sometimes,,,,,,,,,,Sometimes,Rarely,,,,,,Most of the time,,,Often,,,,20,30,0,30,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Most of the time,20000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,R,University/Non-profit research group websites,"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,,,,Not Useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,10,80,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation",Most of the time,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,50,10,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Most of the time,Most of the time,,,,Most of the time,,,,Sometimes,,Most of the time,Most of the time,Sometimes,,Often,Often,Most of the time,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Never,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,R,Time Series Analysis,R,"Google Search,Government website","Blogs,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Very useful,,Not Useful,,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,,"Data Analyst,Programmer",University courses,30,30,20,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Often,,,,,,,"Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Text Analytics",,,,,,,,Often,,,,,,Often,Often,Most of the 
time,,,,,,,Often,,,,,,Sometimes,,,,,40,30,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,Often,Sometimes,Most of the time,Rarely,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,26-50% of projects,,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Switzerland,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Anomaly Detection,Julia,Google Search,"Arxiv,College/University,Conferences,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Very useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,6 to 10 years,Researcher,University courses,10,15,35,40,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Other,Rarely,100GB,"Bayesian Techniques,Ensemble Methods","Julia,Jupyter notebooks,NoSQL,Python,R,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Natural Language Processing,Recommender Systems,Time Series Analysis",Often,,Often,,Sometimes,Sometimes,Most of the time,,Most of the time,,,,,,,,,,Often,,,,,Most of the time,,,,,,Often,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,,,,,,,Often,Sometimes,Often,,51-75% of projects,More external than internal,Central Insights Team,depends on the project,noise,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,"60,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Stack Overflow Q&A,Textbook",,,,,,Very useful,,,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",15,25,30,25,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,10GB,"Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,TIBCO Spotfire",,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Rarely,,,,Often,,Often,,,,,,,,,,,,,,Most of the time,,,,,"A/B Testing,Cross-Validation,Naive Bayes,Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Often,Often,Often,,,,40,20,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",Sometimes,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,25000,EUR,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Sweden,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,SQL,"Ensemble Methods (e.g. 
boosting, bagging)",Python,"Government website,University/Non-profit research group websites","Arxiv,Blogs,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,,,,Very useful,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,15,80,0,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A master's degree,Mix of fields,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Image data,,100GB,CNNs,"Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,GANs,Neural Networks,Segmentation",,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,40,50,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,750000,SEK,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,48,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression",Primary/elementary school,Other,"5,000 to 9,999 employees",,,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,,,,,Often,,,,,,,,,Sometimes,,Rarely,,,Often,,,,,,Often,,,,,,,20,20,5,30,25,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations of tools",,,,,,Often,,,,,,,Most of the time,,,,,,,,,,51-75% of projects,Entirely internal,,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,,Never,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Ireland,31,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,Microsoft Azure Machine Learning,Deep learning,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Non-profit,100 to 499 employees,Stayed the same,Less than one year,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,Other,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Data Visualization,Segmentation,Simulation,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,Rarely,,,,,10,10,10,50,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Sometimes,,100% of projects,More internal than external,IT Department,Census Data,"Dirty Data, staff not interested in typing right data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Other,Commercial Reporting Platform,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,40000,EUR,Other,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Conferences,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",35,40,5,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,Sometimes,Sometimes,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,Sometimes,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Sometimes,,Sometimes,Often,,Sometimes,,,Sometimes,,,,,Often,,,,,30,5,5,10,50,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to 
data",,,,,,,,,Often,,,,,,,,Most of the time,,,,Most of the time,,76-99% of projects,Entirely external,Standalone Team,,cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,800000,TWD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +A different identity,Indonesia,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,SQL,Google Search,"Blogs,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Programmer",Work,30,0,50,0,0,20,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"HMMs,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Hadoop/Hive/Pig,Impala,Python,SQL,Tableau,TIBCO Spotfire,Unix shell / awk",,,,,,,,Sometimes,Most of the time,,,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,Sometimes,Sometimes,,,,"Association Rules,Collaborative Filtering,Data Visualization,HMMs,Logistic Regression,Random Forests,Time Series Analysis",,Often,,,Often,,Most of the time,,,,,,Sometimes,,,Often,,,,,,,Sometimes,,,,,,,Often,,,,40,20,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,Often,Sometimes,,,,,Sometimes,,,,,Often,Most of the time,,,26-50% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,530000000,IDR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Stack Overflow Q&A,Textbook",,,,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",70,20,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",I prefer not to answer,Insurance,I don't know,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Text data,Rarely,10GB,"Decision Trees,Ensemble Methods,Other","Amazon Web services,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Text Analytics",,,,,,Most of the time,,,Sometimes,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,30,50,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,Office for National Statistics demographic data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,GBP,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Spain,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Deep learning,R,,"College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,20,40,10,0,0,Reinforcement learning,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,100 to 499 employees,,Don't know,Some other way,Not very important,Other,Basic laptop (Macbook),Relational data,Never,<1MB,"Bayesian Techniques,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis,Other",,,,,,,Most of the time,Rarely,,,,,,,,Often,,,,,Often,,,,,,,,,Sometimes,,,Most of the time,30,30,0,10,30,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,Often,,Often,,,,Often,,,,,Often,,Often,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,16000,EUR,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,SAP BusinessObjects Predictive Analytics,Neural Nets,Other,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,Very useful,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,"Data Machina Newsletter,Data Stories Podcast,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer",Self-taught,30,0,70,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"CNNs,GANs,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Java,Mathematica,NoSQL,Python,RapidMiner (free version),SAS Base,SQL",,Sometimes,,Often,,,,,,,,,,,Often,,,,,Often,,,,,,,Sometimes,,,,Most of the time,,,,Rarely,,,Most of the time,,,,Most of the time,,,,,,,,,,"CNNs,Collaborative Filtering,GANs,HMMs,Logistic Regression,Neural Networks,Prescriptive Modeling,Recommender Systems,RNNs,SVMs,Text Analytics",,,,Often,Sometimes,,,,,,Most of the time,,Most of the time,,,Most of the time,,,,Most of the time,,Often,,Sometimes,Most of the time,,,Often,,,,,,10,10,20,10,50,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",Often,Sometimes,,,Often,,,,,Most of the time,,,,,,,Sometimes,,,,Sometimes,,10-25% of 
projects,Approximately half internal and half external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,90000,,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Newsletters,Online courses,Personal Projects",,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,,,"Data Elixir Newsletter,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,,University courses,25,50,3,10,10,2,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,100 to 499 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,Often,,,,,,,,,,,,,Rarely,,,,"Association Rules,Cross-Validation,Decision Trees,kNN and Other Clustering,Random Forests,Segmentation,SVMs",,Sometimes,,,,Often,,Often,,,,,,Often,,,,,,,,,Often,,,Often,,Often,,,,,,50,10,15,10,15,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,Most of the time,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Less than 10% of projects,More internal than external,Other,,Prefer to not say.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"335,000",INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,NoSQL,,Python,Google Search,"Newsletters,YouTube Videos",,,,,,,,Somewhat useful,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Workstation + Cloud service,,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO,Employed by government,Self-employed",I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Other,"Government website,I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Python,R,Unix shell / awk",,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,SVMs,Text Analytics",,,,,,Often,Most of the time,,Sometimes,,,Sometimes,,,Often,Often,,,,,,,Most of the time,,,,,Sometimes,Rarely,,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results 
not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,Often,,Sometimes,,Most of the time,,Often,Most of the time,,,Sometimes,,Most of the time,,Sometimes,Most of the time,Sometimes,,Often,Often,Often,100% of projects,Entirely internal,Other,"NDACAN AFCARS; NDACAN NCANDS; Census, ACS; CDC NCHS",Confidentiality and record linkage,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Globus,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Genetic & Evolutionary Algorithms,R,GitHub,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Other",Work,0,10,40,50,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Relational data,Other",Sometimes,1MB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Naive Bayes,Neural Networks,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,Sometimes,,Most of the time,,,,,,Often,,,,Most of the time,,,,30,10,0,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Often,Often,,,,,,,,,,,,,,Sometimes,,,,,,,76-99% of projects,Entirely internal,IT Department,forex rates ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,"55,000",EUR,Other,7,,,,,,,,,,,,,,,,,, +Male,Israel,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences",Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,"Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",Self-taught,50,10,40,0,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Video data,,1TB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,"CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis,Other",,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,None,,,,,,,,Git,Always,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Textbook",Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,20,20,50,0,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Relational data,Other",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Rarely,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive 
Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",Sometimes,,Often,Often,Sometimes,Most of the time,Most of the time,Most of the time,Sometimes,,Often,Most of the time,Sometimes,Often,,Often,Often,Often,,Most of the time,Sometimes,,Often,,Often,,,Sometimes,,Most of the time,,,,20,40,20,15,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning",,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,Google finance; Yahoo finance; DrugBank; PDB,Develop pipeline for drug design and development based on ML/DL algorithms.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,120000,RUB,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,Other,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",15,15,30,10,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10MB,"CNNs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics",Sometimes,,Sometimes,,,Often,Most of the time,,,,,Often,,,,Often,,,Sometimes,Often,,,Often,,,Often,,,Often,,,,,10,5,5,30,50,0,"Enough to code it again from scratch, albeit it may run slowly","I prefer not to say,Other",,,,,,,Rarely,,,,,,,,,,,,,,,Sometimes,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",S3,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,36000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Hungary,20,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Weka,Deep learning,Python,GitHub,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Researcher,Self-taught,80,15,5,0,0,0,Reinforcement learning,Neural Networks - CNNs,High school,Technology,Fewer than 10 employees,,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Never,,Neural Networks,"Java,Python,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,"Neural Networks,Text Analytics",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,50,20,5,10,15,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,,Often,,None,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Rarely,60000,SGD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +A different identity,Spain,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Matlab,Google Search,"Company internal community,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Random Forests,Segmentation",,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,10,50,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Privacy issues",Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,45000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,15,15,65,5,0,0,"Computer Vision,Machine Translation","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data",Most of the time,10GB,"CNNs,Neural Networks,RNNs","NoSQL,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Sometimes,,,,,,,,,,,,,,,Often,,,,"CNNs,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,30,35,30,5,0,0,Enough to run the code / standard library,"Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,Most of the time,Most of the time,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented 
(e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,48000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Social Network Analysis,Python,"GitHub,Government website","Blogs,College/University,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,A social science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Reinforcement learning,Unsupervised Learning",Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Male,Pakistan,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,I collect my own data (e.g. 
web-scraping),College/University,,,,,,,,,,,,,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,I haven't started working yet",University courses,30,15,5,40,10,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A doctoral degree,Academic,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10GB,Neural Networks,TensorFlow,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Most of the time,Often,Most of the time,Often,,,Often,,Often,,Often,,Sometimes,Often,Often,,,Often,,,,,Often,Often,,,,,50,30,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues",,,,Sometimes,,,,,Often,,,,,,,,Often,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Kaggle ,Data cleanings ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,"15,000",,Other,5,,,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Perfectly,Employed by professional services/consulting firm,IBM SPSS Statistics,Deep learning,Python,"Google Search,Government website","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,3-5 years,,,Necessary,,Necessary,Necessary,,,,Necessary,,,,"Coursera,edX,Udacity",Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,,Doctoral degree,Other,I don't write code to analyze data,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,Very Important,,Very Important,Very Important,,,,,,,,,,, +Male,Netherlands,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Factor Analysis,R,Government website,"Blogs,College/University,Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,20,40,10,0,10,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Increased slightly,3-5 years,Some 
other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,100MB,Regression/Logistic Regression,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,Often,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,Often,Often,,,,,,,Most of the time,,Most of the time,,,,,,Often,,,,Most of the time,,,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More internal than external,Other,DUO; CBS,Cleaning; combining,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,"70,000",EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,R,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,,10MB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision Trees,Naive Bayes,Random Forests,SVMs",,,Sometimes,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,40,20,20,10,10,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,Very useful,,Somewhat useful,,,Very useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Data Analyst,Work,10,0,50,0,40,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation",,Often,,,Often,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,,Most of the time,Most of the time,,,,,Sometimes,Often,Often,Often,,Often,,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision 
makers,Difficulties in deployment/scoring,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Sometimes,,Often,,,,,,,,,,,,Often,,,,,Most of the time,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1250000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,Somewhat useful,,,,,,Very useful,Very useful,Very useful,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",15,50,20,10,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,NoSQL,Orange,Python,R,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,,Rarely,,,Often,,,Often,,Sometimes,,Rarely,,Rarely,,,,,,Often,,Rarely,,Most of the time,,Most of the time,,Rarely,,,,,,Rarely,Sometimes,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Most of the time,Often,Often,Most of the time,Most of the time,,Sometimes,,Sometimes,,,,Sometimes,Often,Often,Rarely,,Most of the time,,Often,,,Rarely,Sometimes,Rarely,,,,40,35,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,Most of the time,,,Sometimes,,,,,Rarely,,Rarely,,,Sometimes,Often,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Git,Subversion",Rarely,20000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Nigeria,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,85,0,0,5,10,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Rarely,100MB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,5,20,5,15,55,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,,NGN,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +A different identity,Finland,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Researcher,Self-taught,30,0,50,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,500 to 999 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation",Image data,Never,1GB,"CNNs,Neural Networks,Other","C/C++,Julia,Jupyter notebooks,Python,R,Other",,,,Often,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,,,,,,,,,Most of the time,,,,,,Sometimes,Often,,,,,,,,,,,,,2,80,0,8,10,0,Enough to refine and innovate on the algorithm,"Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,,,Often,,Sometimes,,,,,Often,,Sometimes,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Egypt,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,Unnecessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,Udacity",Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,GitHub,"Blogs,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,Very useful,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,,Necessary,Necessary,,Necessary,,,Necessary,Nice to have,,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,C/C++,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,Not Useful,,Very useful,Not Useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Kaggle competitions,40,0,15,15,30,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Academic,20 to 99 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Never,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SQL",,Rarely,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,Often,,Most of the time,,,,,Rarely,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Often,,,,Sometimes,,,,,Sometimes,,Most of the time,Sometimes,,,Often,,,,,,,10,0,0,10,30,50,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,Sometimes,,,,Often,,,,,,Often,,,,,,,,100% of projects,Entirely internal,Standalone Team,Some public dataset from french government,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,35000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,25,0,15,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,Sometimes,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Often,Sometimes,,,,,Often,,Most of the time,,,Most of the 
time,,Often,,Often,Often,,Most of the time,Sometimes,Often,Most of the time,Often,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,,,Often,Most of the time,,,,Most of the time,,,Most of the time,,Often,,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",Flash drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"93,600",BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,22,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Perl,Link Analysis,SQL,Google Search,Arxiv,Very useful,,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,10,40,10,0,0,Outlier detection (e.g. 
Fraud detection),Decision Trees - Random Forests,A bachelor's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Lift Analysis,Logistic Regression,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Often,,,,,,,,,Most of the time,Often,,,,,,,Most of the time,,,Most of the time,,Sometimes,Sometimes,,,,,50,30,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Explaining data science to others,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,Often,,Often,,,,,,,,,Often,Often,,,,,,,51-75% of projects,More external than internal,Central Insights Team,,,,Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,500,ARS,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Work,40,20,35,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Telecommunications,"5,000 to 9,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,Often,Often,,,,,Most of the time,,,,,,Often,,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,Rarely,,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,Often,Often,Most of the time,,Often,,,Most of the time,,Often,Sometimes,,,,Sometimes,,,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,,Most of the time,,,,,Rarely,,,Often,,,,51-75% 
of projects,More internal than external,Central Insights Team,Survey Data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100GB,"Bayesian Techniques,Neural Networks,Random Forests","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Sometimes,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,,Often,,,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",Most of the time,,Often,,,Often,Most of the time,,,,,,,Often,,Most of the time,,Most of the time,Most of the time,Often,,,Most of the time,,,,,,,,,,,25,5,50,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,Sometimes,Often,,Often,Often,,Most of the time,,,,,Often,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,N/A,"Clarity, Sparseness and labelling challenges","Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,9000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Time Series Analysis,SQL,Google Search,"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,University courses,10,20,20,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches",A bachelor's degree,Financial,20 to 99 employees,Increased significantly,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Naive Bayes,Random Forests",Sometimes,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,10,30,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,55000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Social Network Analysis,R,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Not Useful,Somewhat useful,,Not Useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,,Self-taught,90,0,0,10,0,0,"Computer Vision,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Evolutionary Approaches,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Java,KNIME (free version),Mathematica,MATLAB/Octave,NoSQL,R,SQL",,,,Sometimes,,,,,,,,,,,Rarely,,,,Most of the time,Sometimes,Sometimes,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",Often,,Sometimes,Sometimes,,,Most of the time,Sometimes,,Sometimes,,,,Often,,Often,,,,Often,Most of the time,,,,Often,,,,Most of the time,Most of the time,,,,40,40,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert 
input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,Most of the time,,Often,,,,Most of the time,,Most of the time,Most of the time,Sometimes,,,,,,Rarely,,,,100% of projects,More internal than external,Other,public scientific data,Noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,50000,EUR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,France,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,"KDnuggets Blog,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,60,30,0,0,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat 
important,Somewhat important,Not important,Not important,Not important +Male,Portugal,47,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Conferences,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Other",University courses,50,25,0,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Survival Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist",Self-taught,50,35,15,0,0,0,"Natural Language Processing,Time Series",Logistic Regression,A master's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,Most of the time,Sometimes,,,,,,Often,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,,Sometimes,,Sometimes,,,,,Most of the time,Sometimes,,Sometimes,Often,,,,50,10,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,,,,Often,,Often,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,FDIC,lack of curation,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Germany,26,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,I haven't started working yet,University courses,20,40,10,30,0,0,Unsupervised Learning,"Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important +Male,Philippines,28,Employed full-time,,,No,Yes,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,,,Very useful,,Very useful,Somewhat useful,,,,,,Very useful,KDnuggets Blog,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,A health science,I don't write code to analyze data,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Russia,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,,Somewhat useful,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher",Work,30,10,50,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,Rarely,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Rarely,Sometimes,,Often,Sometimes,,Sometimes,Often,Often,,Most of the time,Often,,Sometimes,,,Rarely,Often,,,,40,10,0,20,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Scaling data science solution up to full 
database,Unavailability of/difficult access to data",Often,,,Sometimes,,,,Often,,,,,,,Most of the time,,,Rarely,,,Most of the time,,100% of projects,More internal than external,Standalone Team,weather conditions; flight radar 24 data,data reliability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,210000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Web services,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,Linear Digressions Podcast,Partially Derivative Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Bachelor's degree,A social science,,"Business Analyst,Data Analyst,Data Scientist,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Female,Germany,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",3-5 years,Necessary,,,,Necessary,,Necessary,Nice to have,Nice to have,,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Computer Vision,Reinforcement learning","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ireland,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Newsletters,Official documentation,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer,Other",University courses,50,5,5,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Other,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Perl,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,Rarely,Often,,Often,,,,,Often,,Sometimes,,,Sometimes,Sometimes,Often,,,,,,60,10,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,,,Sometimes,,Often,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,18000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed part-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Unix shell / awk,Text Mining,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",5-10 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Computer Vision,Time Series","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,Ireland,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Scientist,Statistician",Work,20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A professional degree,Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SAS Enterprise Miner,SQL,Tableau,TensorFlow",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,Most of the time,,,Most of the time,,,Rarely,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,,,Often,Often,Most of the time,,,,Most of the time,,Often,,Most of the time,,,,Sometimes,,,Most of the time,,,Often,,,,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Often,,Most of the time,Often,,,Most of the time,,,,,,,,Often,Sometimes,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,130000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,46,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Monte Carlo Methods,Scala,University/Non-profit research group websites,"Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",Primary/elementary school,Insurance,"1,000 to 4,999 employees",Decreased slightly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,Orange,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Rarely,Most of the time,Often,Often,Often,,Most of the time,,Often,,Most of the time,,,,,Sometimes,,,Often,,Sometimes,,Most of the time,Sometimes,Most of the time,,,,,,,,Often,Most of the 
time,,,,Often,,,,,,"Bayesian Techniques,Data Visualization,Neural Networks,Random Forests,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,,,,,Often,,,Sometimes,,,,,,,Most of the time,,,,15,40,10,25,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,More external than internal,Business Department,Statistics SA,Memory,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"550,000",ZAR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Republic of China,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Regression,Python,Google Search,"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,Very useful,"Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,95,0,5,0,0,0,Computer Vision,Logistic 
Regression,Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,52,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Survival Analysis,R,GitHub,"Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,15,20,15,0,0,,,High school,Technology,"5,000 to 9,999 employees",,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Never,10GB,Regression/Logistic Regression,"SAS Base,SAS Enterprise Miner,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,"Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics",,Often,,,,Often,Most of the time,,,,,,,,,Often,,,,,,Often,,,,,,,Often,,,,,60,10,0,15,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful 
datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools",Often,,,,Most of the time,Sometimes,,,,Rarely,,Sometimes,Sometimes,,,,,,,,,,76-99% of projects,Do not know,Other,,Reliability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,"120,000",USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Hong Kong,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Other",Self-taught,0,50,10,0,20,20,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",A doctoral degree,Retail,20 to 99 employees,Increased slightly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,Bayesian Techniques,"Amazon Web services,Python,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Naive Bayes,Segmentation",,Often,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,80,10,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty 
data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Sometimes,,,,Most of the time,Sometimes,,,Often,,,,,Often,,Most of the time,,,,,,,76-99% of projects,Entirely internal,IT Department,None,Data is not clean,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,300000,HKD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Google Search,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"DataTau News Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,"Data Scientist,Researcher",University courses,30,10,30,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",,10TB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Rarely,,,Sometimes,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,Rarely,,Often,,,,,,,,,,Often,,Most of the time,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,150000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Pakistan,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Psychology,1 to 2 years,"Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",10,50,0,10,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,,,,,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Never,1GB,Gradient Boosted Machines,"Amazon Web services,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines",,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,20,20,0,20,40,0,Enough to run the code / standard library,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,76-99% of projects,Entirely external,Other,datasets on kaggle,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Git,Always,6000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,,"Arxiv,College/University,Friends network,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,,University courses,5,15,20,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,,1MB,"CNNs,Decision Trees,Neural Networks,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Sometimes,,,Most of the time,Rarely,,,,,,Rarely,,Rarely,,,,Most of the time,Rarely,,Rarely,,,,,Rarely,,,,,,35,30,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,,,,,,Most of the time,,Sometimes,,,,,Most of the time,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"5,400",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Minitab,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Stories Podcast,FlowingData Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,,Engineer,University courses,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,South Africa,26,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,Textbook,,,,,,,,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Self-taught,70,0,0,30,0,0,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Financial,100 to 499 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,10GB,Other,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,70,10,5,15,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,"Bloomberg, Morningstar, INet",Cleaning the data ,Other,Email,,Other,Don't know,20000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't 
perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,10 to 19 employees,Stayed the same,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs,Text Analytics",Often,Sometimes,Sometimes,,,Often,Most of the time,Sometimes,,,,Sometimes,,Sometimes,,Often,,Sometimes,,,,,Often,,,Often,,Often,Sometimes,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for 
scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Often,Sometimes,,Often,,,,,Often,Often,,,,Most of the time,Sometimes,,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,weather data,i don't have enough time to enhance my skills,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,490000,TND,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,52,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),Other","Company internal community,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,50,0,30,20,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,500 to 999 employees,Stayed the same,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service",Text data,Most of the time,10TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Perl,SQL",,,,Often,,,,,,,,,,,,,Often,,,,Often,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,Often,Often,,,,Sometimes,,,,Often,,Often,Often,,,,Sometimes,,,Often,,Sometimes,Often,Rarely,,,,50,10,30,10,0,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult 
access to data",,,Sometimes,Sometimes,Often,,,,Sometimes,Sometimes,Sometimes,Sometimes,Often,,,,,,,,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Personal Projects",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",University courses,20,10,0,60,10,0,"Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,NoSQL,R,Spark / MLlib",,Often,,,,,,,Often,,,,Often,,Most of the 
time,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs",,Often,Often,Often,Most of the time,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Often,Most of the time,Most of the time,Most of the time,Often,,,,,,,,,30,20,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,Entirely external,IT Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),"Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,90000,CAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Portugal,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Cloudera,Association Rules,R,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Kaggle,Online courses,Other,Other",,Not Useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",0,95,0,0,0,5,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Technology,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,10MB,Regression/Logistic Regression,"R,SQL,Tableau,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,Often,Rarely,,"A/B Testing,Data Visualization,Segmentation",Rarely,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,30,5,40,25,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,None,"Fictional data, without expression of the business.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,18500,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,France,37,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),Other",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Researcher,Other,60,30,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Not important,Somewhat important,Not important +Male,Other,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,SAP BusinessObjects Predictive Analytics,Text Mining,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,Very useful,,,,,,,Somewhat useful,Very useful,,,Very useful,"Data Stories Podcast,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,2,18,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Academic,I prefer not to answer,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Sometimes,10GB,"CNNs,GANs,Neural Networks,Random Forests,RNNs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,GANs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",Often,,Most of the time,Often,Most of the time,Often,,Often,,,Often,,,,,Most of the time,,Often,Most of the time,Most of the time,Most of the time,,Often,Often,Often,,,Sometimes,,,,,,30,40,10,20,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,Often,,Sometimes,,,,,,,,Sometimes,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git",Sometimes,120000,LKR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Association Rules,Python,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,60,0,20,20,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Government,100 to 499 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Recommender Systems",,,,,,Often,Sometimes,Often,,,,,,,,Often,,Sometimes,,,,,,Most of the time,,,,,,,,,,20,30,20,10,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,IT 
Department,,Predicting clicks on links in a letter,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",Rarely,2700000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Proprietary Algorithms,R,"GitHub,Google Search",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Computer Vision,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,"5,000 to 9,999 employees",Decreased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Java,NoSQL,Python,R",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction",,Often,Often,,,,Most of the time,Most of the time,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,50,10,10,10,20,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data 
science team",,,Often,,,,,,Most of the time,Often,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)",Commercial Data Platform,,Bitbucket,Most of the time,1900000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,DataRobot,,,,"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",University courses,10,10,20,10,20,30,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Text data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,NoSQL,R",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Recommender Systems,Time Series 
Analysis",,,,Sometimes,Most of the time,Often,,,,,,,,Most of the time,,Often,,,Often,,,,,Often,,,,,,Often,,,,10,20,20,11,20,19,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,10-25% of projects,Entirely internal,IT Department,,,Graph (e.g. GraphBase/Neo4j),Email,,,,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,,1-2 years,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,Software Developer/Software Engineer,University courses,0,20,0,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Greece,28,"Not employed, but looking for work",,,,,,,,IBM SPSS Modeler,Deep learning,R,University/Non-profit research group websites,"College/University,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Statistician","Online courses (coursera, udemy, edx, etc.)",1,4,10,80,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Romania,36,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,Not Useful,,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,35,25,0,0,0,"Recommendation Engines,Time Series",Other (please specify; separate by semi-colon),High school,Internet-based,20 to 99 employees,Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"Decision Trees,Random Forests","Google Cloud Compute,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Decision Trees,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,20,40,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,Most of the time,Sometimes,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,76-99% of projects,More internal than external,Business Department,GeoIP,huge volume in spikes,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",open source data visualization,"Git,Other",Rarely,118000,RON,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,France,70,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Very useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,20,0,20,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,Fewer than 10 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Text Analytics",Sometimes,,,Sometimes,Most of the time,,,,,,,,,,,,,,Often,Most of the time,,,,Most of the time,Most of the time,,,,Often,,,,,55,10,20,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Often,Most of the time,Often,,,,,,,,,,,Often,Most of the time,,,Often,,26-50% of projects,Entirely internal,Standalone Team,none,"automate the workflow, namely the data acquisition, pre-processing, storage, transformation to proper form for the ML model, train and deploy the ML model and then tie the data workflow to the specific version of the model to perform the inference. ","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Google Cloud Storage,Git,Sometimes,80000,GBP,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Neural Nets,R,I collect my own data (e.g. 
web-scraping),College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Computer Scientist,Work,50,0,20,30,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,1GB,"CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Impala,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Often,Often,Often,Most of the time,Most of the time,Most of the time,Often,,Most of the time,Often,Often,,,Often,,Often,Often,Most of the time,Most of the time,Often,Often,Often,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,30,20,30,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Git",,250000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,24,Employed part-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Master's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,Self-taught,50,50,0,0,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,Poland,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Cluster Analysis,R,I collect my own data (e.g. 
web-scraping),"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",University courses,20,10,30,40,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A master's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100GB,Regression/Logistic Regression,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,Often,,,,Most of the time,,,,Often,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,Sometimes,,,,,,,,,,,,,,Often,,,100% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Rarely,66000,PLN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,45,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Weka,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,55,5,0,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,,,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Most of the time,,,,Sometimes,,,,Sometimes,,,Rarely,,Sometimes,,Most of the time,,,Often,,Often,Rarely,Sometimes,,,,50,20,10,15,5,0,Enough to tune the parameters properly,"I prefer not to say,Organization is small and cannot afford a data science team",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,76-99% of projects,Approximately half internal and half external,Business Department,Government public data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Female,Turkey,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Matlab,Google Search,"Arxiv,Blogs,College/University,Friends network,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,10,60,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,,"Bayesian Techniques,Ensemble Methods,HMMs,Neural Networks,SVMs,Other","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,Segmentation",,,Often,,,,,,,,,,Sometimes,Most of the time,,,,Often,,Often,,,,,,Most of the time,,,,,,,,20,50,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack 
of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Often,,,,,,,,,Often,Often,,,Often,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,54000,TRY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,DataRobot,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,,,Somewhat useful,Very useful,"DataTau News Aggregator,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,DBA/Database Engineer,Other,30,20,0,0,0,50,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,GitHub,"Blogs,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat 
useful,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,Tableau,TensorFlow",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,,Rarely,Sometimes,,,,,Sometimes,,Sometimes,Rarely,,Rarely,,,,Most of the time,,,,50,20,20,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,Often,,,Often,,Often,Sometimes,,Often,,,,,,,Most of the time,Most of the time,,Most of the time,,100% of projects,Entirely internal,Other,,"Sourcing and cleaning, also organization collaboration & single view of data. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,20000,AED,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +A different identity,Netherlands,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,40,0,20,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",A master's degree,Academic,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,Decision Trees,"Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Python,R,RapidMiner (free 
version),SQL,Tableau",,Sometimes,,,,,,,,,,,Rarely,,,,Sometimes,,Rarely,,,,,,,,,,,,Sometimes,,Rarely,,Sometimes,,,,,,,Often,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,Rarely,Most of the time,Often,,Rarely,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,Rarely,Sometimes,,,,,10,10,0,60,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Often,,,100% of projects,Entirely internal,IT Department,Various Kaggle datasets,Visualization,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,"Git,Mercurial",Sometimes,50000,EUR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Perfectly,Self-employed,IBM SPSS Statistics,Text Mining,R,Google Search,"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,50,0,50,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,R",,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,Often,Often,Often,,Most of the time,,Most of the time,Often,,,,,Often,Often,Most of the time,,Often,,Most of the time,Often,Often,Often,,,Most of the time,Often,,,Most of the time,,,,50,5,35,0,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization",,Sometimes,,,Most of the time,Often,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"100,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Text Mining,Python,Google Search,"Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,,University courses,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,South Africa,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Personal Projects,Stack Overflow Q&A,Other",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,Somewhat useful,,,,,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Engineer",Self-taught,25,25,25,25,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,Often,,,Most of the time,,,,,,,,,Sometimes,,,,Most of the time,Most of the time,Often,,,,,,,,Sometimes,,,,15,25,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Sometimes,,,,,,,,,,,,Sometimes,Often,,100% of projects,Entirely internal,Standalone Team,,Aligning and merging separate data sources,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,570000,ZAR,Other,9,,,,,,,,,,,,,,,,,, +Male,Greece,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,TensorFlow,Genetic & Evolutionary Algorithms,Python,Government website,"Arxiv,College/University,Conferences,Online courses",Very useful,,Very useful,,Very useful,,,,,,Very useful,,,,,,,,"FastML Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Workstation + Cloud service,2 - 10 hours,PhD,Yes,Master's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Ireland,50,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",90,0,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important +Male,Canada,55,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by government,Python,Time Series Analysis,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects",,Very useful,,,,,,,,Very useful,Somewhat useful,Very useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician",University courses,75,10,10,5,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Government,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A tech-specific job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Bayesian Techniques,Neural Networks","C/C++,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,NoSQL,R,SQL",,,,Most of the time,,,,,,,,,,,Often,,,,,Sometimes,Often,,Most of the time,Often,Often,,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Often,,Sometimes,,Often,,,,,,Often,Often,,,Most of the time,,,,40,40,0,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,Cannot disclose ,Cannot disclose,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Spain,55,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other,Other",,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher,Other",Work,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Mix of fields,Fewer than 10 employees,Decreased slightly,6-10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Often,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary 
Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,,Often,Sometimes,,Often,Often,Often,Sometimes,Often,,Often,,Sometimes,,Often,,,,Sometimes,Sometimes,Sometimes,Often,Often,,Often,Sometimes,Sometimes,,Often,,,,40,30,10,5,5,10,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,Most of the time,,Sometimes,,,,,Most of the time,,,,,,,100% of projects,Do not know,Other,Data owned by retailers,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,"50,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,24,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Julia,Jupyter notebooks,NoSQL,Orange,Python,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,Sometimes,,Rarely,Most of the time,,,,,,,,,,Sometimes,,Sometimes,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,,,Most of the time,Most of the time,,,Sometimes,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Most of the time,,,,,,,Often,,,Often,,Often,Often,,,,,50,30,10,5,5,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to 
others,Lack of data science talent in the organization",,,,,Often,Most of the time,,,Sometimes,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Bitbucket,,"40,000",,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,Google Cloud Compute,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,University courses,25,25,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Decision Trees,Evolutionary Approaches,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter 
notebooks,Perl,Python,R,Spark / MLlib,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,Often,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,Sometimes,,,Most of the time,Most of the time,Often,,Often,,,Rarely,Rarely,,,,,,Sometimes,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Sometimes,,,,,10,40,20,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,NA,NA,Other,Company Developed Platform,,"Git,Subversion",Rarely,1400000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Belgium,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Factor Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Data Elixir Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,"Business Analyst,Data Analyst,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,Time Series,Other (please specify; separate by semi-colon),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important +Male,France,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Other,University courses,10,20,60,5,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,Rarely,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,,Sometimes,,,,,,Most of the time,Often,,,Often,Rarely,Sometimes,,Often,,,Often,,,Sometimes,Rarely,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Often,,,,Often,,,,,,,,,,,,Most of the time,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,,EUR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Other,48,Employed full-time,,,Yes,,Programmer,Poorly,Employed by government,Other,,Other,,"Other,Other",,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A professional degree,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Other,Traditional Workstation,Text data,Never,100MB,,"Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Rarely,,,,,None,,,,,,,,,,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Company internal community,Conferences,Friends network,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,"O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,10,30,40,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Often,,,Often,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series 
Analysis",Often,,,Sometimes,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Sometimes,,Often,,Sometimes,Most of the time,Most of the time,Often,,Often,,Most of the time,Often,,Sometimes,Most of the time,Often,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,Often,,,,,,Often,,Sometimes,Sometimes,,Sometimes,Often,,Rarely,,,100% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,250000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer,Other",University courses,10,30,0,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Relational data",Rarely,10GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,Often,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Often,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,Rarely,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Sometimes,Sometimes,,,Often,,,,,,,,Often,,,Most of the time,Sometimes,,100% of projects,Entirely internal,Standalone Team,,Privacy concerns,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,70000,ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Statistician",Self-taught,60,20,0,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft SQL Server Data Mining,Python,QlikView,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,Rarely,Most of the time,,,,,Often,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,Most of the time,,,,,,Often,,Most of the time,,,,Sometimes,,,Sometimes,,,,,,Often,Most of the time,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,none,NA,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,1300000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Spain,23,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,I haven't started working yet,Self-taught,30,70,0,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,South Africa,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Bayesian Methods,R,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,A health science,3 to 5 years,Researcher,Self-taught,70,30,0,0,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"5,000 to 9,999 employees",,Don't know,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),"Text data,Relational data",,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Rarely,Most of the time,Rarely,,,,,,,,Often,,,,,Sometimes,,Rarely,,,,Often,,Sometimes,Most of the time,,,,50,20,0,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability 
of/difficult access to data",,Most of the time,,,Often,Often,,,Often,,,,,Often,,,,,Sometimes,Often,Sometimes,,100% of projects,Do not know,,NHANES; BRFSS; NHS GP Practice Prescribing Presentation ,Getting to know the data. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,625000,ZAR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,Often,,Most of the time,,,,,,Often,,Sometimes,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted 
Machines,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,,,,,Often,Often,Often,,,Often,,Often,,Often,,,,,,,Often,,,Often,Often,,,Often,,,,30,10,NA,0,60,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,,,,Often,Sometimes,,,Often,,,,,Sometimes,Often,,,,,,,,10-25% of projects,More external than internal,Business Department,,Talent,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,20,10,0,40,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Russia,30,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Company internal community,Conferences,Friends network,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,3 to 5 years,"Engineer,Programmer,Researcher",Self-taught,30,10,30,30,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Other,100 to 499 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","Cloudera,Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,SQL,Other,Other",,,,,Rarely,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Segmentation",Most of the time,Sometimes,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,10,0,20,0,0,70,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,850000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,60,20,0,20,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,1GB,"Markov Logic Networks,Regression/Logistic Regression",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,"Cross-Validation,Markov Logic Networks,Natural Language Processing,Segmentation",,,,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,60,30,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",Most of the time,Often,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,35000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Speech Recognition,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Philippines,23,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Data Analyst,Data Scientist,Researcher,Statistician",Work,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,United Kingdom,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important +Male,Finland,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Kaggle competitions,0,0,40,40,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Rarely,,,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow",Often,Rarely,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,Often,,,,,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Often,,Often,Often,Often,,,,Often,,Often,,Often,,,,Often,Often,,Often,,Often,,,,,Often,,,,60,20,10,10,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,600,EUR,Other,7,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Web services,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist",University courses,20,5,60,15,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Jupyter notebooks,KNIME (free version),Orange,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,Rarely,,Most of the time,,Sometimes,,Rarely,,,,,,,Sometimes,,,Rarely,Sometimes,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Time Series Analysis",,Often,Rarely,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Sometimes,,Rarely,,Sometimes,Most of the time,Sometimes,Most of the time,Sometimes,Sometimes,Sometimes,,Most 
of the time,,Sometimes,,,,45,15,5,30,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,,,,,,Often,,,,,,Sometimes,,,Most of the time,Sometimes,,100% of projects,More internal than external,Standalone Team,Demographic data,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Rarely,60000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Regression,Julia,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Company internal community,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,Very useful,Very useful,,,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,80,0,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Other (please specify; separate by semi-colon),A doctoral degree,Academic,Fewer than 10 employees,Decreased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Other,,1GB,"Regression/Logistic Regression,Other","C/C++,Java,Julia,Jupyter notebooks,Python,R,RapidMiner (free version)",,,,Often,,,,,,,,,,,Rarely,Most of the time,Most of the time,,,,,,,,,,,,,,Sometimes,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction",,,,,,,Often,,,,,,,Most of the time,,,,,Sometimes,,Often,,,,,,,,,,,,,20,40,0,10,30,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Explaining data science to others,Unavailability of/difficult access to data",,Often,,,,Often,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,Other,MovieLens; UCI ML Repository; 20News dataset; FIMI Repository,Finding the correct model assumption.,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"47,000",EUR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Denmark,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),Other","College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,Researcher,Self-taught,50,30,10,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Other (please specify; separate by semi-colon),A master's degree,Pharmaceutical,500 to 999 employees,Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Other",Relational data,Don't know,1GB,"SVMs,Other","MATLAB/Octave,R,SAS JMP,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,Rarely,,,,,,,Often,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,60,5,5,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Most of the time,,,,Sometimes,,,,,,,,Often,,,Most of the time,Most of the time,,51-75% of projects,More internal than external,Standalone Team,open machine learning depositories ,"the data was not designed to be used for machine learning projects, it was sufficient to answer a particular question, but not more. This kind of data is in the majority. ","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Spain,NA,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,GitHub,"Friends network,Kaggle,Stack Overflow Q&A",,,,,,Very useful,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Stayed the same,More than 10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,Often,,Often,,,Often,,Sometimes,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,,60000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed part-time,,,No,Yes,Engineer,Fine,Employed by college or university,Google Cloud Compute,Text Mining,Python,Google Search,"Arxiv,Conferences,Kaggle,Newsletters,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,,Self-taught,90,0,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Netherlands,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,R,Deep learning,Python,"GitHub,Google Search","Arxiv,College/University,Conferences,YouTube Videos",Somewhat useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,30,40,10,20,0,0,"Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble 
Methods,Gradient Boosting,Logistic Regression",,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,100MB,,"C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Evolutionary Approaches,Logistic Regression,Simulation",,,,,Sometimes,Often,,,,Most of the time,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,10,50,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,,Sometimes,,,,,Most of the time,,,51-75% of projects,More internal than external,IT Department,,,Graph (e.g. GraphBase/Neo4j),"Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,25000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,New Zealand,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,C/C++,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Not Useful,Very useful,Somewhat useful,,Not Useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",45,10,0,45,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,20 to 99 employees,Increased slightly,Less than one year,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Don't know,1GB,CNNs,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Rarely,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,kNN and Other Clustering,PCA and Dimensionality Reduction",,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,20,50,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Most of the time,,Sometimes,Most of the 
time,,10-25% of projects,More external than internal,Standalone Team,Open Govt data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,45000,NZD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by government,Hadoop/Hive/Pig,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,,,,,,,,,,,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,65,0,0,15,0,Computer Vision,Ensemble Methods,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,,Somewhat important +Male,Germany,27,Employed full-time,,,Yes,,Data Scientist,,Employed by professional services/consulting firm,DataRobot,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,Not Useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,,University courses,10,10,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Financial,20 to 99 employees,Increased significantly,1-2 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,,"Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Unix shell / awk",,Rarely,,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,,,Sometimes,,,,,Often,,,,,,,Often,,,,70,5,10,12,3,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Sometimes,,,,,,,Often,Often,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Subversion",Rarely,50000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,South Africa,47,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Decision Trees,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,25,25,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",High school,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,RapidMiner (free version),SAS Base,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,Often,,,,,Rarely,,Often,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Time Series Analysis",Often,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,70,10,0,20,0,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,Often,,,,,,,Often,,,,Most of the time,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Never,650000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,GitHub,"Arxiv,Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 
years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,Spark / MLlib",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,"Cross-Validation,Naive Bayes,Natural Language Processing,Text Analytics",,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,,,,20,10,40,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,Often,,,,,,,,Most of the time,,,Often,,Often,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,9999999,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,TensorFlow,Bayesian Methods,Matlab,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,,University courses,10,10,0,70,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Government,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Other",Sometimes,10GB,"Regression/Logistic Regression,SVMs,Other","C/C++,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,SVMs,Other",,,,Sometimes,,Most of the time,Most of the time,,,,,Often,,Sometimes,,Often,,,,,,,,,,,,Sometimes,,,Most of the time,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Need to coordinate with IT",,,,Often,,,,,,,,,,,Sometimes,,,,,,,,100% of projects,Do not know,,,,,,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,20,0,60,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,1GB,"Decision Trees,Random Forests,SVMs","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,SVMs",,,,,,,,Often,Often,,,,,,,,,,,,Often,,,,,,,Often,,,,,,10,30,0,30,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Mathematica,Neural Nets,Matlab,I collect my own data (e.g. 
web-scraping),"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,FlowingData Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,30,0,20,0,0,"Speech Recognition,Unsupervised Learning",Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",15,65,20,0,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Female,India,27,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Researcher,Work,20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,France,29,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,30,0,0,40,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Not important +Male,Spain,48,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Spark / MLlib,Random Forests,R,University/Non-profit research group websites,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,Self-taught,60,20,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,<1MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Julia,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,Rarely,,,Sometimes,,,,,,,"Association Rules,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,Sometimes,,,,,Most of the 
time,Often,,Sometimes,,,,,,Sometimes,,,,,Often,,,,,Often,,,,,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,Rarely,,,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,60000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Germany,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,Somewhat useful,Not Useful,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Data Scientist,Programmer,Other",Self-taught,30,25,25,20,0,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,Java,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,Rarely,,,,Sometimes,,Most of the time,Sometimes,Rarely,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,Rarely,Rarely,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",Rarely,,,,,Often,Often,Often,,,,Sometimes,,,,Often,,,,Sometimes,,,Sometimes,,,,,Often,Rarely,Sometimes,,,,40,10,0,20,30,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Sometimes,Rarely,,,,,,,,,Sometimes,Often,,,51-75% of projects,More internal than external,Standalone Team,would like historic twitter data,loading time on local machine,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Italy,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Jupyter notebooks,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,Reinforcement learning,,A bachelor's degree,Technology,,,,,Somewhat important,,,"Text data,Relational data",Rarely,,,"IBM Cognos,Java,Jupyter notebooks,Python,QlikView,SQL,Tableau",,,,,,,,,,Often,,,,,Often,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,Often,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,More internal than external,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,38,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Official documentation,YouTube Videos",Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Engineer,Self-taught,50,0,10,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,R,SQL,Stan,Unix shell / awk",,,,Rarely,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,Often,Often,,,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,SVMs,Text Analytics",Rarely,Sometimes,Often,,Often,Often,,Sometimes,,,,,,,,Often,,Sometimes,Most of the time,,Often,,,Often,,Sometimes,,Sometimes,Most of the time,,,,,50,20,10,20,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,Often,,,,Often,,,Often,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",,6000000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses",Very useful,Somewhat useful,,,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",5-10 years,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Belgium,40,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that performs advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,Very useful,Not Useful,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,10,40,25,20,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Cloudera,Jupyter notebooks,KNIME (free version),Perl,Python,R,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,,,,Rarely,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,,,,,,Often,Sometimes,,Often,Often,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,Sometimes,,Often,,,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,,Most of the time,Sometimes,,Often,Sometimes,,Often,,,,Often,,Most of the time,,,Most of the time,,,26-50% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,59000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",Primary/elementary school,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Neural Networks","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Most of the time,,Often,,,,Sometimes,,,,,,,,,,Often,,Rarely,,,,,,,,Sometimes,Most of the time,,,,Often,,Most of the time,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Neural Networks",,,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,Sometimes,,,,,,,,,,,,,,70,20,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,Most of the time,Often,,,Most of the time,,Most of the time,,,,,,,,,,,,None,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Sometimes,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,,Self-taught,40,3,50,5,2,0,"Computer Vision,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Other",,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs",,,,Most of the time,,Most of the time,,Rarely,,,,,,Sometimes,,Sometimes,,Rarely,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,,,,,,5,90,0,2,3,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"360,000",BDT,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",5,40,20,5,15,15,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important +Male,India,26,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Deep learning,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,Other","College/University,Conferences,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,Very useful,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,Data Analyst,Self-taught,80,0,20,0,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,10 to 19 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Always,100MB,"Bayesian Techniques,CNNs,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","Java,MATLAB/Octave,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Recommender Systems,Segmentation,SVMs,Text Analytics",,Most of the time,Most of the time,,Most of the time,Sometimes,Sometimes,Most of the time,,,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Most of the time,Most of the time,Most of the 
time,,Often,,Most of the time,,Most of the time,,Most of the time,Most of the time,,,,,60,20,10,2,8,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,Most of the time,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,700000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,South Korea,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,R,Deep learning,C/C++/C#,Google Search,"Arxiv,Conferences,Newsletters,Online courses,Textbook",Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",Self-taught,70,10,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Other (please specify; separate by semi-colon)",High school,Manufacturing,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Other",Most of the time,10GB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Python,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,Often,,,,,,"Association Rules,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,RNNs,Simulation,Time Series Analysis",,Sometimes,,Sometimes,,Often,,,,,,,,Sometimes,,Sometimes,,,,Often,,Sometimes,,,Sometimes,,Often,,,Often,,,,50,20,10,0,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,Often,,,Often,,Often,,,,,Often,Often,,,Often,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Sometimes,75000000,KRW,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,Indonesia,25,Employed full-time,,,No,Yes,Other,Poorly,"Employed by college or university,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Management information systems,Less than a year,I haven't started working yet,Self-taught,30,20,0,15,5,30,,,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Germany,55,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,,,R,,"Official documentation,Online courses,Trade book,Tutoring/mentoring,YouTube Videos,Other",,,,,,,,,,Very useful,Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,Less than a year,"Researcher,Other",Other,10,5,10,0,0,75,Other (please specify; separate by semi-colon),,A doctoral degree,Academic,100 to 499 employees,Stayed the same,Don't know,Some other way,Not very important,,"Basic laptop (Macbook),Traditional Workstation",Text data,,100MB,"Markov Logic Networks,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural 
Networks",,,,,,Sometimes,Most of the time,,,,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,,,,75,10,10,5,0,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Unavailability of/difficult access to data,Other",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,Most of the time,Most of the time,100% of projects,Approximately half internal and half external,Standalone Team,"Scopus, Web of Science, science articles.",Getting it and cleaning it up. ,Other,"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,35000,EUR,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Support Vector Machines (SVM),Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,MATLAB/Octave,Python,R,SQL,Other",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Prescriptive Modeling,Simulation,Time Series Analysis",,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,Often,,,,,Most of the time,,,Most of the time,,,,0,85,1,4,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations of tools",,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,We are a data vendor.,Finding relevant data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Mercurial,Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed part-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,,Master's degree,"Information technology, networking, or system administration",Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,Time Series,Decision Trees - Gradient Boosted Machines,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Nigeria,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,Oracle Data Mining/ Oracle R Enterprise,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,Very useful,,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,"FlowingData Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",35,30,20,15,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",High school,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,Often,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Often,Most of the time,,,,Often,Sometimes,,Most of the time,,,,,,,Most of the time,,,,,Sometimes,Most of the time,Most of the time,,,,40,5,20,15,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,,,Often,,,,,,,,Most of the time,,,,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,NBS;world bank;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Most of the time,,,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,Portugal,NA,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,Java,Deep learning,Python,GitHub,"Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Other",,,,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,Programmer,Self-taught,60,20,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,,"Text data,Relational data",,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,45,0,15,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Most of the time,Sometimes,,,,,Often,,Often,,,Most of the time,,,,,,,100% of projects,Entirely external,Other,none,Getting what is really needed for the project out of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Always,21000,EUR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,Java,,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",3-5 years,,Necessary,Necessary,,Necessary,Necessary,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,Yes,Master's degree,Mathematics or statistics,,"Computer Scientist,Predictive Modeler,Statistician",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,,,Very Important,,,,Very Important,,,,Very Important,,, +Male,Other,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Other,Python,GitHub,"College/University,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,Very useful,,Very useful,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,Necessary,,,,,,Basic 
laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,5,10,0,5,"Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,,Somewhat important,,Somewhat important +Male,India,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,Nice to have,,,Necessary,Necessary,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,Engineer,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,Decision Trees - Gradient Boosted Machines,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Not important,Very Important +Female,Germany,26,Employed part-time,,,Yes,,Statistician,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Random Forests,R,GitHub,"Blogs,College/University,Friends network,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Not Useful,,,Not Useful,,,,,,,,Not Useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,0,0,40,50,10,0,Survival Analysis,Logistic Regression,A master's degree,Academic,100 to 499 employees,,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,<1MB,Regression/Logistic Regression,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,65,30,NA,5,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Rarely,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,15600,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Monte Carlo Methods,R,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,25,10,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Rarely,100GB,"Bayesian Techniques,Regression/Logistic Regression","Java,Jupyter notebooks,Perl,Python,R,Stan,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,Rarely,Sometimes,,Most of the time,,,,,,,,,,Rarely,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction",,,Often,,,Sometimes,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,75,10,0,10,5,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,Other",,,,,Often,Often,,,,,,,,,Sometimes,,Sometimes,,,,,Often,76-99% of projects,More external than internal,Other,NCBI::GEO; TCGA,Storage,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Other",mailing harddrives; wget / ftp,"Bitbucket,Git",Rarely,39000,GBP,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Taiwan,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,500 to 999 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,10MB,"CNNs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Often,Most of the time,,,,,Sometimes,,,,Often,,,,,,,Most of the time,,,,,,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Other","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,55,0,5,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important +Male,Germany,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,30,10,40,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,India,24,"Not employed, but looking for work",,,,,,,,IBM Cognos,Genetic & Evolutionary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,0,0,40,Recommendation Engines,"Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Israel,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,Python,Other,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A master's degree,Other,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business 
decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,,,,,,Rarely,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,Often,,,Rarely,,,,50,10,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,Sometimes,Rarely,Often,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,Sometimes,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,117600,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,South Korea,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Hidden Markov Models HMMs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,Very useful,,,Very useful,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,,"Data Analyst,Predictive Modeler,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Ireland,47,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Uplift Modeling,Python,University/Non-profit research group websites,"Kaggle,Official documentation,Online courses,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,DataCamp,"Basic laptop (Macbook),Other",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Other,Self-taught,60,0,20,0,20,0,Other (please specify; separate by 
semi-colon),"Bayesian Techniques,Markov Logic Networks",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,,SQL,Government website,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,Data Analyst,Work,100,0,0,0,0,0,,,High school,Telecommunications,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,10TB,,Tableau,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,10,50,10,20,10,0,Enough to run the code / standard library,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,,Sometimes,64000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Portugal,31,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Amazon Machine Learning,Random Forests,Python,Google Search,"Arxiv,Blogs,College/University,Conferences,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,15,5,50,30,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Image data,Rarely,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,SVMs","C/C++,IBM SPSS Statistics,Java,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,Rarely,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality 
Reduction,SVMs",,,,Most of the time,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,Sometimes,,,,,,Most of the time,Often,,,,,,,Often,,,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,Most of the time,,,,,,,Most of the time,Often,Sometimes,,Often,,51-75% of projects,More external than internal,IT Department,feret;celebfaces,Not enough,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Most of the time,11760,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist",University courses,40,10,20,10,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Other,500 to 999 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,Gradient Boosted Machines,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation,Time Series Analysis",Most of the time,,Sometimes,,,Often,Most of the time,Often,Often,,,Often,,Often,,Sometimes,,,,Sometimes,,,Sometimes,,,Often,,,,Sometimes,,,,35,20,5,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Sometimes,,,,,,,Rarely,,,10-25% of projects,Do not know,Standalone 
Team,android device list and specs,big amount,Other,Company Developed Platform,,Other,Sometimes,520000,RUB,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Israel,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Uplift Modeling,Java,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,60,10,0,0,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A master's degree,Financial,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines","Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,Prescriptive Modeling,Text Analytics",,,,,,Most of the time,,Sometimes,,,,Most of the time,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a 
clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,Sometimes,,,,,,,Sometimes,,,,,Often,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",12,75,0,7,6,0,,Decision Trees - Random Forests,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very 
Important,Somewhat important,Very Important,Not important +Male,Iran,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",Oracle Data Mining/ Oracle R Enterprise,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,10,10,50,30,0,Reinforcement learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,Decision Trees,"C/C++,Hadoop/Hive/Pig,Java,Mathematica,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,Sometimes,,,,,Rarely,,,,,,Most of the time,,,,,Most of the time,,,,Sometimes,Often,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Markov Logic Networks,Naive Bayes",Sometimes,,,,,Sometimes,,Often,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,10,30,40,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Sometimes,,,Rarely,Sometimes,,,,,,,Often,,,,Rarely,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Subversion,Rarely,400000000,IRR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,69,Retired,,,Yes,,Statistician,Fine,Employed by government,Stan,Monte Carlo Methods,R,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Other,Self-taught,10,0,0,90,0,0,Time Series,Bayesian Techniques,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Decision Trees,Python,University/Non-profit research group websites,"Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Data Miner,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,0,25,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing 
page,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,Very useful,Very useful,Very useful,,,,,"Data Elixir Newsletter,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Other,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Other","Text data,Relational data",Most of the time,10GB,"Decision Trees,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,Often,,Often,,Often,Sometimes,,Rarely,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Often,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Recommender 
Systems,Simulation,SVMs,Text Analytics",Often,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,,,,,Most of the time,,,Often,Most of the time,Most of the time,,,,,50,15,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data,Other",Often,,Often,,Often,Often,,,Often,,,,Often,Often,Often,,,,Often,,Often,Most of the time,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",Java,Deep learning,Python,Other,"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,10,10,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis",,A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not very important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Other,Never,10MB,,"Amazon Web services,Google Cloud Compute,Python",,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Rarely,0,0,0,100,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,Less than 10% of projects,Do not know,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Git,Rarely,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Malaysia,24,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,26,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,Perl,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,0,100,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100GB,Regression/Logistic Regression,"Perl,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,10,20,10,30,0,Enough to run the code / standard library,"Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Sometimes,,,,,,,Often,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,102000,MYR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Personal Projects,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Miner,Engineer,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Don't know,1TB,"Decision Trees,Neural Networks","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,10,60,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,Often,,,,Most of the time,,51-75% of projects,Entirely internal,Other,genomics data available online,Not enough 
data,Other,Other,cloud based,Bitbucket,Rarely,18000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,Other,29,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by government,IBM SPSS Modeler,Text Mining,Stata,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Personal Projects",,,Very useful,,Very useful,,,,,,,Very useful,,,,,,,,3-5 years,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,3 to 5 years,Business Analyst,University courses,10,10,25,45,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Online courses,YouTube Videos",,Somewhat useful,,,Not Useful,,,,,,Very useful,,,,,,,Somewhat useful,"Partially Derivative Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,6 to 10 years,"Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100TB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,Perl,Python,R,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,Rarely,Rarely,Rarely,Sometimes,,,,,,,,,,,,,Often,Often,,Sometimes,,,,,,,,Often,,,,Often,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,,,,Sometimes,Most of the time,,,,,,,,,Often,,Rarely,Rarely,,,,,,,Sometimes,,,Sometimes,Most of the time,,,,60,10,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Often,,,,Most of the time,Sometimes,,Sometimes,Rarely,,,,,,,,,,Most of the time,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,161000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,,,,,,,,,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,Data Analyst,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data,Relational data",Rarely,1GB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Orange,Python,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural 
Networks,Segmentation,Text Analytics",,,,Often,,Most of the time,Often,Often,,,,,,Often,,,,Sometimes,,Often,,,,,,Sometimes,,,Sometimes,,,,,20,20,0,20,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Often,,,,,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,Other,Imagenet,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Nigeria,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,,Very useful,,,Very useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,NA,NA,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Neural Networks,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,200000,NGN,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,75,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,25,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Government website,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Time Series,Logistic Regression,High 
school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Female,Czech Republic,61,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,30,25,10,25,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,Sometimes,,,,,,,Often,Often,,,100% of projects,More internal than external,Standalone Team,,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,900000,CZK,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,France,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Neural Nets,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,,0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Brazil,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,NoSQL,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,,Very Important,Somewhat important,,Somewhat important,,,,,,,Somewhat important,Not important, +Male,Germany,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,,1-2 years,,,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Time Series Analysis,R,,"Company internal community,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Very useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",0,60,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,10 to 19 employees,Stayed the same,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,Often,Often,Rarely,,,,,,,,Rarely,,,,,,Most of the time,,,,,Often,,,Often,,,,40,30,10,10,10,0,Enough to 
explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,,,,Sometimes,,,,,,,,,Most of the time,Most of the time,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,60000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,40,20,10,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),SQL,Tableau",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,Often,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees,Prescriptive Modeling,Segmentation,Time Series Analysis",,Often,,,,,Most of the time,Often,,,,,,,,,,,,,,Often,,,,Often,,,,Often,,,,30,20,0,20,20,10,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,Occ and Opra data - public data for industry volume for financial markets.,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,72000,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,France,26,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,,University courses,0,50,0,40,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"HMMs,Other","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Natural Language Processing,Segmentation,Text Analytics",,,,,,Sometimes,Often,,,,,,Often,Often,,,,,Most of the time,,,,,,,Sometimes,,,Most of the time,,,,,70,10,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,,,Often,,,,,,Often,,Often,,,,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,free of access language corpora,different types of text data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Statistician,Other",University courses,40,10,10,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Impala,Jupyter notebooks,Python,R,Spark / 
MLlib,SQL,Other",,,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,Most of the time,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Often,Most of the time,Sometimes,Often,,,Often,,Often,,Often,,,Sometimes,,Often,,Often,,,Often,,Sometimes,Often,Often,,,,40,5,15,15,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,Often,Often,,,,,Often,,Sometimes,Often,Often,,,Sometimes,Sometimes,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,"35,500",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Greece,33,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,R,I collect my own data (e.g. 
web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,,Yes,Master's degree,Engineering (non-computer focused),,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Company internal community,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Not Useful,Not Useful,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,5,30,65,0,0,"Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,Don't know,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Random Forests,Regression/Logistic 
Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,Often,,,,,Often,,Sometimes,,,Most of the time,,Most of the time,,Often,Most of the time,,Sometimes,,,Most of the time,,,,,20,5,10,25,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Sometimes,Most of the time,Rarely,,,,,Rarely,,,Often,,,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,s3,Git,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Stan,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,"Jack's Import AI Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Video data,Relational data",Sometimes,10GB,"HMMs,Markov Logic Networks,Neural Networks,Other","C/C++,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,Sometimes,Rarely,,Often,,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Often,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific 
analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Rarely,,Often,Sometimes,,,Sometimes,,,,,,,,,,Often,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Git,Sometimes,45000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",No education,Financial,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Rarely,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,Often,,,Most of the time,Most of the time,Most of the time,Often,,,,Sometimes,Often,,Most of the time,,Often,Often,,,,Most of the time,,,Most of the time,Sometimes,Often,Most of the time,Often,,,,40,10,0,30,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,Sometimes,Often,,,,,,,,,,Often,,,Most of the time,,Often,,,,26-50% of projects,Approximately half internal and half external,Business Department,Nilll,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,900000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Newsletters,Personal Projects",Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",Self-taught,10,20,15,50,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Python,Spark / MLlib",,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Logistic Regression,Recommender Systems",Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,40,5,50,1,4,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Often,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,,,,,,,,,,,100% of projects,Approximately half 
internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Sometimes,"85,000",GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,,,Somewhat useful,Very useful,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),40+,PhD,Yes,Master's degree,Other,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important +Male,Australia,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,,Python,,"Blogs,Conferences,Friends network,Kaggle,Textbook",,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,Very useful,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, 
networking, or system administration",I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft Azure Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos,Other",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,Sometimes,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,Sometimes,Sometimes,Often,Often,,Often,,,,,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,Often,Often,Often,,Often,,,,20,30,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,Sometimes,,Often,,Sometimes,,,Often,,,,Often,Most of the time,,,,,,,Often,,51-75% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",Power BI,"Git,Subversion",Sometimes,"75,000",,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Philippines,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A",,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Traditional Workstation,Text data,Never,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,30,0,0,50,20,0,Enough to run the code / standard library,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,Do not know,Business Department,,,,,,,Never,,,,5,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Other,Fine,Self-employed,Other,Other,Other,Other,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Not Useful,,Very useful,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data 
Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Other",University courses,25,5,45,15,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,10 to 19 employees,Stayed the same,Don't know,Some other way,Important,Other,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,Often,,,,,,,,Often,,,,,,,,,,,Rarely,,,Most of the time,,Most of the time,,,,,,,,,Often,Rarely,,Rarely,Sometimes,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,5,15,5,10,15,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,Sometimes,,,,,,Often,,,Often,,,Most of the time,Sometimes,,100% of projects,Do not know,Other,,getting it outta clients,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Other,0 - 1 hour,Kaggle Competitions,No,Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,Indonesia,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,I prefer not to answer,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects",Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"Data Machina Newsletter,FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,A health science,,"Data Analyst,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,Other,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,,,Very useful,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Very useful,,Not Useful,Not Useful,Not Useful,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,5,85,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - CNNs,A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Other","Image data,Video data",Most of the time,10GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other,Other",,,,Often,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Most of the time,Most of the time,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,,,,,,,Often,,,,,,Most of the time,,,,,,Sometimes,Sometimes,,,Sometimes,,,,10,30,20,20,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,,,,Sometimes,Often,,Often,,,,Sometimes,,,,Often,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Impala,Python,R,SQL",,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Ensemble Methods,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation",Most of the time,,,,,,,,Sometimes,,,,,,Often,Most of the time,,,,,,Often,,,,Often,,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty 
data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Hong Kong,33,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,R,Deep learning,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,80,0,20,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,"5,000 to 9,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Python,R,SAS Base,SAS Enterprise Miner,SQL,Stan,Tableau,Other",,,,,,,,,Most of the time,Rarely,Rarely,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,Often,,,Most of the time,Sometimes,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",Often,Often,Sometimes,,,Most of the time,Most of the time,Most of the time,Sometimes,Rarely,,Sometimes,,Often,,Most of the time,,Sometimes,,Most of the time,Most of the time,,Most of the time,Often,,Most of the time,Sometimes,Rarely,,Sometimes,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools",Sometimes,Often,,,Most of the time,,,,,,,,Often,,,,,,,,,,100% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,85000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important +Female,United Kingdom,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Not Useful,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Sweden,43,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Other,Uplift Modeling,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Internet-based,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Sometimes,,,Sometimes,Most of the time,,,Most of the time,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Most of the time,,,,,,Sometimes,Often,Often,,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,35,40,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,Most of the time,,,26-50% of projects,More internal than external,Central 
Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belgium,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Matlab,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Other,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United Kingdom,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Management information systems,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Brazil,37,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by government,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,GitHub,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Natural Language Processing,Recommendation Engines",Logistic Regression,Primary/elementary school,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Decision Trees,Random Forests,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,Sometimes,Sometimes,,Often,,Sometimes,Sometimes,,,,Sometimes,Often,Often,,,,40,20,5,20,15,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,Malformed data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,50000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Sweden,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other",University courses,5,40,5,25,25,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Other,GPU accelerated Workstation,Image data,,,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,SQL",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,kNN and Other Clustering,Neural Networks",,,,Sometimes,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,2,5,2,1,0,90,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Rarely,,,,,,,,,,,,,Rarely,,Often,,,,,Most of the time,,Less than 10% of projects,Approximately half internal and half external,Other,,The data itself is not released by the customer so all training must be done on site at customer.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Local File Share,"Bitbucket,Git",Rarely,524000,SEK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text 
Analytics",,,,,,Often,Often,Often,Often,,,,,,,Often,,,Often,Sometimes,,,Often,,Sometimes,,,,Often,,,,,10,10,50,0,10,20,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,Often,Sometimes,,Sometimes,,,,,,,Most of the time,,26-50% of projects,Entirely internal,Other,word2vec,inconsistent labels,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,85000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,37,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Female,Kenya,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,70,5,20,0,0,5,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Insurance,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,Rarely,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,,,Often,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,Most of the time,,,,20,30,10,20,20,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,Can't say,Some values cannot be explained. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,10000,USD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer",Self-taught,60,10,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,"Some college/university study, no bachelor's degree",Non-profit,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Azure Machine Learning,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,Sometimes,,,,Most of the time,,,,,,,Sometimes,,Often,,Sometimes,,,,,,Sometimes,,Often,,,,Sometimes,,,,40,15,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support 
for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Often,,,Often,,,,,,,Often,,,,Often,,,76-99% of projects,More internal than external,IT Department,Government Data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,32650,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",Self-taught,55,10,35,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SQL,TensorFlow",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",Sometimes,Often,,,,Often,Most of the time,Often,Sometimes,,,,,,Sometimes,Most of the time,,,,,Most of the time,Most of the time,Often,,,,,Sometimes,,Most of the time,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,Most of the time,Most of the time,,,Most of the time,Sometimes,Often,Often,Sometimes,Most of the time,,,,Sometimes,,Often,Sometimes,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Rarely,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Greece,32,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by government,TensorFlow,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Not Useful,Somewhat useful,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,15,10,10,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important +Male,Other,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Scala,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Singapore,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,30,30,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Orange,Python,Spark / MLlib,SQL",,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,Sometimes,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics",,Often,,,,Often,Most of the time,Often,,Often,,Most of the time,,Often,,Most of the time,,Often,,,Most of the time,Most of 
the time,Often,,,,,Often,Often,,,,,40,15,10,15,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,Sometimes,Often,,,,Sometimes,Sometimes,,,,Rarely,,,,,Most of the time,Sometimes,,10-25% of projects,More external than internal,IT Department,,outlier,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,75000,SGD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,10,20,10,0,"Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Retail,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Often,,,,Sometimes,Sometimes,,,,,,,,Often,Most of the time,,,,30,20,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,Sometimes,Sometimes,,,Often,,,,,,,Most of the time,Sometimes,,,Most of the time,Most of the time,,51-75% of projects,More internal than external,Business Department,"official 
statistic, others",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,200000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Other,Fine,Employed by government,Python,Time Series Analysis,R,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Blogs,College/University,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Psychology,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,High school,Government,I don't know,Increased slightly,Don't know,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Never,1MB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,30,25,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Often,,,,,Often,,26-50% of projects,Approximately half internal and half external,Business Department,UK public health and official statistics data,Publication formats - pdfs or spreadsheets not designed to be machine readable. Data quality. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,52000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Other,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Machine Learning Engineer,Programmer,Researcher",Work,10,20,35,20,15,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,R,SAS Base,SQL,TensorFlow,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,Most of the time,,,,Rarely,,Most of the time,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,Rarely,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,Sometimes,Most of the time,Most of the time,,,,,,,Most of the time,Most 
of the time,,,,Rarely,,Often,,,,40,15,25,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,Sometimes,Often,,Most of the time,Often,Most of the time,,,,Often,,,Sometimes,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,40000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,Software Developer/Software Engineer,Self-taught,70,10,10,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,,"10,000 or more employees",Increased slightly,3-5 years,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Most of the time,10GB,"Decision Trees,Neural Networks","Hadoop/Hive/Pig,Java,Python,Spark / 
MLlib",,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"Association Rules,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,60,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",,,,,,,,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,50,5,0,40,5,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic 
Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,Most of the time,Sometimes,Most of the time,,Most of the time,,,,,,Often,,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,Most of the time,,,Often,,,Often,,,51-75% of projects,More external than internal,IT Department,"UCI Machine Learning Repository; Kaggle Dataset; TensorFlow public models; facebook fastText;",incompleteness due to privacy issues,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,dedicated server,Git,Sometimes,"65,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Stories Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,30,0,0,70,0,0,"Adversarial Learning,Recommendation Engines,Survival Analysis","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Romania,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,C/C++,Social Network Analysis,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,Unsupervised Learning,Neural Networks - CNNs,A master's degree,Internet-based,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,"CNNs,Evolutionary Approaches,Neural Networks","C/C++,NoSQL,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Segmentation,Text Analytics",Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,20,20,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,Most of the time,,Most of the time,,,,,Most of the time,Often,,,Often,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,0,60,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A professional degree,CRM/Marketing,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,"Java,Python",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Random Forests,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,,40,0,20,40,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,,Often,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Scientist,Engineer",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,,,Most of the time,Most of the time,Sometimes,,,,,Often,Often,Sometimes,Often,,Often,Sometimes,Sometimes,Sometimes,Often,Rarely,Sometimes,Often,Most of the time,Most of the time,Sometimes,Sometimes,Most of the time,,,,20,30,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of 
significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Most of the time,Most of the time,,,Most of the time,,Often,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Microsoft Excel Data Mining,Social Network Analysis,R,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,0,20,20,0,0,"Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,High school,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data",,,,"Java,Perl,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,20,0,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,Mysql,Gathering data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,,15000,USD,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Monte Carlo Methods,R,"GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,"Data Stories Podcast,DataTau News Aggregator,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,40,10,40,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service",Other,Most of the time,100GB,"Decision Trees,Regression/Logistic Regression,Other","Amazon Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,,,Often,Most of the time,,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,,Often,,,,,,,Often,,Often,,Most of the time,,,100% of projects,More internal than 
external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Sometimes,36000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Israel,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Bayesian Methods,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Other,University courses,50,25,0,15,5,5,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",,Academic,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Machine Learning,Python",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Random Forests",,,Often,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,0,5,90,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Other",,,,,Often,,,,,,,,Most of the time,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,75000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,71,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,Amazon Web services,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Neural Networks - CNNs",A doctoral degree,Mix of fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks","Angoss,Microsoft SQL Server Data Mining,Python,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Enterprise Miner,SQL",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,Sometimes,,,Often,,,Most of the time,,,,,,,,,,"Association Rules,Decision Trees,Naive Bayes,Neural Networks,Time Series Analysis",,Often,,,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,,Rarely,,,,40,10,30,20,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,Sometimes,,,Often,,,Sometimes,,,,,,,,,,,,,Sometimes,,10-25% of 
projects,More internal than external,IT Department,,Inconsistent data gathering methods ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,0,100,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,1GB,"Bayesian Techniques,CNNs,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Python,Spark / MLlib,TensorFlow",Most of the time,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,,,Sometimes,Most of the time,Often,,,,,,Often,,Most of the time,,,Sometimes,Rarely,,,,,,,,,Rarely,,,,,25,0,0,40,35,0,Enough to run the code / standard library,"Company politics / Lack of 
management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,100% of projects,More external than internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Email,Share Drive/SharePoint",,Git,Never,360000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed part-time,,,Yes,,Statistician,Perfectly,Employed by non-profit or NGO,I don't plan on learning a new tool/technology,Rule Induction,R,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Machine Learning Engineer,Statistician",University courses,20,5,20,50,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,10 to 19 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10MB,Regression/Logistic Regression,"Amazon Web services,Jupyter 
notebooks,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"CNNs,Decision Trees,Logistic Regression,Other",,,,Rarely,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others",,,,,Often,Sometimes,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,Missing values,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,60000,CHF,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Conferences,Non-Kaggle online communities,Online courses,Personal Projects",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,15,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",Rarely,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Ensemble Methods,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text Analytics",,Sometimes,Sometimes,,,,Often,,Sometimes,,,,Sometimes,,,Often,Sometimes,Sometimes,Often,Sometimes,,,Sometimes,,Sometimes,,,,Most of the time,,,,,10,40,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,1600000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,Time Series,Bayesian Techniques,A professional degree,Financial,"10,000 or more employees",Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,100MB,"Bayesian Techniques,Decision Trees,Random Forests","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,Most of the time,,Sometimes,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Prescriptive Modeling,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,25,45,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,Often,,Often,,,,Often,,,,76-99% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,840000,RUB,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,52,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Other,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,5-10 years,,,,,,,,,,,,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,50,0,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Friends network,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,Not Useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Software Developer/Software Engineer,Other",University courses,25,25,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,Other","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,Rarely,Sometimes,,Sometimes,Often,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Often,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the 
time,Often,,,,,,Sometimes,,Most of the time,,Often,Most of the time,Often,Sometimes,,,,Often,Often,Sometimes,Sometimes,Most of the time,Most of the time,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,,,,,Often,,,,Often,Often,Most of the time,,Often,Most of the time,,,Often,,76-99% of projects,Entirely internal,Business Department,"Australian Bureau of Statistics, Social Media datasets, web scraped","integration, storage","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,130000,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Poland,19,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",1-2 years,,,,,,,,,,,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,I did not complete any formal education past high school,,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",35,60,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,85,0,0,5,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United Kingdom,35,Employed 
full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher",Self-taught,25,10,65,0,0,0,"Computer Vision,Speech Recognition,Time Series","Ensemble Methods,Markov Logic Networks",A bachelor's degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests","Amazon Web services,C/C++,DataRobot,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,Sometimes,,Rarely,,,Most of the time,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,Rarely,,,,,Most of the time,Most of the time,,,Sometimes,,,Often,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Random Forests,Recommender Systems,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,Often,,,,,,,,,Often,Sometimes,,,,,,Often,,,,30,15,20,30,5,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,,,,Sometimes,Sometimes,,,Often,Often,,76-99% of projects,Entirely 
internal,Other,data.gov.uk,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +,India,34,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Master's degree,Computer Science,,Other,Other,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Blogs,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Professional 
degree,,3 to 5 years,Other,Self-taught,70,10,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,Often,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,Most of the time,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,,,,Sometimes,,Often,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,Often,,Often,,,,Often,,76-99% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,2100000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Julia,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Friends network,Personal Projects,Stack Overflow Q&A",,Very useful,,,,Very useful,,,,,,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Rarely,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,Sometimes,Rarely,Sometimes,,Often,Sometimes,,Often,,,Sometimes,Most 
of the time,,,,20,20,30,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,More internal than external,Business Department,data.gov; yahoo finance; google finance,availability of algorithmic download,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Always,1400,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Israel,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",DataRobot,Deep learning,C/C++/C#,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,Business Analyst,Kaggle competitions,40,20,40,0,0,0,Speech Recognition,Bayesian Techniques,A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,10MB,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,5,5,20,0,Enough to run the code / standard library,Dirty 
data,,,,,Sometimes,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Mercurial,Sometimes,60000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,18,Employed part-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,R,Monte Carlo Methods,Scala,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Data Scientist,Self-taught,90,0,0,0,10,0,"Adversarial Learning,Machine Translation,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A professional degree,Retail,100 to 499 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,NoSQL,QlikView,R,Spark / MLlib,Statistica (Quest/Dell-formerly Statsoft)",Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Rarely,Rarely,Rarely,,,,,Rarely,Sometimes,Often,,,Most of the time,Often,,Sometimes,Rarely,Sometimes,,,Often,,Most of the 
time,Often,,,,Often,Sometimes,Sometimes,,,,40,20,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Sometimes,,,Sometimes,,Often,,,Often,Rarely,Rarely,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Turkey,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,,Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Female,Other,27,Employed part-time,,,Yes,,Statistician,Fine,Employed by government,Python,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,50,10,30,10,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Stayed the same,More than 10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,Sometimes,Sometimes,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,,,Most of the time,,,,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Most of the time,,Most of the time,Sometimes,,Often,Often,,,,,Most of the time,,,Sometimes,,,Most of the time,Sometimes,,76-99% of projects,More internal than external,Central Insights 
Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,,EUR,,7,,,,,,,,,,,,,,,,,, +Male,Germany,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,Self-taught,40,20,0,0,40,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Simulation,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Often,Most of the time,,,,,,,Often,,,Sometimes,Sometimes,Sometimes,,Most of the time,Most of the time,Sometimes,,Often,,,Most of the time,,,,20,30,40,5,5,0,"Enough to code 
it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Most of the time,,,Most of the time,Often,,,Most of the time,Most of the time,Most of the time,Often,Most of the time,,Sometimes,,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,privacy issues,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,60000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"Google Search,University/Non-profit research group websites","Blogs,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,Self-taught,80,0,0,20,0,0,Time Series,Bayesian Techniques,"Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Sometimes,<1MB,Bayesian Techniques,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,PCA and Dimensionality Reduction,Time Series Analysis",,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,20,70,0,10,0,0,Enough to refine and innovate on the algorithm,Inability to integrate findings into organization's decision-making process,,,,,,,,Most of the time,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,148800,INR,I am not currently employed,5,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Other,Python,,"Arxiv,College/University,Company internal community,Conferences,Newsletters,Official documentation,Personal Projects,Textbook",Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,,University courses,45,5,20,20,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Financial,"1,000 to 4,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Rarely,100GB,"CNNs,HMMs,Neural Networks,RNNs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,SAS Base,TensorFlow,Unix shell / awk",,,,Often,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,Rarely,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Neural Networks,RNNs,SVMs,Text Analytics",,,,Most of the time,,Most of the time,Often,,,,,,Sometimes,,,Sometimes,,,,Most of the time,,,,,Most of the 
time,,,Sometimes,Sometimes,,,,,15,30,35,5,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,Often,,,,Often,,Sometimes,Most of the time,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,"Voxforge, TEDLIUM, Librispeech, Youtube",Obtaining clean and comprehensive datasets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,1400000,RUB,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,15,45,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter 
notebooks,Python,QlikView,R,SQL,Stan,Tableau",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,Most of the time,,,,,,,,,Most of the time,Sometimes,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,,,,,,,,65,10,0,10,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Often,,Most of the time,Most of the time,,Most of the time,Sometimes,,,,Most of the time,,Sometimes,Often,,,,Most of the time,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Sometimes,65000,USD,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Friends network,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,,,Very useful,,Not Useful,,,,,,3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,0,5,10,80,5,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Very Important,Not important,Not important,Not important +Male,Brazil,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Regression,Python,GitHub,"Arxiv,College/University,Textbook",Very useful,,Very useful,,,,,,,,,,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data 
Miner,Data Scientist,Researcher",University courses,40,5,10,40,5,0,Unsupervised Learning,"Evolutionary Approaches,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,10GB,"Evolutionary Approaches,Gradient Boosted Machines,Neural Networks","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,Spark / MLlib,Other",,,,Often,,,,,Rarely,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,Most of the time,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics",,,,,Often,Most of the time,Most of the time,,,Often,,Often,,Most of the time,,Most of the time,,Often,,Most of the time,Most of the time,,,Often,,Most of the time,,,Most of the time,,,,,40,20,5,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Privacy issues",,,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Git,Always,140000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,20,40,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Financial,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,Often,,,,,,Rarely,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,Often,,,,Sometimes,Most of the time,Often,Often,Sometimes,,,,Often,,,,Often,,,Sometimes,,Often,,,,,Rarely,,,,,,40,15,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,,,Sometimes,,,,,Often,,51-75% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,Subversion,Most of the time,240000,TTD,,8,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,"Data Machina Newsletter,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher",University courses,30,5,30,30,5,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data",Never,100MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,Rarely,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Natural Language Processing,Neural Networks,RNNs",,,Often,Often,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,Sometimes,,,,,,,,,10,50,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,,,Often,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,36000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,Haskell,University/Non-profit research group websites,"College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,30,30,10,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Markov Logic Networks,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAP BusinessObjects Predictive Analytics,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,Sometimes,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,,"10,000",EUR,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Switzerland,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),40+,Github Portfolio,Sort of (Explain more),Master's degree,,1 to 2 years,I haven't started working yet,University courses,15,10,0,75,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Other,25,Employed full-time,,,Yes,,Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Predictive Modeler,Programmer,Researcher",University courses,10,20,50,19,1,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Other,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Other,Sometimes,100GB,"CNNs,Neural Networks,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Sometimes,,Often,Often,Sometimes,,,,,,Sometimes,,,,,,Often,Sometimes,,Sometimes,,Often,,,,,Most of the time,,,,50,35,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,Most of the time,,,,,,,,Often,,,Sometimes,,,10-25% of projects,Entirely internal,Other,sleepdata.org,Reducing dimensions without loosing valuable information,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,"47,000",ISK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Romania,42,"Not employed, but looking for work",,,,,,,,Other,Social Network Analysis,Python,Google Search,Newsletters,,,,,,,,Very useful,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Other,0 - 1 hour,Other,No,Bachelor's degree,Other,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,University courses,25,50,0,25,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis",Logistic Regression,A bachelor's degree,Insurance,100 to 499 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Lift Analysis,Logistic Regression",,,,,,,,,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,30,10,5,25,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,Often,Most of the time,,,,,,,Most of the time,,,Sometimes,Sometimes,Often,,76-99% of projects,More internal than external,Other,Cenus; national hurricane center,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,Very useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,,,,Often,,Most of the time,,Most of the time,,,,,Most of the time,,,,,Most of the time,,Often,,Most of the time,,,,10,60,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,100% of projects,Entirely external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,University/Non-profit research group websites,"Newsletters,Textbook,Trade book",,,,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,3-5 years,Nice to have,Nice to have,Nice to have,,,Nice to have,Nice to have,,,,,,,,GPU accelerated Workstation,2 - 10 hours,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,0,20,60,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by college or university,Julia,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",Self-taught,70,30,0,0,0,0,Time Series,Logistic Regression,Primary/elementary school,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Other,,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Rarely,,Sometimes,,,,,Sometimes,,,,,,Often,,Sometimes,Most of the time,,,,30,10,0,40,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,10000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,DBA/Database Engineer,Operations Research Practitioner",Self-taught,50,50,0,0,0,0,"Adversarial Learning,Natural Language Processing",Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Germany,50,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Excel Data Mining,Bayesian Methods,SQL,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Trade book,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,Very useful,,Very useful,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,Fine arts or performing arts,,"Engineer,Operations Research Practitioner,Other",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Female,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,"GitHub,Other","Blogs,Newsletters,Online courses,YouTube Videos",,Very useful,,,,,,Somewhat useful,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,30,20,30,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Gradient 
Boosted Machines,kNN and Other Clustering,Random Forests,Recommender Systems",Often,,,,,,Sometimes,,,,,Often,,Often,,,,,,,,,Often,Often,,,,,,,,,,30,35,5,15,15,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,Often,Often,,,Most of the time,,Sometimes,,,Most of the time,,,,,,,,,76-99% of projects,Do not know,IT Department,None,Incomplete data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,100000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Julia,Monte Carlo Methods,Python,University/Non-profit research group websites,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Not Useful,,,Somewhat useful,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Engineer,Self-taught,50,10,10,10,5,15,Recommendation Engines,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Female,India,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Self-employed,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Researcher",University courses,10,45,25,15,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Tableau,TensorFlow",,,,,,,,,Often,,,,,,Sometimes,,Often,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,Rarely,,,,,,,,,Most of the time,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",Rarely,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,Often,Most of the time,,Most of the time,,,,,Often,,,,,,50,28,10,5,7,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,,,,,,Often,Sometimes,,26-50% of projects,Approximately half internal and half external,IT Department,Iris,NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,10000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Other,Python,I collect my own data (e.g. 
web-scraping),"Conferences,Personal Projects,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Natural Language Processing,Support Vector Machines (SVMs),High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data",Never,10GB,"Neural Networks,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,SVMs",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,40,30,30,0,0,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,10-25% of projects,,,,,,Email,,,Most of the time,"500,000",INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Blogs,Company internal community,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",60,20,0,0,0,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,I don't know,Decreased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Never,1MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,Often,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,,,,,Often,,Most of the time,,,,,,,Often,,,Most of the time,,100% of projects,Do not know,Standalone Team,UCI; Kaggle,Working on small and relatively clean datasets in the 
UCI doesn't teach you how to scale practices for huge datasets. Working with open data is a challenge if the right question isn't defined and the right data isn't available.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",University courses,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality 
Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,,,,Rarely,Most of the time,Often,,,,,Often,,Sometimes,,Rarely,,,,Sometimes,Sometimes,,Often,Rarely,Sometimes,Sometimes,,,Sometimes,,,,,45,30,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,,Often,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,Rarely,Often,,10-25% of projects,Entirely internal,Standalone Team,Names;addresses,Access;privacy,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,,"105,000",CHF,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,31,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,,,Very useful,,,,,,3-5 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Physics,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Switzerland,44,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,C/C++/C#,"Google Search,Government website,University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Engineer,Researcher,Other",University courses,35,20,45,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests",High school,Insurance,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,,,"Decision 
Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,,Sometimes,,,,50,15,10,10,10,5,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Often,,,,,Most of the time,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by college or university,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,Not Useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer",Self-taught,60,10,20,0,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs",,,Rarely,,Rarely,Most of the time,Often,Most of the 
time,Often,Often,,Often,,Often,,Often,,Sometimes,,Often,,,Often,Rarely,,,,Often,,,,,,80,10,2,5,3,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,Often,,,Most of the time,,,,,Often,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,IT Department,clints database,Sattelite object detection,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Most of the time,3113,USD,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,90,0,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Chile,33,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",Flume,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Newsletters,Official documentation,Textbook",Very useful,,,,,,,Very useful,,Somewhat useful,,,,,Very useful,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,Other,University courses,70,0,10,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10PB,"CNNs,Ensemble Methods,Evolutionary Approaches,RNNs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,Often,Often,,,,,,Often,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Sometimes,,Often,Often,,,,,,,,,,,,,,,,,Often,,,Sometimes,Often,,,,Often,Often,,,,20,40,30,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science 
team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Most of the time,,,,Most of the time,,,,Rarely,Often,Often,Rarely,Sometimes,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,100000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Cluster Analysis,R,Government website,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",20,70,5,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,R,Cluster Analysis,R,,"Blogs,Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Very useful,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Business Analyst,Data Scientist,Other",Self-taught,15,15,35,35,0,0,Time Series,Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,20,10,0,10,20,40,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Often,,Often,Often,,,Often,Often,,Often,,,Most of the time,Often,,Sometimes,Often,,Often,Sometimes,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Most of the time,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Computer Scientist,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family 
member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Never,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Enterprise Miner,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,Rarely,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Rarely,,Rarely,,Rarely,Most of the time,Often,Most of the time,Sometimes,Sometimes,,Most of the time,,Most of the time,,Sometimes,,,Often,Most of the time,Most of the time,,Most of the time,Often,,,,Often,,Often,,,,50,25,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Often,,Most of the time,Most of the time,,,,Often,Often,Most of the time,,Sometimes,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Non-Kaggle online communities,YouTube Videos,Other",,,,,,,Somewhat useful,,Very useful,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,70,0,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Never,10GB,Neural Networks,"Microsoft Azure Machine Learning,R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,0,50,0,50,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,28000,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,Fewer than 10 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,SVMs","Amazon Web services,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,Rarely,Sometimes,Often,,Most of the time,Most of the time,,,,,,,Often,,Often,,Often,Often,Most of the time,Most of the time,,,,Sometimes,,,Sometimes,Most of the time,,,,,47,20,30,3,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution 
up to full database,Unavailability of/difficult access to data",,,,,Sometimes,,,Most of the time,Sometimes,Sometimes,Often,,,Sometimes,,,,Sometimes,,,Often,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,480000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,SQL,Neural Nets,C/C++/C#,Google Search,"Blogs,Newsletters,Stack Overflow Q&A",,Very useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Engineer,Programmer",Self-taught,50,0,20,30,0,0,,,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Neural Nets,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Not Useful,,,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Statistician",Self-taught,15,5,70,5,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression",A doctoral degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Java,R,Spark / MLlib,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics",,,,,,Often,Often,Sometimes,,,,,,,,,,Often,Often,,Sometimes,,,,,Often,,,Often,,,,,8,34,21,2,32,3,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,Rarely,,,,,,,Sometimes,Sometimes,,,76-99% of projects,Entirely internal,Other,UK ONS data;freely available word lists;data collected by collaborating research departments;geographical data (e.g. postcode locations); wordNet; security feeds,understanding and matching entities across different sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,50000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,,R,GitHub,"College/University,Stack Overflow Q&A",,,Very useful,,,,,,,,,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,Engineer,University courses,0,0,0,100,0,0,"Machine Translation,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,R,SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,Sometimes,Often,,,Often,Often,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series 
Analysis",Often,,Often,,,Often,Often,Often,Often,,,Often,,Often,,Often,,Often,,Often,Often,,Often,Often,,,,,,Often,,,,30,20,0,20,30,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,Often,Often,,,Often,,Often,,Often,Often,,,Often,,Often,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,,198000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Sweden,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Podcasts",,,,,Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,,Logistic Regression,I don't know/not sure,Financial,10 to 19 employees,Stayed the same,3-5 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,1GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,Rarely,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Logistic Regression",Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,40,20,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,,Need to clean and transform it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,426000,SEK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Belarus,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,45,0,5,10,40,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Basic laptop (Macbook),Traditional Workstation",Image data,Never,,"CNNs,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Mathematica,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Rarely,,,,,,Rarely,,,,,,,Most of the time,,Most of the time,,,Rarely,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,Often,Rarely,,,,Sometimes,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Git,Rarely,,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,60,5,0,0,0,35,Time Series,,High school,Academic,Fewer than 10 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data,Text data",Never,10GB,,"Julia,Jupyter notebooks,Python",,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,60,0,0,30,10,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,,,,,Often,,,,,,,,,,,Most of the time,,100% of projects,Entirely internal,Central Insights Team,,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,37,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,20,50,0,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Nigeria,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Personal Projects,Textbook",,,,,,Very useful,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",I don't know/not sure,Technology,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,100GB,Bayesian Techniques,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,0,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Often,,,,Most of the time,Often,,,,,,Most of the time,Sometimes,,,Often,Often,,Less than 10% of projects,Entirely internal,IT Department,,dirty data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Decreased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,Often,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Often,,,Sometimes,,Often,,Often,Often,,,Most of the time,Rarely,,,Sometimes,,Rarely,Sometimes,,Often,,Often,,,Often,,,Sometimes,,,,,40,15,25,0,20,0,"Enough to code it again from scratch, albeit it 
may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,147000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Iran,22,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,35,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced 
analytics,SAS Enterprise Miner,Genetic & Evolutionary Algorithms,Scala,"GitHub,Google Search,Government website","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,More than 10 years,"Computer Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,80,10,0,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",Sometimes,Sometimes,,,,,,Often,Often,,,Rarely,,,Often,,Often,,,,,,,,,,Often,Rarely,,,Most of the time,Most of the time,Most of the time,,,,,,,,Sometimes,,,,Often,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality 
Reduction,Recommender Systems,RNNs,SVMs,Time Series Analysis",,,Often,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Rarely,,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,Sometimes,,,Most of the time,,Most of the time,,,,60,5,5,25,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,,,,,Most of the time,Often,Often,,,,,Often,,,,Most of the time,,,,100% of projects,More external than internal,Business Department,ims;xpt;stock timeseries;clinical trial data,finding what is relevant and cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,2000000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Ukraine,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,GitHub,"Conferences,Kaggle,Online courses,Textbook",,,,,Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Data Analyst,Researcher",Work,40,50,7,3,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,R,RapidMiner (free version),Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,Rarely,Often,,Rarely,,,,,,,,,,Often,,Rarely,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",Rarely,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,Sometimes,,Most of the time,,,,,,,Often,Rarely,,,,,70,10,2,10,8,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack 
of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,,,Most of the time,,,Most of the time,,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,12000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,40,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Salfrod Systems CART/MARS/TreeNet/RF/SPM,,Matlab,I collect my own data (e.g. web-scraping),Blogs,,Very useful,,,,,,,,,,,,,,,,,Data Elixir Newsletter,15+ years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,,Experience from work in a company related to ML,Yes,Master's degree,"Information technology, networking, or system administration",,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Machine Translation,Speech Recognition,Supervised Machine Learning (Tabular Data)","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by 
government,Julia,Deep learning,Python,Government website,"Blogs,Kaggle,Non-Kaggle online communities,Online courses",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,High school,Government,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Don't know,1MB,"Gradient Boosted Machines,Random Forests","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Natural Language Processing,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,Rarely,,,,90,0,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,Most of the time,Often,,,,Often,,,,,Most of the time,,,100% of projects,More external than internal,Standalone Team,"data.gov.uk, social media",obtaining clean data sets with sufficient resolution that are free of charge and free for commercial use.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,I don't typically share data",,Git,Sometimes,39000,GBP,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Java,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Other",University courses,50,40,5,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,NoSQL,R,SAP BusinessObjects Predictive Analytics,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,Most of the time,Sometimes,Often,,,,Most of the time,,,,Sometimes,,,Sometimes,,,,Often,,,,,,Often,,,,,70,10,5,4,11,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to 
others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,Most of the time,Often,,Often,Sometimes,,,,Sometimes,Sometimes,Sometimes,,,,Most of the time,Most of the time,,,76-99% of projects,More internal than external,IT Department,GUS; AXIOM,dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,120000,PLN,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,19,Employed full-time,,,Yes,,Programmer,,,Amazon Web services,Link Analysis,C/C++/C#,"GitHub,I collect my own data (e.g. 
web-scraping)","Conferences,Podcasts,Stack Overflow Q&A,Trade book,Tutoring/mentoring",,,,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Programmer,Researcher",Self-taught,30,12,50,8,0,0,Adversarial Learning,"Markov Logic Networks,Neural Networks - CNNs",A master's degree,Financial,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Video data,,,"Decision Trees,Random Forests","C/C++,NoSQL,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Recommender Systems",,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,12,23,25,40,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Often,,,76-99% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Mercurial",,40000,USD,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Female,Russia,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,3 to 5 years,,Self-taught,70,20,0,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,Text Analytics",,,,,,Often,Most of the time,Often,Often,,,Most of the time,,Often,,Often,,Sometimes,,Sometimes,,,Often,,,Sometimes,,,Often,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,26-50% of projects,More internal than external,Standalone Team,weather data; census data,It is extremely messy 
as it comes from an application database (electronic medical record).,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,"41,000",USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer",University courses,0,20,60,20,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,Fewer than 10 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Other,Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines","C/C++,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Simulation,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Often,,,,60,10,0,30,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,"1,020,000",RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,C/C++,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,Sometimes,Sometimes,,Often,Often,,,,,,Sometimes,,,,,5,5,0,10,10,70,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,Not Useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,6 to 10 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Other,,10GB,"Bayesian Techniques,Evolutionary Approaches,Other","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Perl,Python,R,Stan,TensorFlow,Unix shell / awk,Other",,,,Often,,,,,,,,,,,,,Sometimes,,,Rarely,Rarely,,,,,,,,,Rarely,Sometimes,,Sometimes,,,,,,,,,,Often,,,Rarely,,Sometimes,Most of the time,,,"Bayesian Techniques,Data Visualization,Evolutionary Approaches,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Often,,,,Often,,,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,Sometimes,Often,,Sometimes,Sometimes,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,,,,Rarely,,Sometimes,,,,,,Often,Sometimes,,,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,75000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Russia,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,,6 to 10 years,Data Analyst,Self-taught,20,30,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,10MB,"Decision Trees,Neural Networks","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,55,10,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Most of the time,,,,Often,,Sometimes,,,,,Sometimes,Sometimes,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,understanding correctness of datasets in DWH and fixing it,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Rarely,150000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,35,5,30,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Other,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,,Rarely,,,,Rarely,Rarely,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Often,Sometimes,Often,,,,,Often,,Often,,,,Sometimes,Often,,Often,,,,,Sometimes,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Most of the time,,Often,,,,,,,,,,,,Often,,,Most of the time,Often,,10-25% of projects,More internal than external,Business Department,,Domain specific data format; Missing Data; Unlabeled Data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,60000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by non-profit or NGO,Python,Cluster Analysis,Python,"Google Search,University/Non-profit research group websites",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"DataTau News Aggregator,FlowingData Blog,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,70,5,15,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,No education,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,31,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,Other,Perfectly,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,< 1 year,Necessary,Necessary,Necessary,,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,Less than a year,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,85,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Female,Russia,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",Python,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,"Data Analyst,Data Scientist,Researcher,Other",University courses,40,30,20,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,R,SQL,Other,Other",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Rarely,Rarely,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Segmentation,Text Analytics,Time Series Analysis,Other,Other,Other",,,,,,Often,Often,Sometimes,Sometimes,,,,,,,Often,,,,,,,Often,,,Often,,,Sometimes,Sometimes,Often,Often,Sometimes,50,30,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",,Often,,,,Most of the time,,,,Most of the time,,,Sometimes,Sometimes,,Most of the time,Sometimes,,,,,,100% of projects,More external than internal,Other,oecd,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,"60,000",RUB,Other,6,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Newsletters,,,,,,,,Somewhat useful,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),The Data Skeptic Podcast",,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,,,Experience from work in a company related to ML,No,Doctoral degree,,,"Business Analyst,Other",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Neural Networks - GANs",I prefer not to answer,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,,,,,Very useful,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,30,0,0,30,30,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not 
sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Tableau,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,Work,20,10,50,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,Sometimes,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests",Often,,Rarely,,,,Often,Rarely,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Scaling data science solution up to full database,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,Often,Often,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint,Other",FTP/SFTP,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,400000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Germany,27,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,DBA/Database Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,High school,Internet-based,"5,000 to 9,999 employees",Increased significantly,6-10 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Most of the time,100TB,Regression/Logistic Regression,"Amazon Web services,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Segmentation",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,45,25,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input",Often,,Often,,Often,,,,,,Often,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,"40,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"Data Elixir Newsletter,DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,15,60,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Ensemble Methods,Random Forests","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Time Series Analysis",,,,,Often,Often,,Often,Often,,,Often,,,Often,Often,,,,,,,Often,Often,,,,,,Often,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Sometimes,,Sometimes,Often,Often,,,,,,,,,Sometimes,Often,,,Often,,Sometimes,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Other",Rarely,40000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,25,50,10,5,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,10GB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python",,Rarely,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,Most of the time,,Sometimes,Often,,,,,,,,,Sometimes,,,,Most of the 
time,,,Sometimes,,,Often,,,,,,,,30,20,45,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,Often,,Most of the time,,Most of the time,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Rarely,70000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Biology,3 to 5 years,"Engineer,Researcher",Self-taught,40,40,10,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important 
+Male,Spain,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Personal Projects",,Very useful,Somewhat useful,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,0,5,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data,Text data,Relational data",Rarely,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Python,TensorFlow",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,,Rarely,,,,Most of the time,Sometimes,,,,,,Rarely,,Rarely,,Rarely,Often,Sometimes,,,Often,,Rarely,Sometimes,,Sometimes,Rarely,,,,,40,5,10,45,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by 
business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",,Often,,,Sometimes,,,,Most of the time,,,,Sometimes,,,Most of the time,Most of the time,,,,,,Less than 10% of projects,More external than internal,Other,,Adapt the information format yo oÌÎ_r actual needs un a very short time (fue yo the lindo of projects un which se are involved),"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Most of the time,40000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,No,Yes,Other,Fine,Employed by non-profit or NGO,R,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,No,Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,Logistic Regression,A master's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important +Male,Other,23,"Not employed, but looking for work",,,,,,,,SAP 
BusinessObjects Predictive Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Management information systems,Less than a year,Data Scientist,University courses,30,15,30,20,0,5,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Denmark,44,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Personal Projects",Very useful,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",A professional degree,Internet-based,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,1TB,"Bayesian Techniques,Other","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,Sometimes,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Neural Networks",,,Often,,,Often,Often,Often,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,0,10,90,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,maxmind,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Company Developed Platform,,"Bitbucket,Git,Other",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Personal Projects,Textbook",Very useful,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Researcher",University courses,40,15,15,10,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A professional degree,Financial,10 to 19 employees,Increased slightly,3-5 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",Often,,Most of the time,,,Most of the time,Most of the time,,Sometimes,Sometimes,,,,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,,,20,25,15,25,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,Often,Often,Often,,Most of the time,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,1680000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,University courses,30,30,NA,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,100 to 499 employees,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",,100GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Time Series Analysis",,,,Most of the time,Often,Often,Often,,,,,,,Often,,Most of the time,,,,Most of the time,Often,Most of the time,,,Most of the time,,,,,Most of the time,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with 
IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,Often,,,Often,,Often,,,Often,Often,,,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,20000,EUR,Other,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Statistician,Other",Self-taught,50,20,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Other,Other",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,Often,Often,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",Most of the time,,Often,,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,Often,,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Most of the time,Often,,Often,Often,,,Often,,,,50,10,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction 
to go in with the available data",Sometimes,Sometimes,,,Most of the time,,,Sometimes,,,Sometimes,,Often,,,,,Most of the time,,Often,,,51-75% of projects,More internal than external,Other,google analytics,"Getting a complete dataset (no holes, no missing fields) that I can trust and use","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email,Other",slack :(,"Git,Other",Sometimes,77000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Fine,Employed by government,Hadoop/Hive/Pig,Anomaly Detection,Java,I collect my own data (e.g. web-scraping),"Blogs,Newsletters,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Military/Security,I don't know,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,10GB,"Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Data Visualization,Logistic Regression,Text Analytics",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,50,25,0,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Often,,,,Most of the time,,,,,Sometimes,,,,,,,,Often,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Never,85000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Scientist,Researcher",Self-taught,10,5,40,40,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",I don't know/not sure,Retail,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,100MB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Impala,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis,Other",,,,,,Most of the time,Most of the time,,,,,,,,Rarely,Often,,,,,,,,,,Sometimes,,,,Sometimes,Often,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert 
input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,Rarely,,Often,Often,,Most of the time,,,Most of the time,,,Sometimes,Rarely,,Often,,,Rarely,Often,,76-99% of projects,Entirely internal,Business Department,loyalty card data; demographic data; segmentation data; supplier data; spatial datasets; weather data; economic indicators,Getting access to siloed data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,47500,GBP,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Female,Australia,27,"Not employed, but looking for work",,,,,,,,Python,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Online courses,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Very Important +Female,Poland,25,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,TensorFlow,Anomaly Detection,Python,University/Non-profit research group websites,"College/University,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,Coursera,GPU accelerated Workstation,0 - 1 hour,Master's degree,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Computer Vision,Reinforcement learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,Russia,57,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Cluster Analysis,SQL,Government website,"Blogs,Official documentation,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,Data Analyst,University courses,50,0,15,35,0,0,,,A professional degree,Retail,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT 
supported servers,Relational data,Most of the time,1GB,,"IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,SQL",,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,Often,,,,,,Often,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,Often,,,,,,,,,,,,Often,Often,,51-75% of projects,More internal than external,Business Department,Money aggregates; GDP; Trade balance,Introduce or improve data structure; Cleaning and normalizing data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Kenya,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Anomaly Detection,R,"Google Search,Government website","Kaggle,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,Very useful,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,80,5,2,8,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,10GB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,Often,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,Sometimes,,,,20,10,20,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Often,Often,,,Often,Often,,,,,,Often,,,Often,,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,75000,KES,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,,Not Useful,Very useful,,Somewhat useful,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Monte Carlo Methods,R,Google Search,"Blogs,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very 
useful,"Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,More than 10 years,"Data Analyst,Operations Research Practitioner,Software Developer/Software Engineer",Self-taught,40,10,40,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,Often,,,Often,,Rarely,Rarely,,,,,Sometimes,,Sometimes,,Most of the time,Rarely,Most of the time,,,Rarely,Sometimes,,,,Rarely,Often,,,,,15,40,30,5,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Sometimes,Often,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,All internal.,Privacy,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,140000,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,Very useful,,Very useful,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",25,60,5,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,KNIME (free version),R,SQL",,,,,Often,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Sometimes,,Rarely,,Rarely,,Sometimes,Rarely,,Often,,,Sometimes,,Sometimes,,,,,,65,15,5,14,1,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,Often,,,,,,,,Most of the time,,,,,,,,,100% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,85000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,20,10,0,30,40,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,500 to 999 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,Often,,Most of the time,,Often,,Often,,,,Most of the time,Often,,Often,,,,Often,Most of the time,Sometimes,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Privacy 
issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Often,,,Sometimes,,,Often,,,,,,Sometimes,,,Often,,,76-99% of projects,More internal than external,Standalone Team,UCI; Statlib,Data cleaning and Feature Engineering,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,25,0,25,0,,,A master's degree,Internet-based,,,,,Not very important,Other,Traditional Workstation,"Text data,Relational data",Sometimes,100MB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,0,25,25,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Sometimes,Often,,Most of the time,,,,,Sometimes,,,Most of 
the time,,,Most of the time,,,,51-75% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Other",,Git,,,,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Friends network,Kaggle,Textbook",,,Somewhat useful,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,< 1 year,Necessary,,Necessary,,,Necessary,,,,,,,,,,0 - 1 hour,,Yes,Master's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Survival Analysis,Time Series",Logistic Regression,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,Very Important,,, +Male,Canada,26,Employed part-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,40,25,25,10,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),A master's degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"CNNs,SVMs","Google Cloud Compute,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"CNNs,Natural Language Processing,Neural Networks,SVMs",,,,Sometimes,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,30,30,20,10,10,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Often,Less than 10% of projects,Entirely external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Git,Sometimes,60000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,50,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,Finland,35,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,,R,,"Kaggle,Online courses",,,,,,,Somewhat 
useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,,,A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,,,QlikView,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,30,5,25,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,,,,Git,,,,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,University courses,12.5,12.5,0,75,0,0,Computer Vision,"Hidden Markov Models HMMs,Other (please specify; separate by semi-colon)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,,10MB,"Decision Trees,Ensemble Methods,Random Forests,Other","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Rarely,,,,,Rarely,,Rarely,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Sometimes,,,,Often,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,60,20,0,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Sometimes,,Often,,Often,Most of the time,,,,,Sometimes,Most of the time,,,100% of projects,Approximately half internal and half external,IT 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Never,38000,GBP,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",0,20,0,30,50,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Technology,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Relational data,Other",,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk,Other,Other",,Most of the time,,,,,,,Sometimes,,,,,,Sometimes,,Rarely,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,Sometimes,,,,,,Often,Often,Most of the time,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,Rarely,,Sometimes,Sometimes,Often,Often,Often,Sometimes,,,Sometimes,,Often,,Rarely,,,Often,Often,Sometimes,,Sometimes,Sometimes,,Often,,,Often,Sometimes,,,,50,10,20,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Most of the time,,,,,,Often,,,,Sometimes,Sometimes,,,76-99% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,I don't typically share data",,"Bitbucket,Git",Rarely,65000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Brazil,46,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Decision Trees,R,"GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,The Analytics Dispatch Newsletter,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Management information systems,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner",University courses,0,10,0,90,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Singapore,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Microsoft Excel Data Mining,Monte Carlo Methods,C/C++/C#,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,Rarely,,,,,,,Rarely,,,,Sometimes,,Often,,,,Often,,Most of the time,,,,,,,,,,Often,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,Rarely,Often,Most of the 
time,Often,Often,,,,Sometimes,Often,,Sometimes,,Often,Often,Sometimes,Often,Often,Often,Sometimes,,,Often,Often,Often,Often,,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,Often,,,,,,,,,,Sometimes,,,Often,Most of the time,Most of the time,,51-75% of projects,More external than internal,Business Department,Open Internet Data,data quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Social Network Analysis,R,Google Search,"Blogs,Friends network,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Data Analyst,Operations Research Practitioner,Predictive Modeler,Researcher",Work,20,10,50,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,CRM/Marketing,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the 
company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,Sometimes,,Often,,,,,Often,Rarely,,,Often,,,Rarely,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,,,,Often,,Often,Often,Sometimes,,,,,Often,Often,Often,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Often,,,Rarely,Sometimes,,,,50,10,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,Often,,,Sometimes,Most of the time,Sometimes,,,,Often,,Sometimes,,,,Most of the time,,,51-75% of projects,More internal than external,Central Insights Team,"Weather, property sales, credit risk bureau data",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,70000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Anomaly Detection,Python,GitHub,Personal Projects,,,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,55,5,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression",,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,kNN and Other Clustering,Naive Bayes,Time Series Analysis",,,Sometimes,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,Often,,,,90,0,0,7,3,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Czech Republic,39,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",10,30,0,50,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Java,NoSQL,Python,R,SQL",,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender 
Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,Sometimes,,,Often,Most of the time,Most of the time,Most of the time,,Sometimes,Often,,Often,Rarely,Often,,Sometimes,Rarely,,Sometimes,,Often,Often,,Often,,Sometimes,Often,Often,,,,30,25,5,20,20,0,,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,100000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,DataRobot,"Ensemble Methods (e.g. boosting, bagging)",SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,YouTube Videos,Other",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,0,0,70,0,Time Series,Decision Trees - Gradient Boosted Machines,A master's degree,Mix of fields,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,,Regression/Logistic Regression,"SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,Often,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,60,0,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,,,,Most of the time,,,,,,Often,,,26-50% of projects,Entirely internal,Business Department,,"Hidden rules and changes to said rules over time, poor data governance and accountabilities","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,165000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,,Other,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),40+,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,40,50,0,0,10,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Random Forests,Python,Google 
Search,Other,,,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,25,50,25,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Tableau,Unix shell / awk",Sometimes,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,Rarely,Rarely,,,,,,Often,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,Often,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Sometimes,,,,,Sometimes,,,Often,,,Most of the time,Most of the time,,,,40,20,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,,Often,,Often,,Sometimes,,,,,,,Sometimes,,Often,Often,Sometimes,,26-50% of projects,More internal than external,Central Insights Team,Merchant company details e.g. Golden Pages,Processing capacity in the datawarehouse,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Sometimes,80000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,57,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,C/C++/C#,"Google Search,I collect my own data (e.g. web-scraping)","Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,I don't write code to analyze data,Engineer,Self-taught,90,10,0,0,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Always,100MB,Other,"C/C++,Microsoft SQL Server Data Mining,Minitab,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau",,,,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Rarely,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Evolutionary Approaches",,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,Sometimes,,Often,Often,,,,,,Often,,Often,,,,Often,,51-75% of projects,Entirely internal,Business Department,,"Data integrity, Data access",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,Python,Deep learning,C/C++/C#,GitHub,"College/University,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Engineer,University courses,10,0,0,90,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Increased slightly,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Image data,Never,,Regression/Logistic Regression,"C/C++,Java,KNIME (free version),MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,Rarely,,,,Sometimes,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,SVMs",,Rarely,Sometimes,,,,Rarely,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,"480,000",PKR,Other,9,,,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects",,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,10,10,0,30,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A professional degree,Telecommunications,500 to 999 employees,Stayed the same,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,SVMs","Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Simulation",,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,50,20,10,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Privacy issues",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +A different identity,India,36,Employed full-time,,,No,Yes,Other,Fine,Employed by government,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Male,Russia,32,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,"Employed by a company that doesn't perform advanced analytics,Self-employed",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Professional degree,,Less than a year,"Business Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by college or university,Python,Bayesian Methods,Stata,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Company internal community,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,Very useful,,Very useful,,,,,Not Useful,Very useful,Not Useful,Very useful,Not Useful,,Somewhat useful,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,40,15,35,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A professional degree,Academic,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Perl,Python,R,SAS Base,SAS Enterprise Miner,Tableau,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Often,,Often,,,,,Sometimes,Rarely,,,,,,Rarely,,,,Most of the time,,,"Association Rules,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis,Other",,Most of the time,,,,Sometimes,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,Often,Sometimes,,,Often,,,,Most of the time,Most of the time,,,35,15,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Did not 
instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Rarely,,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,Most of the time,,,,,,Most of the time,Often,,76-99% of projects,More internal than external,Standalone Team,"Demographic Health Survey; US Census Data; CDC/NVSS; Human Mortality Database, World Values Survey; Medical Claims Databases, Hospital Patient Records, Clinical Trial Data, Norwegian and Danish Health Registries","data cleaning and formatting, understanding the data structure, and combining data from different sources","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,"114,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Poorly,Self-employed,Spark / MLlib,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,,Not Useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,30,10,50,0,5,5,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Rarely,Often,Sometimes,,Sometimes,Most of the time,,Most of the time,Most of the time,,,Often,Rarely,,,Most of the time,,Often,,,Often,,Often,,,,,Sometimes,,Sometimes,,,,50,15,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,Sometimes,Most of the time,Most of the time,,Often,Most of the time,,Most of the time,Often,Often,Often,Most of the time,Often,Most of the time,Rarely,Most of the time,Most of the time,Often,,10-25% of projects,Entirely internal,IT Department,none,quality,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,200000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Researcher",University courses,35,15,25,0,25,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,Random Forests,RNNs,SVMs","NoSQL,Python,Spark / 
MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,Sometimes,Often,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,Often,Often,Rarely,,,,Rarely,,,Rarely,Sometimes,,,,,10,55,30,5,0,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,Sometimes,,,Sometimes,,,,10-25% of projects,More internal than external,IT Department,Kaggle;IBM;Google,a good way to Data preprocessing or Build an appropriate model,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,160000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Python,Regression,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,32,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,NoSQL,Anomaly Detection,Other,Google Search,"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,,1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Doctoral degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,20,60,0,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,India,24,Employed 
full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,IBM SPSS Statistics,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Online courses,Textbook,Tutoring/mentoring",,Very useful,,,,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Business Analyst,Other,30,20,0,0,0,50,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Germany,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Perl,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Programmer",Self-taught,60,20,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics)",,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,Sometimes,Often,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,,,,,,,,,,,,,20,30,15,20,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,80000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,DataTau News Aggregator",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer",Self-taught,20,40,30,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,33,Employed full-time,,,No,Yes,Engineer,,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,,,,Necessary,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,80,20,0,0,0,0,"Computer Vision,Reinforcement learning","Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,Italy,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,United States,63,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Monte Carlo Methods,Python,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,High school,Government,"10,000 or more employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,10MB,Decision Trees,"C/C++,Perl,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Most of the 
time,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,25,15,10,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,Most of the time,,,,,Most of the time,Often,,,,,,Most of the time,Often,,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,,,,"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,66,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),"Ensemble Methods (e.g. 
boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,40,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Always,10GB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Sometimes,Often,,,,Most of the time,,Often,,,,,Often,Most of the time,,100% of projects,Entirely internal,IT Department,,Data 
quality,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,145000,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Finland,31,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,Matlab,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"DBA/Database Engineer,Programmer",University courses,25,25,25,25,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Other,,1TB,"CNNs,Ensemble Methods,Evolutionary Approaches,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic 
Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,Often,Most of the time,Sometimes,Most of the time,Most of the time,,Often,Sometimes,,,,Sometimes,,Most of the time,,Sometimes,,Most of the time,Often,,,,Sometimes,,,Often,,Rarely,,,,40,5,0,5,0,50,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,Often,Often,,,,,,Often,Most of the time,Most of the time,,26-50% of projects,More external than internal,Standalone Team,Million Song Dataset; Free Music Archive; GTZAN,Copyright issues and lack of large scale data with good labels and full audio,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Rarely,24000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Friends network,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,10,0,10,60,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,Other,"Amazon Web services,Python,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,0,10,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,,"100,000",USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Ireland,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by government",Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,50,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,"Regression/Logistic Regression,RNNs","Amazon Machine Learning,IBM Watson / Waton Analytics,Python",Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"RNNs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,,,,,,Often,Often,,,100% of projects,Entirely internal,Other,,,"Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Other",I don't typically share data,,Git,Most of the time,132000,EUR,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,DataRobot,Deep learning,Python,"GitHub,Google Search","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,Very useful,Very useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,Finland,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Monte Carlo Methods,Matlab,Google Search,"Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Researcher,Work,60,0,0,40,0,0,Unsupervised Learning,"Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Never,100MB,"Regression/Logistic 
Regression,SVMs,Other","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,Often,,,,,,20,10,0,20,50,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Often,,,,,,,,,Often,Often,,76-99% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,40000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,,University courses,10,20,0,60,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Government,500 to 999 employees,Stayed the same,More than 10 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Rarely,1MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,Sometimes,,,Often,,,,,,Rarely,,Most of the time,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,Often,,,,Often,,,,,Often,,Often,,,,,,,Often,,,,20,45,5,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Sweden,49,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Trade book,YouTube Videos,Other",,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,Somewhat useful,,Somewhat useful,"Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,,,,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,Udacity,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,,Very Important,Very Important,Very Important,,,Very Important,Very Important,,Very Important,,Very Important,Very Important,, +Male,United States,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Python,Deep learning,Python,Google Search,"College/University,Conferences,Online courses,Personal Projects,Textbook",,,Very useful,,Very useful,,,,,,Very useful,Very useful,,,Somewhat useful,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, 
edx, etc.)",25,15,25,35,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",High school,Mix of fields,,,,,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Other,Other",Rarely,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Rarely,Rarely,,Sometimes,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,Rarely,Most of the time,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Recommender Systems,Text Analytics",Rarely,,Rarely,,,,Most of the time,,,,,,,,,Most of the time,,Often,,,,,,Sometimes,,,,,Often,,,,,50,25,0,25,0,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, but looking for work",,,,,,,,Amazon Web services,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,25,25,0,50,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,India,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Support Vector Machines (SVM),Python,"GitHub,Google Search","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to 
have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Machine Translation,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer,Operations Research Practitioner",University courses,30,10,45,10,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SAS Enterprise Miner,SQL",,Rarely,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,Often,,Most of the time,,,,,Most of the time,Sometimes,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics",Sometimes,Rarely,,,,Most of the time,Most of the time,Most of the time,Often,,,,,Sometimes,Often,Often,,Sometimes,,,Often,Often,Most of the 
time,,,Sometimes,,,Rarely,,,,,80,10,2,5,3,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,Rarely,,Sometimes,Often,,,,,,,,,,Most of the time,,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,Just getting to it. IT support is terrible.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,103000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,Personal Projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,Engineer,Self-taught,50,10,30,0,10,0,Machine Translation,Neural Networks - CNNs,I prefer not to answer,Mix of fields,"10,000 or more employees",Increased significantly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Prescriptive Modeling,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,,,,,,,,,,,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",5-10 years,Necessary,Necessary,Necessary,,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,,"Coursera,DataCamp,edX,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Other,Sort of (Explain more),Master's degree,Other,,"Business Analyst,Predictive Modeler,Researcher",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Female,United Kingdom,40,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"DataTau News Aggregator,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Other,Yes,Master's degree,Mathematics or statistics,,"Business Analyst,Computer Scientist","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,Japan,35,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",Work,50,0,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,High school,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Rarely,100GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,70,10,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",Often,,,,,,,,Most of the time,,Sometimes,,,,,,Sometimes,,,,,,26-50% of projects,More internal than external,Standalone Team,PASCAL; MS-COCO; KITTI; ImageNet,To keep uniform quality,Key-value store (e.g. Redis/Riak),Share Drive/SharePoint,,"Git,Subversion",Never,6000000,JPY,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,France,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Official documentation,Stack Overflow Q&A",,Somewhat useful,,,Very useful,,Somewhat useful,,,Very useful,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,5,25,70,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,<1MB,"Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Rarely,,Often,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,Often,,Often,,,,,,,,,,Sometimes,,,,,Most of the time,Often,,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights 
Team,population statistics; geographical data,get relevant data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Sometimes,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Java,Time Series Analysis,Python,"Google Search,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10MB,Ensemble Methods,"MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Simulation",,,,,,Often,,,Rarely,,,,,,,Rarely,,,,,Rarely,,,,,,Often,,,,,,,25,10,25,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,Sometimes,Sometimes,,Sometimes,,,,,,,,Often,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Rarely,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,Very useful,Very useful,,,,,< 1 year,Nice to have,Nice to have,,,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Israel,63,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,,,,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Work,50,0,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests",,Technology,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests","Java,R,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,"Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,Random Forests,Time Series Analysis",,,,,,,,Most of the time,Often,Rarely,,,,Often,,,,,,Rarely,,,Rarely,,,,,,,Sometimes,,,,10,50,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,,,,,,,,,9,,,,,,,,,,,,,,,,,, +Female,Iran,30,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Google Search,Government website","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,,"Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines 
(SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,31,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,6 to 10 years,,Self-taught,20,30,50,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","MATLAB/Octave,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Simulation",Often,,,,,,Often,,,,,,,,Sometimes,Often,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,60,20,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent 
in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Often,,Most of the time,Often,Sometimes,,,Most of the time,,,,,Often,,,,Most of the time,Often,,,,76-99% of projects,Entirely internal,Business Department,,Dirty Data;Lack of available definitions,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000000,ZAR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician,Other",University courses,10,0,30,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Statistics,Java,R,SQL,Tableau",,Most of the time,,,,,,,,,,Rarely,,,Rarely,,,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Often,,Most of the time,,,,,Most of the time,,Most of the time,,,Most of the time,Sometimes,,Sometimes,Sometimes,,,,65,10,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,Rarely,,,,,,,,,Sometimes,Often,Often,,,Often,,,76-99% of projects,More external than internal,Other,Client data,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,80000,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Singapore,26,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Monte Carlo Methods,SAS,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Textbook",,Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,3 to 5 years,Data Analyst,University courses,0,0,0,100,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,United States,49,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Python,Time Series Analysis,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Researcher,Other",Self-taught,50,10,0,0,40,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Python",,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Random Forests,RNNs,Segmentation",,,,Most of the time,,Most of the time,Often,Often,,,,Often,,,,,,Sometimes,,,,,Sometimes,,Often,Sometimes,,,,,,,,30,50,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,Rarely,160000,USD,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,GitHub,"Arxiv,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",14,30,5,50,1,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Sometimes,1GB,"Bayesian 
Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Natural Language Processing,Random Forests,Segmentation,SVMs",,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,60,10,10,20,0,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,Rarely,Often,,Often,,,,,,,,Often,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)","Email,Other",Google Drive,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,76000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,18,Employed part-time,,,Yes,,Programmer,Perfectly,"Employed by college or university,Employed by non-profit or NGO,Self-employed",Mathematica,Survival Analysis,,University/Non-profit research group websites,"Kaggle,Stack Overflow Q&A",,,,,,,Not Useful,,,,,,,Somewhat useful,,,,,"Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Other",Self-taught,20,70,0,10,0,NA,"Adversarial Learning,Machine Translation","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches",Primary/elementary school,Technology,Fewer than 10 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data,Text data",Most of the time,100MB,"Bayesian Techniques,CNNs,Markov Logic Networks","Amazon Machine Learning,Amazon Web 
services,Java,Mathematica,Orange",Often,Often,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees,Time Series Analysis",Rarely,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,,,30,30,30,10,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Rarely,,,Sometimes,,,,,,Often,,,,,,,,Most of the time,,,,Less than 10% of projects,Entirely external,Business Department,"preel , piny and more",is my translator ,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,1000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,20,50,0,0,0,30,Supervised Machine Learning (Tabular Data),Ensemble Methods,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Newsletters,Online courses,YouTube Videos",Very useful,,,,,,Very useful,Very useful,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Other",11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,20,20,10,30,0,"Computer Vision,Natural Language Processing,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Russia,33,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,,Python,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Natural Language Processing,,A doctoral degree,Technology,100 to 499 employees,Increased significantly,More than 10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,,,"Perl,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,Natural Language 
Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,80,0,20,0,0,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,,"Commercial Data Platform,Email",,Git,Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,35,5,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100TB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Most of the time,,,,,,Often,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests",,,,,,Often,,,Often,,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to 
others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Sometimes,,,Most of the time,,,Most of the time,,,,,,,Often,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,exelate;visualDNA;iota;nielson,Too huge to process,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Sometimes,2500000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Japan,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Time Series Analysis,Python,GitHub,"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,10,0,10,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - RNNs",High school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Most of the time,10GB,"Bayesian Techniques,Neural Networks,RNNs","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,RNNs",,,,,,Often,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,Often,,,,,,,,,50,20,10,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,,,,,,,,,,Often,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,7000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,SAS,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Kaggle,Textbook,YouTube Videos",,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Work,50,10,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)",,"Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,3-5 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Most of the time,1GB,,"Cloudera,Hadoop/Hive/Pig,Java,R,SQL",,,,,Rarely,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression",,Sometimes,,,Sometimes,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,30,30,20,20,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Sometimes,1600000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Female,Spain,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Not Useful,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,20,20,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,Academic,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Never,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Python,QlikView,R,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,Often,,,,,,,"Data Visualization,Decision Trees,Random Forests",,,,,,,Often,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,80,0,10,0,10,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,Most of the time,,,,,Most of the time,,,Often,,,Sometimes,,Sometimes,,,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,12000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other,Other",,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Other,I haven't started working yet",Self-taught,30,20,50,0,0,0,"Machine Translation,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Australia,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"Data Stories Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Miner,Data Scientist,Software Developer/Software Engineer",University courses,0,20,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,Financial,"5,000 to 9,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests","DataRobot,Hadoop/Hive/Pig,KNIME (free version),Python,QlikView,R",,,,,,Rarely,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,10,20,10,40,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,,,,,Often,,Often,,,,,,,,,,,,76-99% of projects,Entirely external,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,"Bitbucket,Git",,131500,AUD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Ireland,39,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Other,,"Researcher,Other",University courses,NA,NA,NA,NA,NA,NA,,"Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Other,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,Other,Work,50,10,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,KNIME (free version),Python,R,RapidMiner (free version),SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,Often,,Rarely,Most of the time,,,Sometimes,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Often,Most of the time,,,Most of the time,Most of the time,Often,Sometimes,,,,,,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,,,Sometimes,,,,40,30,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,Most of the time,Often,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,NPPES;HCRIS,"Inconsistency of data warehouse, in design and representation of data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",25,30,40,5,0,0,"Computer Vision,Machine Translation","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,New Zealand,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,R,,R,Google Search,"Blogs,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher",University courses,20,10,50,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Often,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,Often,,,"A/B Testing,Association Rules,Data Visualization,Logistic Regression",Sometimes,Sometimes,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,20,0,30,50,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,"Weather, POS",Getting complete data from clients ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Canada,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other","Basic laptop (Macbook),Traditional Workstation",40+,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,80,8,0,2,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Self-employed,Tableau,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,,Necessary,,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,"DataCamp,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Other,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Conferences,Newsletters,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,Very useful,,,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,50,0,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Other",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems",Often,,Often,,,Often,Most of the time,,,,,Sometimes,,,,Often,,Often,Most of the time,,Sometimes,,,Sometimes,,,,,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,,,,,,,,Sometimes,Often,,,,,,Often,,,51-75% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,42000,GBP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Other",Other,20,0,80,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,Other,"Text data,Relational data",Most of the time,10TB,"Gradient Boosted Machines,Random Forests,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,,,,Often,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,Sometimes,,,Sometimes,,Often,,,,"Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Often,,,,,,,Often,Sometimes,,,Often,,,Often,Often,,,Often,,Often,Often,Often,,,,10,10,30,20,20,10,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,76-99% of projects,Approximately half internal and half external,Business Department,Customer data; Acxiom,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"170,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Java,I don't plan on learning a new ML/DS method,R,"Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,Data Analyst,Work,0,0,100,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Random Forests,A bachelor's degree,CRM/Marketing,100 to 499 employees,Decreased slightly,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,"Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,Rarely,,,,,,,,,,,,,Sometimes,,Rarely,,,Sometimes,,,,Sometimes,,,,30,40,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,,100% of projects,More internal than external,Standalone 
Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Sometimes,44000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,"Arxiv,Conferences,Friends network,Personal Projects,Textbook",Very useful,,,,Very useful,Very useful,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,,University courses,25,0,25,50,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,10MB,"Bayesian Techniques,CNNs,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,Stan,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,Often,,Sometimes,,,,Rarely,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,,Rarely,,,Rarely,,Most of the time,Most of the time,,,"Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Simulation,Text Analytics",,,Most of the time,Sometimes,,Often,,,,,,,,Often,,Sometimes,,,,,,,,,,,Often,,Sometimes,,,,,25,25,5,20,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Often,,,,,,,Often,,,Sometimes,,,,,,,,100% of projects,More internal than external,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,30,20,50,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,HMMs,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,SVMs",,,Often,,Sometimes,,,,,,,,Sometimes,Often,,,,Often,,,Often,,,,,Often,,Sometimes,,,,,,30,10,40,15,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,,,,,,,,,,,Often,,,,,,,51-75% of projects,Entirely external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Git,Sometimes,16000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Philippines,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,College/University,Friends network,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,,Very useful,,,,,,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,6 to 10 years,"Business Analyst,Data Miner,Predictive Modeler",Work,20,0,40,40,0,0,"Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning",Logistic Regression,A doctoral degree,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Logistic Regression,Segmentation,Text Analytics",,Sometimes,,,,,Most of the time,,,,,,,Often,,Often,,,,,,,,,,Often,,,Often,,,,,50,10,15,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,Often,Often,,,Sometimes,,,Sometimes,Sometimes,,,,Often,,76-99% of projects,More internal than external,Standalone Team,mostly social media public APIs,not having a well-designed data warehouse,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,1000000,PHP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Anomaly Detection,SQL,Google Search,"Tutoring/mentoring,Other",,,,,,,,,,,,,,,,,Somewhat useful,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,,1 to 2 years,"Engineer,Other",University courses,35,45,5,15,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician",University courses,50,30,20,NA,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,Sometimes,,Often,,Most of the time,Most of the time,Often,Often,,,,,Often,,Often,,,,Sometimes,Often,,Often,,,Often,,,,Often,,,,50,15,10,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,Often,Often,,,,,,,,,,,,Most of the time,,Most of the time,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Other,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,40,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Technology,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Text Analytics",,,,,,Often,Often,Often,,,,,,,Sometimes,Sometimes,,,,,,,Often,,,,,,Sometimes,,,,,40,10,10,20,20,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Often,Often,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,More internal than 
external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Male,Other,47,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Decreased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,Simulation,SVMs,Text Analytics",,Rarely,Sometimes,,,Most of the time,Most of the time,Most of the time,Often,,,,,,,Often,,Often,,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,,,Sometimes,,Often,Most of the time,,,Less than 10% of projects,Entirely internal,Business Department,,,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,110000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,R,"Google Search,Government website","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Operations Research Practitioner",Self-taught,10,20,70,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Retail,"10,000 or more employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,,,Rarely,Often,,,,Often,,,Rarely,Often,,,,,,Rarely,Often,Rarely,,Often,Rarely,,Rarely,Rarely,,,,30,10,25,30,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,Often,Sometimes,,Often,Often,Often,,,,,Most of the time,,,,,Sometimes,Often,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,56000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,"FastML Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,10,40,30,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Other",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow,Unix shell / awk",Rarely,Rarely,,,,,,,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,,Sometimes,Sometimes,,Most of the time,Most of the 
time,Rarely,Often,,,Often,,Sometimes,,Sometimes,,Sometimes,Sometimes,Often,Often,,Often,,Often,Rarely,,Often,,,,,,10,20,10,30,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,RUB,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook",Very useful,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Other,37,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,48,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",35,40,10,10,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,Researcher,Work,30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A doctoral degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Perl,Python,R,SQL",,Often,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,,Sometimes,,Sometimes,,,,,Often,,Most of the time,,,,,,,,,,,50,10,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to 
data",,,,,Often,,,,Often,,,,,,,Most of the time,,,Rarely,,Most of the time,,76-99% of projects,More external than internal,Central Insights Team,nan,Interpreting complex biophysical data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,81500,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,Government website,"Online courses,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,,< 1 year,Nice to have,,,,Nice to have,,,,Nice to have,,,,,"DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Newsletters,Official documentation,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,,,Somewhat useful,,Very useful,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Business Analyst,Self-taught,75,0,25,0,0,0,Outlier detection (e.g. Fraud detection),,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,,,,"Hadoop/Hive/Pig,Python,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,Most of the time,Rarely,,,,30,0,0,15,55,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Scaling data science solution up to full database",,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Other,Don't know,"Git,Subversion",Most of the time,25000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,42,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Decision Trees,R,"Google Search,Government website,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,< 1 year,,,,,,,,,,,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Software Developer/Software Engineer,Other",Self-taught,80,20,0,0,0,0,,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Deep learning,Python,,"Arxiv,Company internal community,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,Very useful,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Researcher",Work,20,40,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or 
Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,SAS Base,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,Segmentation",Most of the time,,,,,Sometimes,Most of the time,,,,,Often,,,Sometimes,Often,,,,Rarely,,,,,,Often,,,,,,,,25,25,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Privacy issues",,,,Sometimes,Sometimes,,,,,,,,,,,,Rarely,,,,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,RUB,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Germany,37,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,0 - 1 hour,PhD,No,Master's degree,Electrical Engineering,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,United States,68,Retired,,,Yes,,Scientist/Researcher,Fine,Self-employed,,,Python,,"Official documentation,Online courses",,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,25,25,25,25,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Cluster Analysis,R,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,Less than a year,Statistician,"Online courses (coursera, udemy, edx, etc.)",40,15,40,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,Often,Sometimes,,,,,,,,,Rarely,Sometimes,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,,,,Sometimes,Often,,,,,Often,,Often,,,Often,,,,Sometimes,,,,50,30,0,10,10,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",,NA,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,,,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,,,,,,,,,,,,,,,Somewhat important +Male,Poland,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle",,Somewhat useful,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Predictive Modeler,Programmer",University courses,0,0,20,40,40,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Java,Jupyter notebooks,KNIME (free version),Spark / MLlib",,,,,,,,,,,,,,,Often,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Often,,,,,,,,,,Sometimes,,Rarely,,,Often,,,,,Often,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Programmer",University courses,20,10,0,60,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL",Rarely,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Segmentation",,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,50,10,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Most of the time,,,,Often,,,,,,,Most of the time,,,,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,MLS real estate; listhub,Inconsistent field mapping,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,70000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,3 to 5 years,,University courses,34,33,0,33,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,I don't know,,,,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Text data,Never,<1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,Often,Rarely,,,,,Rarely,Sometimes,,Often,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,50,30,0,5,15,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,15000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Australia,37,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Monte Carlo Methods,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Podcasts",,Very useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Other,Work,20,20,60,0,0,0,Survival Analysis,,"Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation",Most of the time,,,,,,Most of the time,Often,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,50,30,5,10,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,Often,,,,,,Sometimes,,10-25% of projects,More internal than external,Business Department,,understanding it and having to manipulate to be useable,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"55,000",AUD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Finland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Scientist,Programmer,Researcher",University courses,20,10,20,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Relational data,Other",Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,Most of the 
time,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Other",,,,,,Often,Most of the time,Often,,,,Often,,Often,,Often,,Often,,Often,Most of the time,,Often,,,,,Often,,,Often,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,100% of projects,Entirely internal,IT Department,The Cancer Genome Atlas (TCGA);1000 Genomes;ExAC;dbSNP;Clinvar,Lack of information (description) about the data. Unstructured data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,36000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Random Forests,R,I collect my own data (e.g. 
web-scraping),"Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Predictive Modeler",Self-taught,80,5,15,0,0,0,,,A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,SAS Base,SQL",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,"Cross-Validation,Decision Trees",,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Sometimes,58600,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,40,0,20,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data",,100MB,"Bayesian Techniques,CNNs,RNNs,SVMs","C/C++,Java,NoSQL,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,,,,,"Association Rules,Bayesian Techniques,Naive Bayes,Natural Language Processing,Neural Networks,Segmentation,Simulation",,Most of the time,Most of the time,,,,,,,,,,,,,,,Often,Often,Sometimes,,,,,,Sometimes,Most of the time,,,,,,,60,10,5,10,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the 
available data",Often,Most of the time,,,,,,,Sometimes,,,,,,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,6600,BHD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Other,2 - 10 hours,Other,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Other,40,50,0,10,0,0,Other (please specify; separate by semi-colon),Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,Very useful,,Somewhat useful,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,31,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Work,30,25,35,5,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Often,,,Rarely,Most of the time,,Most of the time,Most of the time,,,Most of the time,,Often,,Sometimes,,,Sometimes,Sometimes,Often,,Most of the time,Rarely,,Often,,,Sometimes,Sometimes,,,,45,15,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,,,Often,,,,,,,,,Often,Often,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,3458000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Other,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,20,20,20,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very 
Important,Somewhat important +Male,Netherlands,33,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Tableau,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Relational data,Rarely,,,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,None,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Sometimes,42000,EUR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Belgium,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,40,0,30,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data,Other",Most of the time,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Julia,Jupyter notebooks,Python,Unix shell / awk",,Often,,,,,,,Often,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Simulation,Time Series Analysis",Sometimes,Sometimes,,,Often,Most of the time,Most of the time,Often,Often,Often,,,,Sometimes,,,,,,Sometimes,Sometimes,,Often,Often,Sometimes,,Often,,,Often,,,,20,30,20,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a 
data science team,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,120000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Scientist",Work,20,10,50,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Segmentation,Text Analytics",,Sometimes,,,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,,,Sometimes,,,,,20,40,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,University/Non-profit research group websites,"Blogs,Kaggle,Personal Projects,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Master's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer",University courses,20,0,0,80,0,0,Reinforcement learning,Support Vector Machines (SVMs),No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +A different identity,Other,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,"Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,Very useful,Not Useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,80,0,10,0,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Financial,500 to 999 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Most of the time,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Neural Networks,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,Rarely,Often,,Often,Most of the time,Sometimes,Most of the time,,,Sometimes,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,Sometimes,Sometimes,,,,70,20,4,4,2,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,GitHub,"Blogs,Online courses,Podcasts",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,25,25,0,0,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,,,,,,Often,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,85,10,2,3,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,,,,Sometimes,,Often,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,slack,Git,Sometimes,150000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,R,Deep learning,SQL,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,80,10,3,2,0,,,A master's degree,CRM/Marketing,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",,,,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,60,40,100,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,,,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,Bitbucket,Most of the time,"200,000",,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Hadoop/Hive/Pig,Deep learning,Python,Google Search,"Company internal community,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Other,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",,,,"Amazon Web services,Java,NoSQL,SQL",,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"kNN and Other Clustering,Logistic Regression",,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,Sometimes,Often,,,Sometimes,,,,,Sometimes,,,Most of the time,,,Often,Often,,None,More internal than external,IT Department,,Dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Other",Sometimes,"78,000",USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Anomaly Detection,SQL,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,,Work,100,0,0,0,0,0,,,"Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,,Workstation + Cloud service,"Text data,Relational data",,,,"Java,SAS Base,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,Most of the time,,,,80,4,1,10,5,0,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,73000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Denmark,28,Employed part-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Miner,Programmer,Software Developer/Software Engineer",University courses,0,0,50,50,0,0,"Machine Translation,Natural Language Processing","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A doctoral degree,Technology,Fewer than 10 employees,Increased significantly,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,10GB,,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,,,,,,,"Cross-Validation,Natural Language Processing",,,,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,20,60,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,"Git,Other",Most of the time,100000,DKK,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Spark / MLlib,Survival Analysis,SQL,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,Other,University courses,20,5,55,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,"Decision Trees,Regression/Logistic Regression","Amazon Web services,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Often,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation",Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,30,10,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,Often,Often,,,,Sometimes,,,,,,Sometimes,,,,,,,,76-99% of projects,More internal than external,Other,public statistics bureau data,"Dirty data, poor variable naming, strange relational architecture ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other","Direct access to relational databases; box/drive/dropbox; physical storage exchange (usb memory sticks, SD cards)",,Most of the time,4500000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Friends network,Kaggle,Official documentation,Textbook",,,,,,Very useful,Very useful,,,Very useful,,,,,Very useful,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,20,5,20,35,20,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Never,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,Often,,,,,,,,Most of the time,,,,,,Sometimes,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,Most of the time,Sometimes,,Sometimes,Most of the time,,,,Often,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Rarely,,Rarely,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,,Often,Often,,Most of the time,,,,,Most of the time,,,,,,95,3,0,1,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty 
data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,Most of the time,Most of the time,Rarely,,,,,,,Most of the time,,Often,,,Most of the time,,,Most of the time,,100% of projects,More external than internal,Other,,"missing data, noisy data, big data ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Other",Internal messenger ,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Scala,Google Search,"Arxiv,Conferences,Kaggle",Somewhat useful,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,Self-taught,65,5,20,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting","Some college/university study, no bachelor's degree",Retail,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,Sometimes,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,Sometimes,Often,,,,,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random 
Forests",,,,,,,,Often,Sometimes,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,40,15,5,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,,,,Sometimes,,,,,Most of the time,,Often,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,APIs,"Bitbucket,Git",Sometimes,,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by government,Self-employed",Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,Self-taught,80,5,0,0,15,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,500 to 999 employees,Stayed the same,3-5 years,Some other way,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Often,,Sometimes,,Sometimes,,Often,Sometimes,,Often,,Often,Sometimes,,Sometimes,Rarely,Often,,,,45,30,0,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,Sometimes,,,Most of the time,,Often,,,,,,,,,Most of the time,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,,Data integrity ,"Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,102000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,I don't know,Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Don't know,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Rarely,,Sometimes,,,Rarely,Rarely,,,,,,Rarely,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Sometimes,Sometimes,,,,,,Rarely,,Often,,Rarely,Often,,,,Often,,,,,Sometimes,Sometimes,,,,,5,70,0,10,15,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Sometimes,,,,Sometimes,,,,,,,Often,,,,Often,Most of the time,,10-25% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,5000000,JPY,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Very useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Not Useful,Somewhat useful,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",3-5 years,,,,,,,,,,,,,,"edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,I never declared a major,3 to 5 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belgium,43,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,R,I 
collect my own data (e.g. web-scraping),"Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Not Useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",10-15 years,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician",Work,50,20,10,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Spain,57,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Official documentation,Online courses,Podcasts,Textbook",Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,Very useful,Very useful,,Very useful,,Somewhat useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",University courses,40,50,0,0,10,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",High school,Academic,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Julia,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Other",Often,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Often,,,Sometimes,Sometimes,,,Often,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Recommender Systems,Simulation",Often,Often,Sometimes,,Often,,Most of the time,Most of the time,Sometimes,,,,,Most of the time,,Most of the time,,Often,,,,,,Often,,,Most of the time,,,,,,,20,50,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,100% of 
projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,80000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Deep learning,,I collect my own data (e.g. web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,30,20,30,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,6-10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,,"Bayesian Techniques,CNNs,Neural Networks","Amazon Web services,C/C++,MATLAB/Octave,Python,TensorFlow",,Rarely,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,"kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs",,,,,,,,,,,,,,Rarely,,,,Rarely,Sometimes,Sometimes,,,,Often,Often,,,,,,,,,20,50,30,0,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,,,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Australia,59,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,Java,Regression,Java,Government website,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Other",Self-taught,40,30,30,0,0,0,Time Series,"Logistic Regression,Neural Networks - CNNs",I don't know/not sure,Academic,20 to 99 employees,Stayed the same,Don't know,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,100TB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,30,0,20,50,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools",,Most of the time,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,without enough tools,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"85,000",AUD,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,SQL,Google Search,"Arxiv,Blogs,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Very useful,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,,Doctoral degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist,Engineer,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,40,10,0,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - GANs",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,44,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,20 to 99 employees,Increased slightly,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,R,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics",,,Rarely,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,,,,Often,,,,,50,10,0,15,25,0,Enough to tune the parameters properly,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Always,112500,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Neural Nets,R,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Never,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Most of the time,,,,,,Most of the time,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Sometimes,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,420000,SEK,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Video data,Text data,Relational data",Most of the time,,,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,QlikView,Spark / MLlib,SQL,TIBCO Spotfire,Unix shell / awk",,Most of the time,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,Often,,,,Often,,,Sometimes,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Sometimes,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,195000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,Somewhat useful,"Siraj Raval YouTube Channel,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,50,10,10,30,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,France,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",Very useful,,,Very useful,,Very useful,Very useful,,Very useful,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,PhD,Yes,Master's degree,Mathematics or statistics,,"Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +"Non-binary, genderqueer, or gender non-conforming",Turkey,25,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,SQL,Google Search,"Company internal community,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 
years,Business Analyst,Self-taught,60,10,30,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection)",Bayesian Techniques,A bachelor's degree,Retail,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees","NoSQL,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics",Most of the time,,Most of the time,,,Often,Often,Sometimes,,,,,,,Often,Sometimes,,,,,,Often,,,,Most of the time,,,Sometimes,,,,,70,10,10,5,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Sometimes,Most of the time,Often,,,Often,,,,Rarely,,,,,Sometimes,,Often,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,TRY,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Germany,59,Retired,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Time Series,Logistic Regression,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,32,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,25,0,0,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Russia,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,R,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,"Data Stories Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Rarely,1GB,SVMs,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Segmentation,SVMs,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,Rarely,,,,50,10,0,30,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Most of the time,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,10-25% of projects,Entirely internal,Central Insights Team,none,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,2000000,RUB,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Most of the time,100GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Angoss,Cloudera,Impala,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,Rarely,Rarely,,Often,,,,,,,,,Often,,,Often,,,,,,,,,,,,,,Often,,Rarely,,,,,Most of the time,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Sometimes,Often,Often,,,,,,Often,Often,Often,,,,,Sometimes,Often,Sometimes,,,Often,,,,Sometimes,,,,75,0,20,5,0,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,Sometimes,Most of the time,,,Sometimes,,,,,Sometimes,Sometimes,,,Often,,Sometimes,,Most of the time,,100% of projects,More internal than external,Standalone Team,Census,ETL,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,85000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Spain,42,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Genetic & Evolutionary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites,Other","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",3-5 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Workstation + Cloud service,2 - 10 hours,Github Portfolio,Yes,Master's degree,Management information systems,,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",45,45,0,10,0,0,Time Series,,A master's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,,"Amazon Web services,NoSQL,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Time Series Analysis",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,20,35,5,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,Often,,,51-75% of projects,Entirely internal,IT Department,NYC Taxi; drug database; gene database,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Rarely,100000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",10,50,0,0,40,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Text data,Always,100MB,"Gradient Boosted Machines,RNNs","Java,MATLAB/Octave,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,Most of the time,,,"Cross-Validation,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,Random Forests,RNNs",,,,,,Most of the time,,Often,Sometimes,,,,,,,,,,Often,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,50,20,10,0,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,Most of the time,,,Often,,,,Sometimes,,Most of the time,,Less than 10% of projects,More internal than external,IT Department,,,"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Subversion,Sometimes,50000,USD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Sweden,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Other,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Kaggle,Online courses,Podcasts,Textbook",Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,Very useful,,Somewhat useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Programmer,Researcher",University courses,0,30,0,70,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Mathematica,Python,R,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the 
time,Often,Often,,Sometimes,,,Often,,Often,,,Often,Sometimes,Often,,Often,Rarely,,Sometimes,,Often,Often,Most of the time,,,,50,10,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,Sometimes,,Often,Often,,Often,,,Sometimes,,,Sometimes,,Often,Sometimes,,,100% of projects,Entirely internal,Central Insights Team,"AppMonsta, MCC database, wikidata","indexing, size","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,850000,SEK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,C/C++/C#,Google Search,"Arxiv,Blogs,Kaggle,Textbook",Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,,Very useful,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Machine Learning Engineer,Programmer",Self-taught,30,20,50,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Always,1PB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,CNNs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",Sometimes,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,Sometimes,,,,,50,30,10,10,0,0,Enough to tune the parameters properly,"Dirty data,Limitations in the state of the art in machine learning",,,,,Often,,,,,,,Sometimes,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Other,Always,700000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Nigeria,29,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,SQL,Social Network Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,,Experience from work in a company related to ML,No,Professional degree,,I don't write code to analyze data,"Researcher,Other",Work,60,10,30,0,0,0,Reinforcement learning,Decision Trees - Random Forests,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Microsoft Excel Data Mining,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Not Useful,Very useful,Somewhat useful,,Very useful,Very useful,Not Useful,,Somewhat useful,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Researcher",Work,25,25,25,20,5,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,I don't know,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Text data,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Excel Data Mining,Perl,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,Rarely,,Rarely,,Often,,,,,,Often,,,,,,,Rarely,Often,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,Sometimes,,,,,,,Sometimes,,,Often,,,,Sometimes,Often,,,,,Often,Sometimes,,,,30,15,5,5,45,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data 
science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Often,Often,Rarely,,Sometimes,,,,,Sometimes,Rarely,Often,,Often,Sometimes,Sometimes,Sometimes,Often,,76-99% of projects,Approximately half internal and half external,Other,,cleaning; consistency; formats,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Never,,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Brazil,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,80,13,0,2,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,Fewer than 10 employees,Increased slightly,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,<1MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,,Often,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,20,20,5,25,30,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,10-25% of projects,More internal than external,Other,,feature selection and dimension reduction,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",PowerPoint presentations,Bitbucket,Always,150000,BRL,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,France,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Monte Carlo Methods,R,Google Search,"Arxiv,Kaggle,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,,,,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Statistician",Kaggle competitions,60,20,0,0,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Often,,,Sometimes,Sometimes,,Rarely,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,,,,,Most of the 
time,Sometimes,Sometimes,Often,,,Often,,Sometimes,,Often,,Rarely,Often,,Often,,Often,Sometimes,,Sometimes,,Sometimes,Sometimes,,,,,50,15,20,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Most of the time,Most of the time,,,,Most of the time,,,,,,Most of the time,,,Often,Most of the time,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,10000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Not Useful,,,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,,,Not Useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,20,10,30,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased 
significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Always,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,TensorFlow",,Sometimes,,,,,,Sometimes,Often,,,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks",,,,,Often,Most of the time,,Often,,,,Most of the time,,,,Often,,,Sometimes,Often,,,,,,,,,,,,,,35,10,40,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,,,,Often,,,,,Most of the time,,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,39,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",Less than a year,Data Scientist,Other,20,30,7,40,3,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,10 to 19 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Flume,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow",,,,,,,Rarely,,Often,Rarely,,Rarely,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,,Most of the time,Often,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,,Often,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,Most of the time,,,,Sometimes,,,Sometimes,,Sometimes,Most of the time,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the 
organization,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,76-99% of projects,More external than internal,Standalone Team,twitter; facebook; different news sources; and blogs;,cleaning; appropriate data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,360000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,,,,,,Very useful,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,,Yes,Master's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Computer Vision,"Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Poland,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician",Self-taught,50,10,30,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Angoss,Cloudera,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,KNIME (commercial version),Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,Rarely,,Often,,,,Most of the time,,,,,Often,Sometimes,Sometimes,Most of the time,Sometimes,,,,,,Often,,,,,,,Most of the time,,Most of the time,,Sometimes,Sometimes,,Most of the time,Most of the time,,Most of the time,Most of the time,,,Sometimes,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Often,Most of the time,,Often,Most of the time,,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Often,Sometimes,Sometimes,,,,40,20,20,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's 
decision-making process,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,Most of the time,Most of the time,,,Most of the time,,Sometimes,,,,Often,Often,,Often,,Most of the time,,,,26-50% of projects,More internal than external,Standalone Team,Demographic information; Bureau data; Traffic; Weather; Macro economic data ,Messy or dirty data ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,200000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Other,University courses,40,10,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,500 to 999 employees,Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,<1MB,"Decision Trees,Random Forests,Regression/Logistic Regression",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,,,,Often,Sometimes,Sometimes,,,Often,,,,,,Most of the time,Sometimes,,,51-75% of projects,Entirely internal,Central Insights Team,Bloomberg; FactSet,"Lack of technology (either 
internal or from provider) to seamlessly pull data (no API, for example).","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"107,000",,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Official documentation,Online courses,Personal Projects,Other",Very useful,,Very useful,,,,,,,Very useful,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,100 to 499 employees,,,Some other way,Very important,Other,Other,Relational data,,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs,Other","Amazon Machine Learning,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other",Rarely,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis,Other",Sometimes,,Often,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,Sometimes,Often,,Rarely,,Often,Rarely,Often,Sometimes,Often,Rarely,,,Rarely,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data,Other",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,Most of the time,None,,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Git,Other",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,27,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Excel Data Mining,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,, +Male,Japan,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,1 to 2 years,"Researcher,Other",Self-taught,40,30,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",Rarely,,Sometimes,,,Often,Often,,,,,,,,,Often,,Sometimes,Often,,,,Sometimes,,,,,Sometimes,Often,,,,,40,25,15,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of 
the time,Most of the time,,,,,,Most of the time,,,Often,,,Rarely,,,,,,Sometimes,,,26-50% of projects,More internal than external,IT Department,census data,joining diverse data sets with different naming schemes,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,66500,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Other,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",I don't plan on learning a new tool/technology,Neural Nets,Python,,"Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,,,,,,,,,,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Physics,Less than a year,Researcher,Kaggle competitions,5,0,5,0,90,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,57,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Julia,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Podcasts,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,"Linear Digressions Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Greece,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Online courses,Personal Projects",Somewhat useful,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Coursera,GPU accelerated Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,15,0,30,5,0,"Computer Vision,Time Series",Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,Germany,52,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Other,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,,NA,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Official documentation,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Other 
(Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,10GB,"Bayesian Techniques,Neural Networks,Random Forests","C/C++,Python,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,Often,,Sometimes,,,,,,,Often,,,,20,30,40,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Sometimes,,,,,,,,,,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,200000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Researcher",University courses,20,5,25,35,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,Bayesian Techniques,"Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Other",,,,,,,Often,,,,,,,,,Sometimes,,Often,,,Sometimes,,,,,,,,,,Often,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Sometimes,,,,,Often,,,,,,Often,,,100% of projects,Entirely internal,Other,"Again, too many to list",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,126000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,New Zealand,43,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,DBA/Database Engineer,Self-taught,20,20,20,0,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Relational data",Never,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,Often,,Most of the time,,Sometimes,Sometimes,,Rarely,Sometimes,,,,Rarely,,,,Sometimes,Rarely,,Rarely,,,,,,,,,,,50,20,0,15,15,0,Enough to explain the algorithm 
to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,Sometimes,Sometimes,,Often,,,,,,,Sometimes,,,,Sometimes,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,125000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,France,30,"Not employed, but looking for work",,,,,,,,R,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts",,,,,,,,,,,Very useful,,Very useful,,,,,,The Data Skeptic Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Other,No,Master's degree,A health science,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,42,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by government,IBM Watson / Waton Analytics,Link Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Online courses",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,R,Other",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Rarely,,,,Sometimes,Most of the time,Most of the time,,,,,,Sometimes,,Sometimes,,,,Often,Sometimes,,Often,,,,,,Sometimes,Sometimes,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,Often,,,Often,,,Most of the time,,,Often,Most of the time,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Rarely,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Data Analyst,Perfectly,Self-employed,Hadoop/Hive/Pig,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,10,20,20,20,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection)","Logistic Regression,Neural Networks - RNNs",A professional degree,Technology,100 to 499 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Data Visualization,Simulation,SVMs",Often,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,30,30,10,10,20,NA,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,Most of the time,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented 
relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,35,BRL,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Other,Cluster Analysis,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler",Self-taught,100,0,0,0,0,0,Survival Analysis,Bayesian Techniques,A doctoral degree,Insurance,100 to 499 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,100GB,Bayesian Techniques,"Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,Sometimes,,,,,,,Most of the time,Sometimes,,,,,,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Naive Bayes,Segmentation",,,,,,,Most of the time,,,,,,,Rarely,,,,Rarely,,,,,,,,Sometimes,,,,,,,,5,20,20,30,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,150000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,France,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities,Tutoring/mentoring",,,,,,,Very useful,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Statistician","Online courses (coursera, udemy, edx, etc.)",0,60,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"1,000 to 4,999 employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",Always,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Orange,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,,,Most of the time,,,Sometimes,,Sometimes,Sometimes,Often,,,,,,Sometimes,Rarely,,,,60,20,20,0,0,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",Often,,,Sometimes,Often,,,,Often,,,Sometimes,,,Most of the time,,Often,,,,,,Less than 10% of projects,More internal than external,Business Department,open data,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,30,5,20,40,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,NoSQL,Orange,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Often,,Sometimes,,,,Often,,Often,,Often,,,Often,,Often,,,,,,Sometimes,Sometimes,,,,5,10,10,40,5,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,13000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Github Portfolio,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Ukraine,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Newsletters",Somewhat useful,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",25,5,5,5,60,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Internet-based,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Random Forests,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Sometimes,,,,Most of the time,,,,,,,Sometimes,,,,Often,,,,,,,Often,,,,70,5,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process",Most of the 
time,Often,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Never,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Online courses",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Operations Research Practitioner,Self-taught,60,20,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Decreased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1PB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,Often,,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Natural 
Language Processing,Prescriptive Modeling,Text Analytics,Time Series Analysis",Often,,Often,,,Often,Sometimes,,Sometimes,,,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,20,10,50,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,,Sometimes,,Sometimes,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,150000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,Weka,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,< 1 year,,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,39,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,30,0,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Laptop or Workstation and private datacenters,Other,Always,10MB,Bayesian Techniques,"C/C++,Python,Other",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,Bayesian Techniques,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5,30,50,5,10,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"FastML Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Scientist,Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Work,14,0,40,45,1,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Video data,Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow,Other",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Often,Often,,Most of the 
time,,Sometimes,Sometimes,Sometimes,,,Sometimes,Often,,Sometimes,,Sometimes,,Often,Sometimes,,Sometimes,,,Often,Often,Most of the time,,Often,,,,35,30,30,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Limitations of tools",Sometimes,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Sometimes,175000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,20,0,80,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Always,100TB,"CNNs,Decision Trees,Evolutionary Approaches,Neural Networks","Hadoop/Hive/Pig,NoSQL,Python,TIBCO Spotfire,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,"Evolutionary Approaches,Naive Bayes,Neural Networks,Simulation,Time Series Analysis",,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,,10,0,0,0,0,90,Enough to refine and innovate on the algorithm,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,"Git,Other",Sometimes,500000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Turkey,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,,"FastML Blog,Linear Digressions Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,20,0,0,50,0,Recommendation Engines,"Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,Pakistan,31,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,R,Social Network Analysis,Python,"GitHub,Google Search","Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business 
Analyst,Self-taught,10,90,0,0,0,0,Recommendation Engines,Logistic Regression,A bachelor's degree,Technology,I don't know,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"Python,QlikView,R,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,Sometimes,,,,Rarely,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,Recommender Systems,Segmentation,Text Analytics",,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,5,5,5,60,25,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,NA,NA,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Git,Rarely,0,PKR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Personal Projects,Textbook",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Very useful,"DataTau News Aggregator,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer",University courses,20,10,20,30,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,500 to 999 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural 
Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Python,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,Rarely,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Time Series Analysis",Sometimes,,Sometimes,Most of the time,Sometimes,Most of the time,Often,Sometimes,Sometimes,,,Most of the time,,Sometimes,,Often,,Rarely,Most of the time,Often,Often,,Most of the time,Often,Often,Rarely,,,,Most of the time,,,,60,10,20,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Often,Sometimes,,,Sometimes,,,Rarely,,,Often,,,,,,Often,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,250000,BRL,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Personal Projects",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,45,15,5,20,15,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Other,Workstation + Cloud service,"Text data,Relational data",,,,"Jupyter notebooks,NoSQL,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation",Often,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,20,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,Most of the time,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,140000,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Hungary,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,40,20,0,30,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,,10GB,"CNNs,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",,,,Often,,Most of the time,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,Sometimes,Often,,Often,,,,10,30,30,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Often,Sometimes,,,,Often,,,,,,Often,,,100% of projects,Entirely internal,Central Insights Team,1000 functional connectomes project's datasets,I'm working with fMRI measurements where the data dimensionality is magnitudes larger than the number of samples,"Column-oriented relational (e.g. 
KDB/MariaDB),Other",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,2500000,HUF,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,No,Yes,Statistician,Fine,Employed by college or university,R,Social Network Analysis,Stata,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Online courses,Personal Projects,Textbook,Trade book",,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,15+ years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,PhD,Yes,Doctoral degree,A humanities discipline,I don't write code to analyze data,"Researcher,Statistician",University courses,20,0,40,40,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Other,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Online courses,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Statistician",University courses,10,0,50,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",,Retail,"1,000 to 4,999 employees",Stayed the same,Don't know,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service","Image data,Text data,Relational data",Rarely,,"Bayesian Techniques,Random Forests","IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Orange,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Often,,,,,Often,,,,,Sometimes,Sometimes,Most of the time,Most of the time,,,,Often,,Most of the time,,Most of the time,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Lift Analysis,Naive Bayes,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Most of the 
time,Often,,Most of the time,Often,Most of the time,Often,,Sometimes,,,,,Often,,,Often,,,,Often,Most of the time,Most of the time,Often,Most of the time,Often,,Often,Often,,,,40,30,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,Often,Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,,Sometimes,Often,,Often,Most of the time,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,,Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,R,Text Mining,Stata,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,Self-taught,20,20,20,20,0,20,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Never,<1MB,Decision Trees,"Cloudera,Hadoop/Hive/Pig,Perl,SQL",,,,,Rarely,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues",Rarely,,,,,,,,Most of the time,Most of the time,,,,,,,Rarely,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Personal Projects",,Very useful,,,,Very useful,,,,,,Very useful,,,,,,,KDnuggets Blog,< 1 year,,Nice to have,Necessary,,Nice to have,,,,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,15,0,5,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,Not important,,,,Somewhat important +Male,Brazil,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by non-profit or NGO,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",,A bachelor's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Very important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Sometimes,100GB,SVMs,"Jupyter notebooks,Python,Other,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,"Cross-Validation,Data Visualization,Prescriptive Modeling,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,15,15,30,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Limitations of tools",,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,45000,BRL,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Genetic & Evolutionary Algorithms,Scala,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Natural Language Processing,Recommendation Engines",,"Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,100GB,,"Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,NoSQL,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Most of the time,,Most of the time,,Most of the time,,,,,Often,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,Sometimes,,,,Often,,Often,,,,"Collaborative Filtering,Data Visualization,Logistic Regression,Natural Language Processing,Recommender Systems,Text Analytics",,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,60,40,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability 
of/difficult access to data",Often,,,,Sometimes,,,,Often,,Often,Sometimes,Sometimes,,Often,Sometimes,,Sometimes,,Most of the time,Often,,100% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,38000,EUR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Brazil,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Computer Vision,Natural Language Processing,Time Series","Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Never,10GB,"CNNs,GANs,Markov Logic Networks,Neural Networks,RNNs","Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Most of the time,,Often,,,,,,Rarely,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,Often,,Often,,,,"A/B Testing,CNNs,Data Visualization,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",Sometimes,,,Often,,,Often,,,,Sometimes,,,Sometimes,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,Most of the time,,,,20,40,5,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,,,,,,,,,,,,,,,Often,,Rarely,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Always,53000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Belarus,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,"Data Elixir Newsletter,FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,40,10,10,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",,10GB,"CNNs,Neural Networks,Random Forests,RNNs,Other","Java,Jupyter notebooks,Mathematica,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Rarely,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,,Often,,Often,Sometimes,,,,,Often,,Most of the time,,,,,Most of the time,Often,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,,,,40,30,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,Often,,,Most of the time,,Often,Sometimes,,Sometimes,,Often,Sometimes,,,Sometimes,,,10-25% of 
projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,25000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,"FastML Blog,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,50,10,30,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis,Other",Rarely,Rarely,Often,,Sometimes,Often,Most of the time,,,,,,Rarely,Rarely,Sometimes,Sometimes,,Sometimes,Often,,Often,,,Sometimes,,Sometimes,,Sometimes,Often,Often,Most of the time,,,40,10,30,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in 
deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,Often,Most of the time,Most of the time,,,,Often,,,,Often,,Sometimes,Often,,Most of the time,Often,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,165000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by non-profit or NGO,Amazon Web services,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Official documentation,Online courses,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,6 to 10 years,Researcher,University courses,35,5,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Non-profit,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,10GB,"HMMs,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,Perl,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,Sometimes,Most of the time,,,,,,,Often,,Sometimes,,,,,Often,,,,,,,Sometimes,,,,,,60,10,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,,,Rarely,,,,,Sometimes,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Sometimes,90000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Portugal,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Cloudera,Regression,SQL,"GitHub,University/Non-profit research group websites","Friends network,Kaggle,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog",< 1 year,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,Non-Kaggle online 
communities,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,,"Data Stories Podcast,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Work,15,5,60,15,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Neural Networks","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,Rarely,,Often,Sometimes,,,,Most of the time,,,,,,,Often,Often,,,Often,,,,,,Often,Sometimes,,,,50,30,15,4,1,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full 
database,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,,,,Rarely,,,,,Most of the time,Often,,,Most of the time,,Less than 10% of projects,Approximately half internal and half external,Business Department,"governance data, social networks, telecom data, internal client's data",policy and dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"250,000",RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,22,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,C/C++,Time Series Analysis,Python,Google Search,"Arxiv,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,YouTube Videos",Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,A tech-specific job board,Not at all important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Don't know,1GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Julia,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,RNNs",,,,Most of the time,,,Often,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,20,75,0,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,Often,,,,Sometimes,,,,,Often,,100% of projects,Entirely internal,IT Department,CMU Arctic; VCTK,For our project we need audio samples to be aligned which is quite non-trivial task by itself. Besides differences in intonations of speakers has bad impact on quality of the models.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,USB drives,Bitbucket,Sometimes,6000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,35,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Workstation + Cloud service,11 - 39 hours,Master's degree,Yes,Bachelor's degree,Physics,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - 
Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Stan,Text Mining,Python,University/Non-profit research group websites,"Arxiv,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,Not Useful,Somewhat useful,Very useful,,,,"FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Programmer,Researcher",University courses,40,25,20,15,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,,Don't know,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Mathematica,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,30,20,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Rarely,,,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,100% of projects,More external than internal,,COMTRADE,Too small,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,70000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,0,40,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Germany,44,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Official documentation,Personal Projects,Textbook",,,,,,,,,,Somewhat useful,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,University courses,60,0,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,A general-purpose job board,Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,Often,,Often,,,,,Often,Often,Often,,,,Sometimes,Often,,,,,,20,30,0,40,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Often,,,Sometimes,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Git,Subversion",Most of the time,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",70,10,0,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,1TB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,,,,,,,,,,Often,,,,Sometimes,Often,,,,,,,Often,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Sometimes,,,,Most of the time,Sometimes,,,Sometimes,,Sometimes,,,Often,,,Sometimes,,Sometimes,Sometimes,Often,,76-99% of projects,Approximately half internal and half external,Standalone Team,Tcga;gtex,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,75000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,SAP BusinessObjects Predictive Analytics,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Not Useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Other,40,10,15,5,10,20,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Video data,Relational data",Sometimes,100GB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL,Unix shell / 
awk",Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,Often,Rarely,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,,,,,Sometimes,Often,,,,Often,,Often,Often,,,,,Most of the time,,Most of the time,,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,Often,,Most of the time,Sometimes,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,51-75% of projects,More internal than external,Standalone Team,Open Government Data,"Cleaning data, accounting for biases in the data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,160000,CAD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst",Work,40,10,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Financial,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10MB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Mathematica,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,Rarely,,,,Sometimes,,,,,,,Often,,Sometimes,,,,,Often,,,,Often,,,Sometimes,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,Often,Rarely,,,Often,Often,,Often,Sometimes,,,,65,15,5,5,5,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,Often,Often,,,,,,,,Most of the time,,Sometimes,,,Sometimes,,,Most of the time,,10-25% of projects,More internal than external,Central Insights Team,,Artificial limitations on data access and tools by ineffective risk 
management,"Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Rarely,147000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Other,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Other,Work,25,15,30,20,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Government,100 to 499 employees,Stayed the same,3-5 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Rarely,10GB,"Ensemble Methods,Random Forests","Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Often,Sometimes,,Often,,Often,,,,,,Often,,,,Often,,,Often,,Often,,,,,Often,,,,,,10,10,20,20,10,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations of tools",,,,,,,,,,,Often,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,Hardware in terms of clusters and big data handling,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,,90000,RSD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Japan,33,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,NA,10,20,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Brazil,27,Employed full-time,,,No,Yes,Programmer,,,RapidMiner (commercial version),Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,50,0,20,20,10,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,Canada,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,C/C++,Deep learning,Python,GitHub,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,3 to 5 years,"Data Analyst,Researcher,Statistician",Self-taught,30,15,40,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time 
Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,Often,Often,Sometimes,Sometimes,,,Often,,Sometimes,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,Often,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",Often,,,Rarely,Often,,,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,,Rarely,97500,CAD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Other",Work,65,5,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Not at all important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Flume,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",Often,Often,,,,,Sometimes,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,Often,Most of the time,,,Sometimes,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",Often,,Often,,Often,,Most of the time,Often,Often,,,,,,Sometimes,Often,,,Most of the time,,,,Often,Often,,,Often,,Most of the time,Often,,,,0,5,5,5,5,80,Enough to refine and innovate on the 
algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More internal than external,Business Department,Confidential ,Lack of definitions ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,285000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,20,15,65,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,SQL,Tableau,Unix shell / awk",,Rarely,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",Often,,,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Often,,,,35,20,5,25,15,0,Enough to explain the algorithm to someone non-technical,"Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Sometimes,Often,,Sometimes,,51-75% of projects,Entirely internal,Other,,Scale of the data ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Data Scientist,Fine,,TensorFlow,Neural Nets,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,Data Analyst,Self-taught,10,5,55,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,,,,,Rarely,,Often,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",Sometimes,,,,,Often,Often,Often,,Often,,,,Rarely,,Often,,,Sometimes,Often,Often,,Often,Sometimes,Sometimes,Rarely,,Often,Sometimes,,,,,10,10,60,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to 
others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,Sometimes,,Rarely,,Sometimes,,,,,,,,Often,,,Sometimes,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,67000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Greece,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,Python,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,Very useful,,,,Somewhat useful,,,Very useful,,Very useful,,Somewhat useful,,,Very useful,,"Data Machina Newsletter,FastML Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,10,0,30,20,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Other,41,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Neural Nets,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,,,Nice to have,,,,,Nice to have,,,,,,Basic laptop (Macbook),,Github Portfolio,No,Master's degree,Electrical Engineering,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Conferences,,,,,Very useful,,,,,,,,,,,,,,,1-2 years,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,Logistic Regression,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,54,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,Other,Neural Nets,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Programmer,Researcher",Self-taught,50,10,40,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Stayed the same,6-10 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,Often,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,Often,,,,,Often,,,,,,70,10,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,AWS S3,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"108,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Predictive Modeler,Fine,Self-employed,Python,Neural Nets,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"DataCamp,Other","Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Researcher",Self-taught,50,10,30,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,54,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Online courses,Personal Projects",Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,,,,,Necessary,Necessary,Nice to have,,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,Netherlands,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other",University courses,0,10,30,40,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Other","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,RapidMiner (free version),SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow",,Sometimes,,Rarely,,,,Sometimes,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,Rarely,,Often,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation",Sometimes,,,,Rarely,Often,Most of the time,Often,Often,,,Sometimes,,Sometimes,Rarely,Sometimes,,,,Sometimes,Sometimes,,Often,Rarely,Sometimes,Sometimes,,,,,,,,25,15,20,10,10,20,"Enough to code it again from scratch, 
albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Often,Often,,,,,,,,,,,Most of the time,,,,Often,,51-75% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,73000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Management information systems,,"Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very 
Important,Not important,Somewhat important,Somewhat important +Male,Portugal,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,15,0,30,50,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Other",,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,Rarely,Most of the time,,Rarely,,Sometimes,,Often,Most of the time,,Sometimes,,Rarely,Most of the time,,Most of the time,,Often,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,Often,,Often,,,Sometimes,,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Standalone Team,Physionet datasets,Acquisition noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,12000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Spark / MLlib,Cluster Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Official documentation,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,University courses,30,10,10,50,0,0,Unsupervised Learning,"Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,,,,Very useful,Very useful,Very useful,Very useful,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,55,5,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Rarely,1GB,"Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Python,R,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,30,10,0,5,25,30,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations in the state of the art in machine learning",,,,,,Often,,,,,,Most of the time,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,52,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,Other,Random Forests,Python,Google Search,"Blogs,Friends network,Online courses",,Somewhat useful,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,"5,000 to 9,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,,Traditional Workstation,"Text data,Relational data",,,,Java,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Git,Subversion",Rarely,320000,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,,R,"GitHub,Google Search,Government website","YouTube Videos,Other",,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer",Other,60,10,30,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,Telecommunications,10 to 19 employees,Decreased significantly,Less than one year,Some other way,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1TB,,"QlikView,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Other",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,0,0,0,60,40,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,Often,Often,,,,Most of the time,Most of the time,,,,,,,Sometimes,,100% of projects,Entirely internal,Central Insights Team,,"Unclear of how and where to start analysis with, we have data from client as 2 of our traditional products are already sold. hence we need to show capability to client by building insights , problem for us is how to approach on available data to find valuable insights.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,,Rarely,700000,INR,Other,2,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Other,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,50,10,0,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)",A bachelor's degree,Pharmaceutical,500 to 999 employees,Stayed the same,Don't know,Some other way,Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Often,,,,Sometimes,,Often,,,,,,,Sometimes,,Often,,,,,,,,,,,80,0,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Other",Often,,,,,,,,,,,,,,Often,,,,,,,Most of the time,Less than 10% of projects,Entirely 
external,Standalone Team,websites;pdf's;,scrapping the different websites or PDF's for regulatory information,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,shared folder,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Never,60000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,India,21,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Weka,Social Network Analysis,R,"GitHub,Google Search","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Data Analyst,Engineer",Other,10,35,55,0,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A bachelor's degree,Pharmaceutical,Fewer than 10 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1MB,Random Forests,"C/C++,MATLAB/Octave,R",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees",,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,25,30,25,20,NA,0,Enough to run the code / standard library,"Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,Rarely,,,,,Often,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Female,Brazil,74,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,100,0,0,0,0,0,Time Series,"Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,Business Analyst,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Female,Other,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Social Network Analysis,R,Google Search,"College/University,Kaggle,Official documentation,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,,Very useful,,,,,,Very useful,,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Machine Translation,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Statistician,Poorly,Employed by company that makes 
advanced analytic software,Other,Support Vector Machines (SVM),SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,"Data Analyst,Engineer,Researcher",Self-taught,80,0,15,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",,Pharmaceutical,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Relational data,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,SAS JMP",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,Simulation,Text Analytics",,Often,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,Often,,Sometimes,,Sometimes,,,Most of the time,,,Often,Most of the time,,Sometimes,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,Team using multiple 
ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,Often,Often,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,130000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,SQL,I collect my own data (e.g. web-scraping),"Blogs,College/University,Personal Projects,Podcasts,Other",,Somewhat useful,Very useful,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Business Analyst,Self-taught,50,30,10,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,,Often,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,,,,,,,,,,,,,Often,,,Less than 10% of projects,More external than internal,,,,,,,,,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Other,Neural Nets,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service",Text data,Most of the time,10TB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Impala,Python,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,Often,,,,Most of the time,,,,80,5,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,Steam spy; NPD; 
appannie,Accuracy and cleanliness.,Other,Commercial Data Platform,,"Git,Other",Rarely,136500,USD,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Germany,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Mathematica,Neural Nets,Python,Google Search,"College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Other",11 - 39 hours,,No,Bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,,"Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,"Arxiv,College/University,Company internal community,Conferences,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher",Work,10,0,40,50,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High 
school,Insurance,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Julia,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk,Other",Sometimes,Sometimes,,Sometimes,Often,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Often,Sometimes,,Sometimes,Most of the time,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,Sometimes,,26-50% of projects,More 
internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint,Other",Dpmino Data Lab,Git,Rarely,200000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Friends network,Stack Overflow Q&A,Other",,Very useful,,,,Somewhat useful,,,,,,,,Very useful,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Miner,Data Scientist",Self-taught,75,25,NA,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",,Other,20 to 99 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,RNNs","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,,Often,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,Often,,Most of the time,,,,"A/B Testing,Decision Trees,kNN and Other Clustering,Natural 
Language Processing,Random Forests,RNNs,SVMs,Text Analytics",Most of the time,,,,,,,Most of the time,,,,,,Often,,,,,Sometimes,,,,Often,,Often,,,Rarely,Sometimes,,,,,70,20,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,,,,Most of the time,,,,Often,,Rarely,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,App stores;appannie;nlp libraries,Dirty tracking data and missing data,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,Git,Sometimes,40000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,SQL,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher,Statistician",University courses,20,30,5,40,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Non-profit,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","KNIME (free 
version),Python,R,Tableau,Other",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Sometimes,,,,Sometimes,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Often,,,,,,Often,,Most of the time,,,,Sometimes,,,Often,,,,Most of the time,,Most of the time,Most of the time,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",Often,Often,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,"World Development Indicators (World Bank), OECD data",NA,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,25000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Textbook",,Somewhat useful,,,,,,,,Very useful,Very useful,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist",Self-taught,35,30,5,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Text data,Other",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Other","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Rarely,,Sometimes,,Often,,Often,,,,,,,,,Sometimes,,,,Sometimes,,Often,Sometimes,Often,,Rarely,,,Often,Sometimes,,,,40,10,35,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,Often,,Often,Sometimes,,,,,,,,,,,Most of the time,,,Less than 10% of 
projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,Subversion,Rarely,2000000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,R,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Textbook",,,,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Statistician,University courses,25,0,25,40,10,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Rarely,,,Often,Most of the time,Often,Sometimes,,,,,,,Often,,,,,Often,,Sometimes,,,Sometimes,,,,Sometimes,,,,15,25,15,25,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of 
tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Sometimes,Sometimes,,,,,,,Rarely,,,,,,,Sometimes,Sometimes,,51-75% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,Python,Text Mining,Python,GitHub,"Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,1 to 2 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Work,0,10,70,0,20,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Always,100GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic 
Regression,RNNs,SVMs","Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Statistica (Quest/Dell-formerly Statsoft),TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,Often,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Often,Often,Often,Often,,,,Often,,,,Often,Often,,,Sometimes,,Often,Most of the time,Often,,,Often,Often,Often,,,,Most of the time,Sometimes,,,,70,10,20,0,0,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,git,Git,Sometimes,110000,RUB,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,KNIME (free version),Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,3 to 5 years,"Data Scientist,Researcher",Self-taught,80,0,0,10,0,10,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Decreased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,Rarely,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,Most of the time,Sometimes,Most of the time,,Often,,Rarely,,,,Most of the time,Often,,,,80,5,3,2,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the 
organization,Scaling data science solution up to full database",,,,,Most of the time,,,,Sometimes,,,,,,,,,Sometimes,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Bitbucket,Sometimes,132,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Online courses,Tutoring/mentoring",,,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,Very useful,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),40+,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Unsupervised Learning,"Decision Trees - Random Forests,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,FastML Blog",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,40,10,5,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,20+,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important, +Male,Other,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Personal Projects",,,,,,,,,,Very useful,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,15,0,30,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,Some other way,Important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data,Other",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM Cognos,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SQL,Other,Other",,,,Often,,,,,,Rarely,,,,,Rarely,,Rarely,,,,,Sometimes,Often,Sometimes,Most of the time,,Sometimes,,,,Rarely,Rarely,Most of the time,,,,,,,,,Rarely,,,,,,,Sometimes,Often,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,Often,,Rarely,Most of the time,Most of the time,Most of the time,Sometimes,Rarely,Rarely,,Sometimes,Most of the time,Often,,Often,Sometimes,Most of the time,Rarely,Sometimes,Often,Often,Rarely,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,20,10,20,10,20,20,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Often,,,Often,Often,,,,Often,,Sometimes,Often,Most of the time,,76-99% of projects,Approximately half internal and half external,Standalone Team,Climate data; Property ownership data; Online news media; Social Media; Web sites,Getting the data (privacy issues + organisations' reluctance to giving access to data).,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,36000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Philippines,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,40,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),The Analytics Dispatch Newsletter",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Newsletters,Non-Kaggle online communities,Online courses",,,,,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,Other,University courses,15,15,30,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Decreased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data,Relational data",Sometimes,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,TIBCO Spotfire",Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Sometimes,,,,,Often,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Often,,,,Often,,,,Most of the time,,,Rarely,,Sometimes,,Most of the time,,,,,Sometimes,Sometimes,Often,,,,30,30,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of 
a clear question to be answering or a clear direction to go in with the available data",,Often,,Often,,,,,,,,,,Most of the time,,,,,Rarely,Most of the time,,,76-99% of projects,More internal than external,IT Department,Weather; Population; Economic; Market Prices,Granularity; Time horizon; Availability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,84000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Java,Association Rules,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,"FastML Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Programmer,Researcher",Work,15,10,75,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Often,,Rarely,,,Often,Often,Sometimes,Often,Rarely,,Often,,Sometimes,,Often,,Rarely,,,Often,,Sometimes,,,Sometimes,,Sometimes,Often,,,,,15,35,35,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Sometimes,Often,,Often,,,,,,,,,Often,,Sometimes,Often,Rarely,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",,,Git,Sometimes,1620000,RUB,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Other",Self-taught,50,0,0,0,50,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,100 to 499 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,SAS Enterprise Miner,SQL,Stan,TensorFlow,TIBCO Spotfire",,Sometimes,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,Often,,,Often,Sometimes,,,Most of the time,Most of the time,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Often,,Often,Often,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,,Often,,Most of the time,Often,Most of the time,Most of the time,,Most of the time,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"77,500",,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Somewhat useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,10,50,10,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning 
to new areas,"GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service",Image data,Most of the time,10TB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Rarely,,Often,,,,Rarely,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,Rarely,Most of the time,,Often,Often,,,,,,,Sometimes,,Sometimes,,Sometimes,,Most of the time,Often,,Sometimes,,,Often,,Sometimes,,,,,,40,25,20,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,none,getting ground truth annotations,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,135000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,,,,,,,,,,,,,,"Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,15,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the 
available data",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Sometimes,110000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,"Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs",A professional degree,Academic,I prefer not to answer,Increased slightly,Don't know,Some other way,Not very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data",Never,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text 
Analytics",,,,Often,Sometimes,,,Often,Often,,,Often,,Sometimes,,Sometimes,,,Often,Often,Often,,Most of the time,Sometimes,Often,,,Sometimes,Sometimes,,,,,10,30,0,40,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Most of the time,,,,Often,,,,Most of the time,,Sometimes,,,Often,Most of the time,,,,,,,,100% of projects,Entirely external,Other,"cifar, mnist",,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Unix shell / awk,Uplift Modeling,R,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,70,0,15,15,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Most of the time,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,RNNs,SVMs","Microsoft SQL Server Data Mining,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,Most 
of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs",Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Sometimes,28000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,South Korea,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",C/C++,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Not Useful,,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,Very useful,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,45,10,40,5,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov 
Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data",Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics",Often,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Often,Most of the time,Most of the time,Most of the time,,,,,78,15,3,2,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Rarely,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,80000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Work,50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Gradient Boosted Machines,Random Forests,SVMs","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,,,,,,,Often,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,Often,,,,50,30,5,5,10,0,Enough to tune the parameters properly,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,26-50% of projects,More internal than external,Business Department,Quandl,Dirty,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,60000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,,Other,Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer,Other",Work,80,0,20,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,100MB,Regression/Logistic Regression,"Python,QlikView",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,40,10,20,0,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git,Subversion",,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Belgium,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,Self-taught,50,5,0,30,15,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important +Female,Canada,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Other,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,R,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,,,,,,Nice to have,,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Kaggle Competitions,Yes,Bachelor's degree,A health science,Less than a year,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,80,10,0,10,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,,,,,,,,,Somewhat important,,,,,, +Female,Malaysia,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Analyst,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Other",Work,25,60,15,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Retail,100 to 499 employees,Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,Often,,,,,,Sometimes,,Often,,,,,,,Often,,,,,,,Often,,,,75,10,NA,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,120000,RUB,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites,Other","Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Engineer,Researcher",University courses,20,0,0,80,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased significantly,Don't know,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,DataRobot,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,RapidMiner (free 
version),SQL,Tableau",,,,,Often,Often,,,Often,,,,,Often,Often,,Often,,,,,,,,,,,,,,Often,,,,Often,,,,,,,Often,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Often,,,,Often,Often,Often,Often,,,,,Often,,Often,,,Often,Often,Often,Often,Often,Often,,Often,,Often,Often,,,,,50,20,10,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,Often,,Often,,,,Often,Often,,,Often,Often,Often,,Often,Often,Often,,Often,,100% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Always,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,R,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,I don't write code to analyze data,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,80,0,10,10,0,,Decision Trees - Random Forests,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Miner,Software Developer/Software Engineer",University courses,0,0,89,10,1,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs",High school,Manufacturing,Fewer than 10 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,Neural Networks,"Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Neural Networks,Prescriptive Modeling,Time Series Analysis",Often,,Often,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,,,,50,15,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,Sometimes,,,,,,,,,,,,,Sometimes,Often,Often,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),"Share Drive/SharePoint,Other",FTP,Subversion,Most of the time,100000,CHF,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Switzerland,32,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Base,,,,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher",University courses,10,10,0,80,0,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Don't know,<1MB,Bayesian Techniques,"C/C++,MATLAB/Octave,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,40,10,0,10,40,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,105000,CHF,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +A different identity,Other,100,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Factor Analysis,Python,Other,"Conferences,Friends network,Kaggle,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",Other,50,0,40,0,0,10,,,A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,,Not very important,Analyze and understand data to influence product or business decisions,,Relational data,Sometimes,1GB,,"Java,Python,SAS Enterprise Miner,SQL,Other",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,Sometimes,,,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,25,25,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Often,Often,Often,,,Often,,,Often,,,,,,,Often,,,,Less than 10% of projects,More internal than external,Business 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by government,Julia,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,Engineer,Self-taught,90,5,0,0,5,0,"Natural Language Processing,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,SVMs","Google Cloud Compute,Java,Python,R",,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,SVMs",,,Most of the time,,,,,,,,,,,,,Rarely,,Most of the time,Often,Rarely,,,,,,,,Rarely,,,,,,40,50,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of 
projects,Entirely internal,IT Department,sqlserver,Scraping data from Google News,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Bitbucket,Git",Sometimes,36000,TRY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Israel,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Official documentation,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,,,,Not Useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Researcher",University courses,40,5,20,30,5,0,"Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Retail,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Julia,Jupyter notebooks,Python,SQL,TensorFlow",Rarely,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",Most of the time,,,,,,,Often,,,,,,,Most of the time,Often,,,,Most of the time,,,Most of the time,,Most of the time,,,Sometimes,,Most of the time,,,,35,20,35,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,Sometimes,,,Most of the time,Sometimes,,,Often,,,,,,,,Often,,,,,,10-25% of projects,Approximately half internal and half external,Other,wikipedia,dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Git",Sometimes,100000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Tutoring/mentoring",,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,,,Very useful,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,No,Bachelor's degree,Engineering (non-computer focused),,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,35,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. 
web-scraping),Friends network,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Financial,500 to 999 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Other,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,SVMs","Mathematica,Python,R,SQL",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Simulation,SVMs",Sometimes,,Rarely,,,,Rarely,Rarely,,,,,,,,Rarely,,Rarely,,,,,,,,,Rarely,Rarely,,,,,,10,15,20,5,10,40,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Researcher,University courses,30,0,30,30,10,0,Computer Vision,Neural Networks - CNNs,,Manufacturing,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Don't know,10GB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,30,30,5,5,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Rarely,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",80,0,0,0,0,20,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Most of the time,,,"Amazon Machine Learning,Google Cloud Compute,Python,TensorFlow",Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,GANs,Neural Networks,RNNs",,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,50,30,10,10,0,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,Often,,,Often,,,Often,,,Often,,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Graph (e.g. GraphBase/Neo4j),Other,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Company internal community,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,,,,,Very useful,,,Very useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Statistician,Other","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,"Decision Trees,Neural Networks","Minitab,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,Often,Often,Often,,,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks",,,,,,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,50,20,0,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,Often,Often,Often,Often,,Sometimes,,,,Often,Often,,Often,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,200000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",Very useful,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,"FastML Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Computer Scientist,Researcher",University courses,70,30,0,0,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,10 to 19 employees,Stayed the same,6-10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,,,,Sometimes,,,,,Sometimes,,Most of the time,,,Most of the 
time,,Sometimes,,,,,,,Most of the time,Most of the time,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,gitlab,Git,Most of the time,178000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,Other,"Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,,,,,,,,Necessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,75,0,5,0,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Female,Other,29,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Python,Other,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community",,,Not Useful,Not Useful,,,,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",70,28,2,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,27,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,Python,Deep learning,R,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",University courses,0,30,0,30,40,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,R",,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,,,,,20,20,20,30,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,,,,,Often,,,Often,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,Other,Email,,,Rarely,240000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,Recommendation Engines,Neural Networks - RNNs,,Telecommunications,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10TB,Neural Networks,"Amazon Web services,Java,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,70,20,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,10-25% of projects,More external than internal,IT Department,bank data,Reach value for company,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Never,120000,BRL,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Java,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,30,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Non-profit,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Somewhat important,Other,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,HMMs,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,Sometimes,,Often,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,HMMs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,SVMs,Text Analytics,Other",,,Often,,,,Often,,Sometimes,,,,Rarely,,,,,,Most of the time,Sometimes,Often,,,,Sometimes,,Sometimes,Often,Often,,Sometimes,,,60,20,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,Rarely,,,,,,,,,,,,Sometimes,,,26-50% of 
projects,More internal than external,Standalone Team,Gdelt,Nightly processing of the data ETL pipeline,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,"91,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,Not Useful,Not Useful,Very useful,,,,"Linear Digressions Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Programmer,University courses,44,1,10,40,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,"1,000 to 4,999 employees",,,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Other",Never,10GB,"Regression/Logistic Regression,Other","C/C++,R,Unix shell / 
awk,Other",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Other,Other",,,,,,Often,Most of the time,Rarely,,,,Sometimes,,Sometimes,,Often,,,,,Often,,Sometimes,,,,,Sometimes,,,Most of the time,Often,,10,50,0,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Most of the time,,Most of the time,,,,,,Sometimes,,,Most of the time,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Other",External storage drive,Git,Sometimes,25000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,68,Retired,,,Yes,,Statistician,Fine,Employed by government,R,Survival Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Trade book,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,Very useful,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Other",,Very useful,,,,,Very useful,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,75,10,15,0,0,0,Time Series,,A bachelor's degree,Government,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,Regression/Logistic Regression,"Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Often,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,20,10,20,40,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the 
organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Often,,,,,,,,Sometimes,,,Sometimes,,,76-99% of projects,More internal than external,Other,none,reading in large amounts of data from SQL databases into R,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,dropbox,Git,Rarely,62000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,37,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,O'Reilly Data Newsletter,< 1 year,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Somewhat important +Female,Colombia,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects",,,,,Not Useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Very useful,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,0,20,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),SAS Base,SQL,Tableau,TensorFlow",,,,Rarely,,,,,Rarely,,Sometimes,Sometimes,,,,,Most of the time,,Sometimes,,Sometimes,Sometimes,,Often,,,Often,,,,Most of the time,,Most of the time,,Sometimes,,,Sometimes,,,,Often,,,Most of the time,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series 
Analysis",Sometimes,Often,Most of the time,,Most of the time,Most of the time,Most of the time,Often,,,,Often,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Often,Often,,,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,,,,60,20,5,12,3,0,Enough to refine and innovate on the algorithm,"I prefer not to say,Limitations of tools",,,,,,,Most of the time,,,,,,Often,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)",Commercial Data Platform,,Git,Sometimes,8800,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Vietnam,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A",,Not Useful,Very useful,,,,Somewhat useful,Not Useful,,,,Very useful,Somewhat useful,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,Russia,47,"Not employed, but looking for work",,,,,,,,NoSQL,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Necessary,,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Not important,Not important,,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Stan,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Not Useful,,,,,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Statistician",Work,10,10,75,0,0,5,,,A bachelor's degree,Manufacturing,100 to 499 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Other,Traditional Workstation,Text data,,,Other,"Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,Rarely,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,35,0,5,40,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,Often,,,Most of the time,,,,Sometimes,,,,,,,Sometimes,Often,,Most of the time,Sometimes,Often,Most of the time,76-99% of projects,Entirely internal,Other,,"Inconsistent files located all over the place, changing formats over time.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,60000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO",TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,,,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,I never declared a major,6 to 10 years,Software Developer/Software Engineer,Self-taught,60,30,10,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,Spark / MLlib,SQL,Other",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Text Analytics,Time Series Analysis",,Sometimes,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,,,Often,,,,,,,,,,Often,Most of the time,,,,30,5,5,20,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,,Often,,,Sometimes,,,Rarely,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data,Share Drive/SharePoint",,"Git,Other",Never,100000,BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,19,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,"College/University,Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",University courses,5,5,0,75,15,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Julia,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,65,25,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Operations Research Practitioner,Researcher",Work,15,5,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",High school,Government,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data,Other",Sometimes,100TB,"Bayesian Techniques,Decision Trees,HMMs","MATLAB/Octave,NoSQL,Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Bayesian Techniques,Data Visualization,HMMs,kNN and Other Clustering,Naive Bayes,Simulation",,,Most of the time,,,,Most of the time,,,,,,Sometimes,Often,,,,Sometimes,,,,,,,,,Often,,,,,,,20,20,5,20,35,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Most of the time,,,,Sometimes,,,Sometimes,,,Often,,,,,,,,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,"100,000",CAD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,R,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,,Other (please specify; separate by semi-colon),A bachelor's degree,Government,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,100MB,Regression/Logistic Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,80,5,0,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Most of the time,Often,,,Most of the time,,,,,,,Most of the time,,,,,Often,,100% of projects,Entirely external,Other,SIOPE; SIDRA; public available data,understanding it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Sometimes,6700,BRL,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Argentina,36,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,40+,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",,,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other",University courses,65,15,5,5,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A doctoral degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,Sometimes,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics",Rarely,,,,Sometimes,Often,Often,,Sometimes,,,,,Rarely,,Sometimes,,,,,Sometimes,,Often,Sometimes,,Rarely,Often,,Often,,,,,70,5,0,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and 
cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Most of the time,Often,,Often,Sometimes,,,,,Sometimes,,Often,,,,Most of the time,Sometimes,,26-50% of projects,Approximately half internal and half external,,data.world;us census;acs;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,60000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Official documentation",Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,No Free Hunch Blog,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Psychology,1 to 2 years,,Kaggle competitions,10,0,0,0,90,0,Supervised Machine Learning (Tabular Data),Gradient Boosting,High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United 
Kingdom,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Employed by non-profit or NGO,,,,,"Blogs,Company internal community,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Government,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Sometimes,100GB,Bayesian Techniques,"Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,Rarely,Rarely,,Sometimes,,Rarely,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Rarely,,,,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,Sometimes,,,,,,Often,,,Often,,Sometimes,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,Python,University/Non-profit research group websites,"Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Mexico,26,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,Very useful,,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,60,20,10,10,0,0,Computer Vision,Ensemble Methods,"Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,10TB,Neural Networks,"Amazon Web services,Python,RapidMiner (commercial version)",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,60,10,10,10,10,0,Enough to run the code / standard library,Data Science results not used by business decision makers,,Most of the time,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Email,I don't typically share data",,Git,Sometimes,18000,MXN,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Somewhat useful,,,,,,,,Very useful,,,,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler",University courses,30,30,20,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Stayed the same,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","IBM Cognos,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Rarely,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Simulation,SVMs,Time Series Analysis",Often,,,,Often,,,,,,,Sometimes,,Often,,Often,,,,Often,,,Often,Often,,,Often,Sometimes,,Often,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Java,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Personal Projects,YouTube Videos",,,,,,,,,,,,Very useful,,,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,50,0,20,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Support Vector Machines (SVMs)",High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,Predictive Modeler,Poorly,Employed by college or 
university,Stan,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,,,,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,"Data Miner,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,30,20,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Rarely,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,Spark / MLlib,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis,Other",,Sometimes,Often,,,Most of the 
time,Often,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Often,,,,Often,Sometimes,,Often,,Sometimes,,,Often,Sometimes,Sometimes,,,Often,35,35,2,8,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,Often,,Often,Often,,,Sometimes,Often,,,,Often,,Often,Often,,,51-75% of projects,More external than internal,Other,,Understanding the data (no or bad documentation),"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Self-employed",TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Researcher",University courses,70,0,15,13,2,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Statistics,Jupyter notebooks,Python",,Most of the time,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Sometimes,,Most of the time,,,Rarely,,Sometimes,,Most of the time,,,Most of the time,Most of the time,Often,,Often,,,,75,10,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,Often,,100% of projects,More internal than external,Standalone Team,Data obtained by web scrapping ,Run all transformations fast,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Female,Spain,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,,Very useful,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Other",Self-taught,10,40,30,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Mix of fields,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service,Other","Image data,Text data,Relational data",Always,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","NoSQL,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,Often,Often,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,30,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Often,,,,Often,,,,,,,Often,,,,,Often,,51-75% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Bitbucket,Sometimes,40000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,Very useful,Somewhat useful,Not Useful,Not Useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Not Useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,"Data Scientist,Researcher,Statistician",University courses,5,5,10,80,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Government,Fewer than 10 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL",,Rarely,,Rarely,,,,,Rarely,,,,,,Rarely,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,,,,Rarely,Often,Most of the time,Sometimes,Rarely,,,,,Sometimes,,Often,,,Sometimes,,,,Sometimes,Rarely,,Rarely,Rarely,,Sometimes,Rarely,,,,50,20,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty 
data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Often,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,100% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"140,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,26,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,PhD,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,0,20,20,10,0,"Computer Vision,Machine Translation","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Other",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Female,India,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,30,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,United States,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,SAS Base,Bayesian Methods,Stata,"Government website,University/Non-profit research group websites","Blogs,Friends network,Stack Overflow Q&A",,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,3 to 5 years,,University courses,90,5,5,0,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,,Fewer than 10 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Regression/Logistic Regression,"IBM SPSS Statistics,Jupyter notebooks,Python,R,Other",,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,50,50,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,Often,Often,,,26-50% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,22000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,University courses,30,40,5,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Always,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","NoSQL,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Naive Bayes,Neural Networks",Often,,,,,,Most of the time,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,40,20,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Often,,,,Often,Often,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,clean it,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Mercurial",Rarely,20000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +A different identity,Netherlands,58,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Other,Other",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,70,0,0,10,5,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Sometimes,,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,SVMs","MATLAB/Octave,Minitab,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Most of the time,,Often,Sometimes,,,,,Often,,Most of the time,,Often,,Most of the time,Most of the time,,Often,,Often,,,Often,Sometimes,Often,,,,40,15,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT",Sometimes,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,EUR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Singapore,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,SQL,Regression,Python,GitHub,"Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,A social science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Time Series,Unsupervised Learning",Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,23,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,R,Cluster Analysis,,,"Conferences,Kaggle,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,I haven't started working yet,Kaggle competitions,0,0,24,26,50,0,Reinforcement learning,,High 
school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Poland,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Deep learning,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,,,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Business Analyst,Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,40,30,20,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Java,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk",,,,,,,,,,,,Sometimes,,,Rarely,,Rarely,,,,,,,,,,,Sometimes,,,Sometimes,,Most 
of the time,,Rarely,,,,,,,Often,,,Sometimes,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,Rarely,Often,,,Often,Most of the time,Most of the time,,,,,,,,Often,,Rarely,,,Often,Most of the time,Often,,,Often,Often,,,Sometimes,,,,50,10,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Most of the time,,Most of the time,Sometimes,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,80000,PLN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Other,University courses,10,20,20,10,10,30,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Machine Learning,Amazon Web services,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,Tableau",Sometimes,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,Sometimes,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,Often,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,,Often,Often,,,,,,,Sometimes,,,,45,20,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used 
by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,Sometimes,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Never,450000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,KDnuggets Blog,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,30,0,10,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,India,14,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Official documentation,YouTube Videos",,,,,,,,,,Somewhat useful,,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,15,0,0,0,5,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Most of the time,100MB,"CNNs,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,Most of the time,"CNNs,Cross-Validation,kNN and Other Clustering,Neural Networks,SVMs",,,,Most of the time,,Most of the time,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,15,50,30,5,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Most of the time,,,,,,Most of the time,Often,,,,,Most of the time,,Often,,Often,Often,,None,Entirely external,Standalone Team,CIFAR; ImageNet; COCO,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Time Series Analysis,R,"GitHub,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Predictive Modeler,Researcher,Statistician",University courses,10,15,15,60,0,0,"Survival Analysis,Time Series",,A master's degree,Financial,100 to 499 employees,Decreased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,25,25,10,20,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,3650,PLN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,16,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important +Male,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook",,,,,,,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,"Data Analyst,Predictive Modeler,Other",Work,30,0,70,0,0,0,,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Financial,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,"Decision Trees,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Segmentation,Text Analytics",,,,,,,Often,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,Rarely,,,,,50,10,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,,,,Sometimes,,,Often,Often,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,"Census, Equifax, dnb, Roy Morgan, ",Productionise transformed data back to the data warehouse,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,166000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,Not Useful,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Software Developer/Software Engineer",University courses,10,60,10,10,10,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Pharmaceutical,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1MB,,"Cloudera,R",,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Brazil,34,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,Somewhat useful,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Programmer,University courses,0,0,0,100,0,0,"Computer Vision,Reinforcement learning",Support Vector Machines (SVMs),No education,Technology,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,SVMs,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,0,0,0,100,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,4500,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Ireland,29,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Cluster Analysis,Python,I collect my own data (e.g. 
web-scraping),Other,,,,,,,,,,,,,,,,,,,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Researcher,I haven't started working yet",University courses,75,0,0,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Programmer,Statistician",University courses,30,0,25,35,5,5,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,R,SQL",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Sometimes,,,,Sometimes,,Often,Often,,,,,,,Most of the time,,,,,Often,,Most of the time,,,,,Often,,Sometimes,,,,60,20,0,5,15,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,,,,Most of the time,,,Sometimes,Often,,10-25% of projects,Entirely internal,IT Department,Not to disclose,Cleaning it,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Never,,INR,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,28,"Not employed, but looking for work",,,,,,,,R,Decision Trees,SQL,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company 
related to ML,Yes,Master's degree,,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,NA,20,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Python,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Online courses,Personal Projects,YouTube Videos,Other",,,,,,,,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,0,0,0,30,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,Sometimes,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,,,Often,,,,Often,,,,Often,,,Often,Often,Sometimes,,Often,,,,,Often,Often,,,,,35,25,10,20,10,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Organization is small and cannot afford a data science team",,,,,,,Often,,,,,,,,,Often,,,,,,,10-25% of projects,Do not know,Other,Government,Access.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Non-Kaggle online communities,Official documentation,Personal Projects,Textbook",,,Very useful,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,25,0,25,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SAS Base,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,Often,Sometimes,,,,60,20,5,7.5,7.5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,Often,,,,,,,,,,,Sometimes,,Often,,51-75% of projects,More internal than external,IT Department,web scraping;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"45,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,38,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Company internal community,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Psychology,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,Mexico,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Excel Data Mining,Decision Trees,R,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to 
have,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,42,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Amazon Web services,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,Very useful,,Somewhat useful,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher,Statistician",University courses,50,0,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,6-10 years,A general-purpose job board,Very important,Other,Traditional Workstation,"Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","C/C++,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Rarely,Often,Sometimes,Most of the time,,Often,,,Often,,Most of the time,Often,,Often,Sometimes,,Most of the time,Sometimes,,,,10,10,0,10,10,60,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Sometimes,Most of the time,,,,,Sometimes,,,,,,Often,,,Often,,,76-99% of projects,Entirely external,Other,Defense,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Sometimes,"140,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,Very useful,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,60,10,0,5,0,Other (please specify; separate by semi-colon),Bayesian Techniques,A master's degree,Financial,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Cloudera,IBM Cognos,IBM Watson / Waton Analytics,Python,QlikView",,Sometimes,,,Most of the time,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Sometimes,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,0,0,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Most of the time,,,,Often,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,200000,AUD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,25,50,0,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,R,Spark / MLlib,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,Rarely,,,,,,"Ensemble Methods,kNN and Other Clustering,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,,,Most of the time,,,,,Often,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,10,50,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Git,Rarely,60900,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,Less than a year,,University courses,5,5,55,30,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,SQL,Google Search,"Blogs,Newsletters,Online courses,Podcasts,YouTube Videos",,Very useful,,,,,,Very useful,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort 
of (Explain more),Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,75,15,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Always,1TB,Other,"Cloudera,Microsoft Excel Data Mining,R,SAP BusinessObjects Predictive Analytics,Tableau",,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Prescriptive Modeling",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,25,25,5,40,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,Often,Most of the time,,,Most of the time,,,,,,,Often,,,,,Often,,,100% of projects,More internal than external,IT Department,GIS,Integration,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Other,Sometimes,110000,USD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Russia,33,"Not employed, but looking for work",,,,,,,,Python,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",< 1 year,,Necessary,Necessary,,Necessary,Necessary,,,,,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - GANs,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,Somewhat important,,,,, +Male,United States,56,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Neural Nets,Python,,"Company internal community,Online courses,Personal Projects,Textbook,YouTube Videos",,,,Very useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Engineer,Other",University courses,20,0,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks",High school,Technology,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,NoSQL,R,SQL",,,,,,,,,,,Often,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,,,Often,,,,,,,,Most of the time,,,,,,,Often,,,,Most of the time,,Most of the time,Most of the time,,,,50,20,10,10,5,5,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Privacy issues",Sometimes,,,,,,,,,,,,,,,,Often,,,,,,100% of projects,Entirely internal,Other,none,completeness and accuracy,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,"300,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",15,50,15,15,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Telecommunications,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,1GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Rarely,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,Often,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text 
Analytics",Rarely,,,Rarely,Rarely,Most of the time,Most of the time,,Most of the time,,,,,Often,,,,,Rarely,Rarely,Sometimes,,Most of the time,Rarely,,Sometimes,,Rarely,Rarely,,,,,55,10,15,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,Sometimes,,,Sometimes,,,,,,Most of the time,,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Never,,EUR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +A different identity,Netherlands,49,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,60,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Regression/Logistic Regression,Other","Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Often,,Sometimes,,,,Rarely,,,Sometimes,,,,,Sometimes,Often,Often,,,,60,4,1,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,,,,,,,,,,Most of the time,,,100% of projects,Entirely internal,IT Department,GIS,Collecting and Cleaning it.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,14,15,20,50,1,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics",,,Most of the time,,,Most of the time,Most of the time,Rarely,Most of the time,,,Sometimes,,,,Most of the time,,,,Sometimes,Most of the time,Most of the time,Most of the time,,,Often,,,Sometimes,,,,,70,3,10,10,7,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,Often,Most of the time,,,,Most of the time,,,,,,Most of the time,,Often,Often,,,Often,,10-25% of projects,More internal than external,Standalone Team,,Lack of 
documentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,50000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Not Useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution 
Analytics),Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL",,,,Rarely,Sometimes,,,,Sometimes,,,,,,,,Often,,,,Rarely,Often,,Sometimes,Rarely,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",Often,Sometimes,,,,Most of the time,Most of the time,Sometimes,Rarely,,,Sometimes,,Often,Sometimes,Most of the time,,,,Sometimes,Most of the time,,Often,,,Often,,Sometimes,,Often,,,,50,15,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,,Most of the time,,,,,Sometimes,,,,,Often,,Most of the time,,76-99% of projects,More internal than external,IT Department,NOAA weather; Twitter; Facebook,Quality,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",I don't typically share data,,Git,Sometimes,121000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Textbook",,Very useful,,,,,Very useful,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Spark / MLlib,SQL",,,,,Often,,,,Often,,,,,Often,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs,Time Series Analysis",Often,,Often,,,Often,Often,Often,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,Often,,,Sometimes,,Sometimes,,Sometimes,,,,50,15,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Unavailability of/difficult access to 
data",,Sometimes,,,Often,,,,Often,Sometimes,Often,,,,,,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,,Not Useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Not Useful,"Partially Derivative Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,25,20,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,6-10 years,A career 
fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,,Sometimes,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,Often,Often,Often,,Often,Rarely,,Often,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Often,Sometimes,Sometimes,,,Often,,Sometimes,Sometimes,Sometimes,,Rarely,Rarely,,Sometimes,Often,Often,,,Sometimes,,,Often,Often,,,,50,15,5,5,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Rarely,,Often,Most of the time,Rarely,,,,Rarely,,,Often,,Most of the time,,Often,,,Rarely,Most of the time,,100% of projects,Approximately half internal and half external,Central Insights Team,Weather; Twitter; Traffic; Social Media; Facebook; 
,Slow data acquisition process,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,67000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,SQL,Cluster Analysis,R,I collect my own data (e.g. web-scraping),"College/University,Friends network,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,55,0,0,0,5,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Business Analyst,Kaggle competitions,50,0,0,0,50,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",No education,Technology,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,QlikView,R",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests",,,,,,,Most of the time,Often,Sometimes,,,,,,,Often,,Sometimes,Sometimes,,,,Often,,,,,,,,,,,40,40,5,5,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Often,,,Most of the time,,,,Sometimes,,Sometimes,,,,Often,Often,,,,,,,10-25% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,900000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Data Analyst,University courses,40,0,20,0,40,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Java,Python,R,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,Sometimes,Often,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics",,,,,,Most of the time,,Often,,,,Sometimes,,,,Often,,Sometimes,Often,,,,,,,,,,Often,,,,,40,20,30,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,None,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,13000,CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Greece,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Bayesian Methods,R,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,50,20,20,10,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,500 to 999 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Bayesian Techniques,Ensemble Methods,Other","NoSQL,Perl,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Naive Bayes,Simulation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,,,,Most of the time,Sometimes,Sometimes,,,,Sometimes,,,Often,,Often,,,,,,,,,Sometimes,,Sometimes,Most of the time,,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data 
science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Sometimes,,Often,,,Sometimes,Often,,Sometimes,,Sometimes,Rarely,,,51-75% of projects,More internal than external,Standalone Team,network related datasets like geolocation datasets (MaxMind etc.),consistency along with understanding semantics for fields and values,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",Company Developed Platform,,Git,Most of the time,40000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Tableau,I don't plan on learning a new ML/DS method,R,Google Search,"Blogs,Conferences,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,50,0,10,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Decreased significantly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression,SVMs,Other","Java,NoSQL,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,SVMs",,,,,,Often,Most of the time,,,,,,,Often,,Rarely,,,,,,,,,,,,Rarely,,,,,,50,5,5,30,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,,,Often,,,,Often,Sometimes,,,,Sometimes,,,Often,,,,,,,100% of projects,More internal than external,Other,None,currently in h5 format (not ideal),"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","I don't typically share data,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,71000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,GitHub,"Arxiv,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,,Necessary,Necessary,,Necessary,,,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,Other,Python,Google Search,"College/University,Conferences,Kaggle,Personal Projects,Other",,,Very useful,,Very 
useful,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,33,0,33,34,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,20 to 99 employees,Stayed the same,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Other,Most of the time,10GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Java,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,RapidMiner (commercial version),RapidMiner (free version)",Sometimes,Sometimes,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,RNNs,Segmentation,Simulation,SVMs,Time Series 
Analysis",Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,,25,30,5,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,Sometimes,,Most of the time,Sometimes,Sometimes,,Sometimes,Sometimes,,,26-50% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Sometimes,106000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Argentina,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Non-Kaggle online communities",,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Researcher",University courses,0,40,0,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - GANs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,RNNs,SVMs","MATLAB/Octave,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,Sometimes,Often,,,Most of the time,,Often,,Sometimes,,,,Often,,,,Often,,Often,Often,,Often,,Often,Often,,Often,,,,,,40,60,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Tableau,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,40,0,20,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,Often,,Most of the time,Most of the time,Most of the time,,,,,,,Sometimes,,Often,,,Sometimes,,,,,Often,,Most of the time,,,Often,Often,,,,30,20,10,10,20,10,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",,,,,,,,,,Often,,,,,,,,Often,,,,,76-99% of projects,Entirely internal,Standalone Team,N/a,"Volume, shaping, processing","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Mercurial,Rarely,225000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by non-profit or NGO,Python,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites,Other","Blogs,Company internal community,Conferences,Friends network,Kaggle,Personal Projects",,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,20,50,0,0,0,,,A master's degree,Non-profit,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,,"Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",,,,,,Often,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,70,0,15,8,7,NA,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data 
science talent in the organization,Privacy issues,Scaling data science solution up to full database,Other",,,,Sometimes,Often,Rarely,,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,Most of the time,100% of projects,More internal than external,Standalone Team,nyc public education data; nyc department of education data; post-secondary collegeboard data; clearinghouse data,Building consistent business rules to standardize the data extracts from a system whose user interface does not provide data validation,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other","Company Developed Platform,Other","Google Drive, Tableau",Git,Sometimes,74000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,1MB,"CNNs,RNNs","Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Natural Language Processing,Text Analytics",,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Sometimes,Often,,,,,,,,,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Rarely,475000,INR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Chile,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,20,15,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Java,Python,R,SQL",,,,,,,,,,,Most of the time,Often,,,Often,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,,,,,Most of the time,,,,,,,,Most of the time,,,,Often,Often,,,,,Often,,,,,,,,40,15,20,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Need to coordinate with IT,Unavailability of/difficult access to data",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,26-50% of projects,More internal than external,Central Insights Team,"census, financial information",now..performance and storage. We have to process billions of transactions,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,50000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,GitHub,"Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Other,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer",Self-taught,85,15,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,,,,,,,,,,,,,,,, +Male,South Africa,23,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Very useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,50,0,0,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Telecommunications,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,50,5,5,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Julia,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,10,0,70,15,5,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Pharmaceutical,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Impala,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics",,Rarely,Rarely,,Rarely,Most of the time,Most of the time,,,,,Sometimes,,,,Sometimes,,,,Sometimes,Sometimes,,Often,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Scaling data science solution up to full database,The lack of a clear question to 
be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,,,,,,Most of the time,,Most of the time,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Mercurial",Never,7000000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Anomaly Detection,Python,GitHub,"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Researcher",University courses,25,10,0,40,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Newsletters,Official documentation,Personal Projects",Somewhat useful,Very useful,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,"DataTau News Aggregator,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist",University courses,40,0,30,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,Often,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,,,Often,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,Often,Often,,Sometimes,,,Often,,Often,Sometimes,,,,Rarely,Rarely,Sometimes,,,,10,5,5,10,20,50,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and 
decision-making,Other",,,Sometimes,,,,,,,,,,,,,,,,,,,Often,26-50% of projects,Entirely internal,Standalone Team,Census; ACS; Point of interest; App Metadata,"It's large, takes lots of compute and there is contention for compute resources",Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Other,Rarely,153000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,10,10,60,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,100MB,RNNs,"C/C++,IBM Watson / Waton Analytics,Java,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,Rarely,,Sometimes,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,,,,Most of the time,Often,,,,,,Often,,,,,Sometimes,Often,Most of the time,,,,Often,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,Sometimes,,Most of the time,,,,Most of the time,Sometimes,Sometimes,,Often,,,,Most of the time,Sometimes,,Often,Most of the time,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,"100,000",BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,28,"Not employed, but looking for work",,,,,,,,Python,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,United States,26,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,I don't plan on learning a new ML/DS method,R,"Government website,University/Non-profit research group websites","College/University,Conferences,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,,,,,,Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Statistician,Other",University courses,0,15,25,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,I don't know,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","SAS 
Base,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis,Other",,,Rarely,,,,Often,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,,,Rarely,Sometimes,Often,,,10,30,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Rarely,,Often,,,,,Sometimes,,,Sometimes,,Often,,Rarely,,,Sometimes,Rarely,,76-99% of projects,More internal than external,Other,"HCUP NIS, T1D Exchange",N/A,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"55,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Google Search,Government website","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,50,30,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,QlikView,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes",,,Often,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,10,10,10,30,40,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,,,BRL,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,8,2,0,0,,Ensemble Methods,Primary/elementary school,Technology,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and local IT supported servers,"Relational data,Other",Rarely,10MB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,,,,,,Often,,Rarely,,,,,Sometimes,,,,,,Rarely,Sometimes,,,,,,Often,,Rarely,Sometimes,,,,30,25,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,,Most of the time,,Sometimes,,,,,Sometimes,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,33000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Poland,50,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Spark / MLlib,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Official documentation,Personal Projects,Textbook,Other",,,Somewhat useful,,Very useful,,Very useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Operations Research Practitioner,Programmer,Software Developer/Software Engineer",University courses,30,50,0,20,0,0,Time Series,"Evolutionary Approaches,Gradient Boosting,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,<1MB,"Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Neural Networks,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,Sometimes,,,,,,Sometimes,,,,Sometimes,,Often,,,,,Often,,,Often,,,,20,60,0,20,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources",,Most of the time,Often,,,,,,,Often,,,,,,,,,,,,,10-25% of projects,Approximately half 
internal and half external,Other,stock data,construct new methods,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Other,Sometimes,90000,PLN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,66,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,SAS Base,Neural Nets,C/C++/C#,,"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,Operations Research Practitioner,University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,Fewer than 10 employees,Increased slightly,6-10 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Relational data,Most of the time,10TB,"Evolutionary Approaches,Neural Networks","NoSQL,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Neural Networks,Time Series Analysis",,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,20,30,0,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,Often,Often,,,,,,Often,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Commercial Data Platform,,Git,Most of the time,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Podcasts,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,,,,,,,Not Useful,,,,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer,Programmer",University courses,40,10,20,30,0,0,"Computer Vision,Natural Language Processing","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,Other","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk,Other,Other",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,Often,Often,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Other,Other",Often,,,Sometimes,,Most of the time,Often,,,,,,,Sometimes,,Often,,,Most of the time,Often,Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,Sometimes,,40,30,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by 
business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data,Other",,Sometimes,Often,,Most of the time,,,,Most of the time,,,,,Sometimes,,,Sometimes,,,,Often,Most of the time,10-25% of projects,More internal than external,Standalone Team,"text corpora for generating embeddings (Common Crawl, Conceptnet)",Gathering data; Cleaning data; Labeling data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,130000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,R,Anomaly Detection,R,Other,"Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Very useful,,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,20,0,60,20,0,0,Time Series,,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Other,Other,Most of the time,10GB,"Regression/Logistic Regression,Other","C/C++,R,Unix shell / awk,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,10,10,10,50,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Sometimes,,,,Often,,,,,,,,,,100% of 
projects,Entirely internal,Standalone Team,,Performance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",Git-LFS,"Bitbucket,Git,Other",Always,48000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Brazil,40,Employed full-time,,,Yes,,Other,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Spark / MLlib,Decision Trees,Scala,GitHub,"College/University,Official documentation,Online courses,Podcasts,Tutoring/mentoring",,,Somewhat useful,,,,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,"Data Stories Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,60,0,15,0,0,"Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches",I don't know/not sure,Insurance,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Always,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches","Cloudera,Flume,Hadoop/Hive/Pig,NoSQL,Python,R,Spark / MLlib",,,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Naive Bayes,Text Analytics",,,Sometimes,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,Often,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,Rarely,Rarely,,Sometimes,,Often,,76-99% of projects,Approximately half internal and half external,IT Department,"Hbase, Hive, Impala",Data cleaning,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,Git,,150,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,27,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Textbook",,,,,,,,,,,Very useful,,Very useful,,Somewhat useful,,,,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,India,24,Employed full-time,,,No,Yes,Data Analyst,,Employed by a company that performs advanced analytics,IBM SPSS Statistics,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Very useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,0,20,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs",No education,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,29,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,R,Google Search,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),3-5 years,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),0 - 1 hour,Other,Sort of (Explain more),Master's degree,Fine arts or performing arts,3 to 5 years,,University courses,0,5,20,70,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision 
Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,United States,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Switzerland,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"KDnuggets Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,10,30,8,2,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,Financial,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Amazon Web services,C/C++,Java,Jupyter notebooks,KNIME (commercial version),Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Rarely,,Rarely,,,,,,,,,,,Sometimes,,Most of the time,Rarely,,Rarely,Rarely,,,Sometimes,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,Often,Often,,,,Often,,Often,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series 
Analysis",,,,Sometimes,Sometimes,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Often,Often,Sometimes,Sometimes,,,Often,Often,,,,55,10,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,Sometimes,Rarely,,,,,Often,,Sometimes,,,,Often,Often,,26-50% of projects,More internal than external,Standalone Team,can't say,"relationships are not clear, unclear fields, unclean data, legal issues","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,90000,CHF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Very useful,,< 1 year,,,,,,,,,,,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Other,Yes,Doctoral degree,Physics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Other,"Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,Business Analyst,Self-taught,40,3,10,47,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Pharmaceutical,500 to 999 employees,Increased slightly,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the 
time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,Rarely,,,,,Rarely,,,,Rarely,Rarely,,,,Sometimes,,,Sometimes,,,Often,Often,,,,37,5,5,10,43,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,Lack of time for deeper analysis ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,Most of the time,160000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Romania,22,Employed part-time,,,No,Yes,Programmer,Fine,Employed by non-profit or NGO,TensorFlow,Text Mining,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Not Useful,,,,,< 1 year,,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Programmer,University courses,10,50,0,35,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Very useful,Not Useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher",University courses,20,10,0,65,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,Often,,,Often,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",Often,Sometimes,Often,,,Often,Most of the time,Sometimes,,,,,,,Often,Often,,,,,Sometimes,,,,,Often,Often,,,Sometimes,,,,30,15,15,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty 
data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Most of the time,,,Sometimes,,,,,,Sometimes,,,Most of the time,,,,Most of the time,,100% of projects,Approximately half internal and half external,Business Department,,HIPAA laws,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,93500,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Finland,48,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,,,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,DataRobot,Genetic & Evolutionary Algorithms,SQL,Google Search,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 
years,"Business Analyst,Data Analyst",Self-taught,50,10,40,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","DataRobot,Jupyter notebooks,KNIME (free version),Python,R,SQL,Tableau",,,,,,Most of the time,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Often,,,,,Often,Most of the time,Often,Often,,,Often,,Often,Often,Often,,Often,Sometimes,Often,,Sometimes,Often,,,Most of the time,Often,Often,,Often,,,,60,9,1,10,20,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Central Insights Team,,lack of a primary key,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Sometimes,220000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United Kingdom,27,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,SQL,Text Mining,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,50,15,0,0,0,Time Series,Logistic Regression,High school,Pharmaceutical,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10MB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction",,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,Often,,Sometimes,,,Often,,,,,,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and 
cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Often,,Most of the time,Often,,,Often,Most of the time,Most of the time,,,Often,,Most of the time,,,,Sometimes,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,CDC,Trying to get it in the first place (cost); data cleaning.,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,29000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,70,0,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,India,23,"Not employed, but looking for work",,,,,,,,SAP 
BusinessObjects Predictive Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",University courses,30,30,10,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,22,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,NoSQL,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,I prefer not to answer,Mathematics or statistics,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Ukraine,31,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,,Less than a year,I haven't started working yet,Self-taught,40,30,0,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Mexico,37,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,Google Search,"Blogs,Podcasts,Textbook",,Very useful,,,,,,,,,,,Very useful,,Very useful,,,,Partially Derivative Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Mathematics or statistics,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important +Female,Pakistan,26,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"GitHub,Government website,I collect my own data (e.g. web-scraping)","College/University,Official documentation,Online courses,Personal Projects",,,Very useful,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,PhD,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",55,15,20,10,0,0,Reinforcement learning,"Bayesian Techniques,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Australia,27,Employed 
part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,Python,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,Hadoop/Hive/Pig,Neural Nets,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,3 to 5 years,,"Online courses (coursera, udemy, edx, 
etc.)",10,30,10,40,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,Fewer than 10 employees,Increased slightly,More than 10 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data,Other",,,,"Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Often,,Often,Sometimes,,,,,40,10,20,10,10,10,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,,,,,,,,,90000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",Google Cloud Compute,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Podcasts,Tutoring/mentoring",,,,,,,Very useful,,,,,,Very useful,,,,Very useful,,"Becoming a Data Scientist Podcast,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,50,0,5,30,0,"Machine Translation,Outlier detection (e.g. 
Fraud detection)","Decision Trees - Random Forests,Logistic Regression",A professional degree,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,,,"Markov Logic Networks,Random Forests","IBM SPSS Modeler,IBM Watson / Waton Analytics",,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests",,,,,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,Often,,Often,Most of the time,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,57,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,University/Non-profit research group websites,"Blogs,Friends network,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Very useful,,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,3 to 5 years,,University courses,0,80,0,15,5,0,,,A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,30,0,20,20,30,0,Enough to run the code / standard library,Unavailability of/difficult access to 
data,,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Other,,Time and date data stored in different ways.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,40,10,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SAS 
Base,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Rarely,,,,Sometimes,,Often,,Rarely,,,,,,Most of the time,,Often,,,,,Often,,,,Most of the time,,,Sometimes,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs",Often,,Sometimes,,,Most of the time,Most of the time,Often,Sometimes,,,,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,,,,,Sometimes,,,,,,50,40,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,Sometimes,,,,,Often,,,,Often,,,76-99% of projects,More internal than external,IT Department,,missing value,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,90000,CNY,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Belgium,24,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,Mathematica,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Personal Projects",Very useful,Very useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,40,0,10,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Relational data,,100MB,Regression/Logistic Regression,"Julia,Python",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,30,50,15,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,,,,,,,,Often,,100% of projects,Entirely internal,IT Department,/,Making sense of this crap. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Mercurial",Sometimes,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,Genetic & Evolutionary Algorithms,Python,GitHub,"Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A professional degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine 
Learning,Amazon Web services,C/C++,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,SQL,Tableau,TensorFlow",Rarely,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Rarely,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,Often,Most of the time,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others",,,,,Often,Rarely,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,transaction data,mixed data standards,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Arxiv,Blogs,YouTube Videos",Very useful,Very useful,,,,,,,,,,,,,,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of 
fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Brazil,35,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,50,30,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs",Primary/elementary school,Academic,"10,000 or more employees",Increased slightly,Don't know,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Bayesian Techniques,Regression/Logistic Regression","Java,Other,Other",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,"Data Visualization,Logistic Regression,Segmentation,Text Analytics",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,Often,,,Often,,,,,15,15,10,30,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization",Sometimes,Often,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,"Adresses ",Little time to devote on it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Subversion",Rarely,72780,BRL,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed part-time,,,No,Yes,Other,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,"Statistician,Other",University courses,20,20,5,50,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Online courses,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,5,0,0,5,Computer Vision,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,United States,35,Employed full-time,,,Yes,,Statistician,Poorly,Employed by college or university,TensorFlow,Neural Nets,R,Other,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Personal Projects",Not Useful,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,,Very useful,,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,6 to 10 years,Researcher,Self-taught,100,0,0,0,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence 
product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,Python,R,SQL",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Rarely,,,,Most of the time,Sometimes,,,,,,Rarely,,Most of the time,,,,,Sometimes,,Sometimes,,,,,,Rarely,Sometimes,,,,44,1,1,10,44,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,,Often,Most of the time,Sometimes,,,Most of the time,,,,Sometimes,Often,Most of the time,,,Most of the time,,Most of the time,Sometimes,,51-75% of projects,More internal than external,Other,None; exclusively private data,Data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,75000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ireland,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,Other",Self-taught,40,25,25,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Rarely,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,Rarely,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Often,,,Often,Most of the time,,,,,,,Sometimes,Often,Often,,Often,,,Sometimes,,,,,Most of the time,,,Most of the time,Most of the time,,,,35,15,10,15,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Sometimes,,,,,,,,,Often,Sometimes,,76-99% of projects,More internal than external,Standalone Team,"For me, this is not applicable.","Incomplete dirty data, high dimensionality.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,51500,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Belgium,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,Data Scientist,Self-taught,15,5,15,65,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Sometimes,Often,,Rarely,,,,Rarely,Often,,Often,,,,20,25,5,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Rarely,Often,,,,,,,,,,,,,Most of the time,,Sometimes,,,100% of projects,Do not know,Standalone Team,,Too big to build models within reasonable timeframe given limited computational resources,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Always,153600,PLN,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Neural Nets,R,,"Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",15,70,15,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,5,15,0,5,0,75,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,India,43,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Tableau,Deep learning,R,Google Search,"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,,,,,Nice to have,,,Nice to have,,,,,,"Basic laptop (Macbook),Other",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,Very Important, +Male,United States,23,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Conferences,Newsletters,Online courses,Personal Projects",,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,40,30,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",Often,,,,,Most of the time,Often,Sometimes,,,,,,Sometimes,,Most of the time,,,Most of the time,Sometimes,Often,,Often,,Sometimes,,,,Often,,,,,5,10,10,10,20,45,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,,Often,26-50% of projects,Entirely internal,Standalone Team,NDI 
Dataset,Our source data is unstructured heterogenous text data in the form of .txt files or .jpgs from scans,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,130000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Python,Bayesian Methods,Python,Government website,"College/University,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",University courses,0,30,40,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,,,,Often,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",Rarely,,Sometimes,,,Often,Most of the time,Often,,,,,,Often,Rarely,Often,,,Often,,Often,,Sometimes,,,,Sometimes,,Often,,,,,20,15,0,15,10,40,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Rarely,,Most of the time,,,,,,,,,,,,Often,,Sometimes,Often,Most of the time,,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Never,130000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,75,"Not employed, but looking for work",,,,,,,,R,Regression,Python,"Government website,I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A,Textbook,Trade book",,,,,,,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Google Search,I 
collect my own data (e.g. web-scraping)","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Other,11 - 39 hours,PhD,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,75,0,0,0,25,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,C/C++,Time Series Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service,Other",0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,95,0,0,0,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important +Male,,40,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,R,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Personal Projects",Somewhat useful,Somewhat useful,Very useful,,,,Not Useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,70,0,20,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,More than 10 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,RNNs","Hadoop/Hive/Pig,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,Often,,,,,,,,,Often,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests",,,Often,,,Often,Often,Rarely,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,Rarely,,,,,,,,,,,60,10,0,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Sometimes,Sometimes,,Sometimes,Often,,,,Sometimes,Most of the time,Rarely,Sometimes,,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,understanding the tables and columns included in the provided database,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,"100,000",,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Conferences,Online courses,Stack Overflow Q&A",,Very useful,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random 
Forests","Cloudera,Hadoop/Hive/Pig,Impala,Python,QlikView,R,SQL,Unix shell / awk",,,,,Often,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,Rarely,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,Often,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Often,,,,,,,,40,20,10,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,,Cleaning the data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Never,75000,SGD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Argentina,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,NoSQL,Neural Nets,SQL,Google Search,"Blogs,College/University,Company internal community",,Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,University courses,0,20,20,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,1GB,"Decision Trees,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Lift Analysis,Natural Language Processing,Text Analytics,Time Series Analysis,Other",Often,,,,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,Often,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,Often,,Often,Sometimes,,26-50% of projects,Do not know,Business Department,The one necessary for the current analysis,Cleaning,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,18000,,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Personal Projects",,Very useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",65,35,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SQL",Sometimes,Often,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,Rarely,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic 
Regression,Neural Networks,Random Forests,SVMs",,,,,,,,Most of the time,Most of the time,Often,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,,,Most of the time,,,,,Most of the time,,,,,,90,10,0,0,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,The time it takes to create calculated fields and slicing various samples,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,"150,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,61,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Very useful,Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"DBA/Database Engineer,Researcher,Other",Self-taught,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,10MB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Python,R,RapidMiner (free version),Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Often,,Rarely,,Rarely,,,,,,Most of the time,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Simulation,Time Series Analysis",,,Sometimes,,,Sometimes,Most of the time,,,,,,Often,,,Often,,,,,,,,,,,Often,,,Most of the time,,,,15,20,10,20,35,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Often,,,,,,,,Often,,,,Most of the time,,Most of the 
time,,,Sometimes,Often,,26-50% of projects,More internal than external,Other,,Privacy issues,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,175000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Other","Amazon Web services,C/C++,Jupyter notebooks,Python,Spark / MLlib,Unix shell / awk,Other",,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,Most of the time,Most of the time,,,"Collaborative Filtering,Cross-Validation,Ensemble Methods,Naive Bayes,Natural Language Processing,Recommender Systems,Text 
Analytics,Time Series Analysis",,,,,Often,Most of the time,,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,20,50,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning",,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,None,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Git,Mercurial",Sometimes,"75,000",CAD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,10,20,30,10,30,0,"Recommendation Engines,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,I prefer not to answer,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics",Often,Often,Often,,,Often,Most of the time,Most of the time,,,,,Sometimes,Often,,Often,Often,Often,,Often,Sometimes,Sometimes,Often,Sometimes,,,,Often,Sometimes,,,,,35,20,20,15,10,0,"Enough to code it again from scratch, albeit it may run slowly",Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,51-75% of 
projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,,Google Cloud Compute,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Work,30,0,40,10,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,IBM SPSS Statistics,Julia,Jupyter notebooks,KNIME (free version),Mathematica,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),Spark / 
MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,Rarely,,,,Rarely,Most of the time,,Sometimes,Rarely,,Sometimes,,Rarely,,,Rarely,,,,Most of the time,,Sometimes,,Rarely,,,,,,Sometimes,Often,,,Often,Sometimes,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Simulation,Time Series Analysis",,,Sometimes,,Rarely,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,,Often,,,,,,Sometimes,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,30,10,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,100% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,85000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Anomaly Detection,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very 
Important,Not important,Somewhat important,Very Important +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Stan,Bayesian Methods,R,"Google Search,Government website","Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Work,10,0,75,15,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",High school,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Never,100MB,"Bayesian Techniques,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Simulation",,,Most of the time,,,Sometimes,Most of the time,,,,,,,Sometimes,Rarely,Most of the time,,,,,,Most of the time,,,,Often,Often,,,,,,,20,40,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,Sometimes,,,Most of the time,,,,Often,,,,Sometimes,,Sometimes,,,,100% of 
projects,Approximately half internal and half external,Standalone Team,"Census, ACS, Pums",,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,137000,USD,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,R,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Very useful,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Retail,"1,000 to 4,999 employees",Increased significantly,Less than one year,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine 
Learning,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,Rarely,,,Often,,,,30,20,5,10,35,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,,,Often,Sometimes,,,,,,Most of the time,,,,,Most of the time,Sometimes,,100% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Git,Other",Always,120000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer,Other",Other,15,3,43,6,0,33,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data,Other",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis,Other",Sometimes,,,,,Most of the time,Most of the time,Rarely,Sometimes,,,Rarely,Often,Sometimes,Most 
of the time,Most of the time,,,,Rarely,Often,Most of the time,Rarely,,,,Most of the time,Often,,Often,Often,,,36,28,6,12,18,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,Sometimes,,,Sometimes,,Often,,Most of the time,,,,Sometimes,Often,,76-99% of projects,More internal than external,Other,None,Properly fusing data sources and interpreting data elements,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,100000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",University courses,0,25,0,75,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,Retail,"1,000 to 4,999 employees",Decreased significantly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,Regression/Logistic Regression,"IBM SPSS Modeler,Minitab,Oracle Data Mining/ Oracle R Enterprise,R,SAS 
JMP,SQL,Tableau,Other",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Often,,,,,,,Sometimes,,Most of the time,,,Most of the time,,,,Most of the time,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Most of the time,,,,,,,,90,2,0,5,3,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,Do not know,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),Other,Alteryx,,Sometimes,108000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,63,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by college or university,R,Neural Nets,R,Government website,"Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,Not Useful,Not Useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,DBA/Database Engineer,Self-taught,60,30,5,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Rarely,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other 
Clustering,Random Forests",,,,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,70,6,0,20,4,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,26-50% of projects,Entirely internal,Business Department,,Small yet dirty datasets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Belgium,62,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,39,"Not employed, but looking for work",,,,,,,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,"Business Analyst,Data Analyst,Software Developer/Software Engineer,Other",Other,40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,90,0,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,100GB,Other,"Cloudera,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL,Tableau",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,Often,,,Sometimes,Rarely,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,25,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,Often,,,Most of the time,Often,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",,65000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),40+,Kaggle Competitions,No,Bachelor's degree,Management information systems,I don't write code to analyze data,Other,Self-taught,60,30,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,62,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed 
by company that makes advanced analytic software,Java,I don't plan on learning a new ML/DS method,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Online courses,Personal Projects",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,More than 10 years,Programmer,Work,20,10,70,0,0,0,,,"Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",,<1MB,,"Java,SAS Base,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization",,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,40,45,10,0,5,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools",,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,Colombia,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,10,10,0,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Italy,45,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Programmer,Self-taught,90,5,5,0,0,0,,Logistic Regression,A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,,Traditional Workstation,Text data,Never,10MB,Regression/Logistic Regression,IBM Watson / Waton Analytics,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Other",Never,,,,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,,"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer",Self-taught,80,15,1,2,0,2,Other (please specify; separate by semi-colon),"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,,,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Java,Python,R",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,GANs,Neural Networks,RNNs",,,,Often,,,,,,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,50,10,10,0,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Often,,,,,,,,Often,,,,,,,,Often,,,,,,10-25% of projects,Entirely external,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,Microsoft Azure Machine Learning,I don't plan on learning a new ML/DS method,,,"Kaggle,Online courses,Textbook,Trade book",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,40,20,20,0,20,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Other,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Time Series Analysis",,,Rarely,,,Rarely,Rarely,Rarely,Rarely,,,,,,Rarely,Rarely,,,,,,,,,,,,,,Rarely,,,,10,20,70,0,0,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Often,Most of the time,,,,,,,,,,,Often,,,None,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,108000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,SQL,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Company internal community,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Researcher,Work,30,20,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Sometimes,,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,Most of the time,,Sometimes,,,,Often,,,,Most of the time,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,,50,20,10,5,15,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Often,,,,,,,,Often,,,,,Often,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in 
a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Rarely,78000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Psychology,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,25,15,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Brazil,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Online courses,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"Linear Digressions 
Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Researcher",University courses,0,30,0,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Argentina,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,Not Useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,40,40,0,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",Rarely,,Sometimes,,Rarely,Often,Often,Often,Sometimes,,,Rarely,,Sometimes,,Often,,,,,Sometimes,,Sometimes,Rarely,,,,Sometimes,Often,,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",Rarely,Sometimes,,Rarely,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,Sometimes,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Norway,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Very useful,,,,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,3 to 5 years,Data Analyst,Work,10,60,30,0,0,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,India,28,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,50,10,10,10,10,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Russia,18,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,,Nice to have,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,I haven't started working yet",Self-taught,30,30,0,0,30,10,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Not important,Somewhat important,Somewhat important,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,20,10,50,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Always,1TB,Other,"Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib",,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,70,10,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,Often,,None,Entirely internal,Business Department,,,Other,I don't typically share data,,Git,Never,2289008,RUB,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,TensorFlow,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,Very useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Predictive Modeler,Researcher",Self-taught,90,10,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Ensemble Methods,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Java,Microsoft Azure Machine Learning,Perl,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,Rarely,Often,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,GANs,HMMs,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Simulation,Text Analytics,Time Series Analysis",,Often,Often,,,Often,,,,,Sometimes,,Sometimes,,Sometimes,Most of the time,,,Most of the time,Often,,,,,,,Often,,Most of the time,Most of the time,,,,78,7,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of 
significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",,,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,Often,Often,,,,,,,,,,10-25% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,72000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Financial,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,,,,"Amazon Web services,Python,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,Most of the time,,,Most of the time,,,,,Often,,,,,,,Rarely,,76-99% of projects,Entirely internal,Standalone Team,Credit Bureau data,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,45000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Republic of China,29,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,Somewhat useful,,Very useful,,,,,"Data Machina Newsletter,FastML Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,50,0,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,,"Online courses,YouTube Videos",,,,,,,,,,,Very 
useful,,,,,,,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,Python,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,Often,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,50,20,5,20,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,Sometimes,Often,Most of the time,,76-99% of projects,Entirely internal,IT Department,None,Cooperation with DBA's,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Rarely,55000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,54,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Perfectly,Self-employed,Tableau,Regression,SQL,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",10-15 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,A health science,More than 10 years,"Researcher,Statistician",Self-taught,50,10,40,0,0,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,Somewhat 
useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,20,1,9,70,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Rarely,1TB,"CNNs,Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",Sometimes,,,Most of the time,,Most of the time,Most of the time,Sometimes,Often,,Sometimes,,,,,Sometimes,,,,Most of the time,Sometimes,,Sometimes,,,,Often,,,,,,,25,30,5,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,,,,,,,Most of the time,,,,,Often,Most of the time,,,Often,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,Government website,"Arxiv,Conferences",Very useful,,,,Very useful,,,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,40,30,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,R,RapidMiner (commercial version),RapidMiner (free version),Tableau,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,,,Most 
of the time,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,,Often,Often,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,Often,Most of the time,,,,60,30,10,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Most of the time,,Often,,,,,,,,Often,,26-50% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,"Not employed, but looking for work",,,,,,,,Mathematica,Regression,Matlab,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,PhD,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,20,40,0,0,0,"Computer Vision,Speech Recognition","Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Singapore,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Tableau,Deep learning,R,"Government website,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Statistician",University 
courses,15,10,40,30,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,100GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM SPSS Statistics,Impala,Julia,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Minitab,Perl,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",,Sometimes,,Rarely,,,,,Sometimes,,,Rarely,,Often,,Sometimes,Most of the time,,Sometimes,Sometimes,Sometimes,Often,Sometimes,,,Rarely,,,,Rarely,Most of the time,,Most of the time,,Rarely,,,Sometimes,,,Often,Often,Sometimes,,Sometimes,Often,Rarely,Often,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,Often,,Often,Most of the time,Often,Sometimes,Sometimes,,Sometimes,,Often,,Often,,,Often,Often,Often,,Often,Sometimes,,Often,,Often,Often,Often,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Privacy issues",,,,,Often,,,,,,,,,,,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,Prefer not to answer, ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,-99,SGD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,,,,Somewhat useful,,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,"Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer",Work,35,15,25,25,0,0,,,A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Python,R,SQL,Tableau",,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,kNN and Other Clustering,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,25,0,25,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,Sometimes,,Often,Sometimes,,,,,,,,,,Often,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Mexico,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Friends network,Newsletters,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Researcher,Software Developer/Software Engineer",University courses,15,5,0,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Female,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,0,0,100,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data,Relational data",Most of the time,100GB,"CNNs,GANs,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Perl,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,Rarely,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Often,,Most of the time,,,Often,,Sometimes,Often,Sometimes,Sometimes,,Most of the time,,,Often,Often,Often,,Most of the time,,Often,,,,,Most of the time,,,,50,20,20,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with 
the available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,Sometimes,,,,Often,,Most of the time,,,Often,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,69,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Spark / MLlib,Monte Carlo Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,Not Useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Researcher",Self-taught,100,0,0,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data,Relational data,Other",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Flume,Java,Jupyter notebooks,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,Often,,,Sometimes,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",Most of the time,Often,Often,,Most of the time,Most of the time,Most of the time,Often,,,,,,Often,,Often,Often,Most of the time,Most of the time,Often,Most of the time,,Sometimes,Most of the time,,Most of the time,Most of the time,Sometimes,Most of the time,,,,,0,0,0,0,100,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Often,,,Most of the time,,,,,Most of the time,,Most of the time,,,,100% of projects,Entirely internal,Standalone Team,none,complexity,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,200000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Ireland,42,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Other",University courses,30,30,0,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",,Retail,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Relational data,Other",Sometimes,10MB,"Decision Trees,Other","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"Decision Trees,kNN and Other Clustering,Prescriptive Modeling,Time Series Analysis",,,,,,,,Rarely,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,Often,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Unavailability of/difficult access to 
data",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,51-75% of projects,More internal than external,IT Department,,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,58,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Switzerland,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Newsletters,Stack Overflow Q&A",Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",60,20,0,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Simulation,SVMs,Other",,,,,,Often,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,Most of the time,Sometimes,,,Sometimes,,,20,20,10,30,20,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,100% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,52000,CHF,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Poland,26,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,43,5,10,2,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,,,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,86,1,1,11,1,0,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of 
a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,Sometimes,Sometimes,Sometimes,,26-50% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,29400,PLN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,50,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"College/University,Conferences,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Master's degree,Yes,Master's degree,Electrical Engineering,,Other,University courses,NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,Somewhat useful,Very useful,,,,,,Very useful,,Somewhat useful,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,Data Analyst,Self-taught,20,50,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Python,QlikView,R,SAS Base,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,Rarely,,Most of the time,,,,,,,,,,Often,,,,Most of the time,Rarely,Rarely,,,,,Rarely,,,Sometimes,Most of the time,,,Sometimes,,,Sometimes,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,,,Rarely,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,Often,,Rarely,,Often,,,,,Often,Often,Sometimes,,,,60,10,10,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible 
expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,Sometimes,Often,,,Most of the time,,Sometimes,Sometimes,Often,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Rarely,185000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Personal Projects",Very useful,Very useful,,,,,,,,,,Very useful,,,,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Programmer,Statistician",University courses,20,0,40,40,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Google Cloud Compute,NoSQL,R",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Random Forests,Recommender Systems",,,Often,,,Sometimes,Most of the time,Often,Often,,,,,,,,,,Often,,,,Often,Often,,,,,,,,,,15,20,50,10,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,Most of the time,,,,,Often,,,Sometimes,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Always,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Tutoring/mentoring",,Very useful,,,,,,,,,,Somewhat useful,,,,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",30,60,8,2,0,0,Machine Translation,Logistic Regression,A master's degree,Manufacturing,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,28,38,20,8,6,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",,Often,,,Most of the time,,,,Often,Often,Most of the time,,Often,,Most of the time,,,,,,,,76-99% of projects,More internal than external,Standalone Team,"Sugar industries from Colombia,Brasil,South Africa, Australia, Central America",To 
build useful models for prediction,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,65000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,40,20,10,20,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Non-profit,,,,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Orange,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Simulation,SVMs",,,,,,Most of the time,Most of the time,Often,Often,Often,,Often,,,,Often,,,,Most of the time,,,Most of the time,,,,Most of the time,Most of the time,,,,,,50,10,10,20,10,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Scaling 
data science solution up to full database",,,,,Often,,,,,,Most of the time,,,,,,,Most of the time,,,,,26-50% of projects,,IT Department,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,C/C++,Deep learning,R,Google Search,"College/University,Online courses",,,Very useful,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Other,I haven't started working yet",University courses,10,30,0,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Markov Logic Networks",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,50,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Jack's Import AI Newsletter,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,35,10,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Technology,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,Often,,Often,,,Rarely,Rarely,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Often,,,Sometimes,Most of the time,Most of the time,Often,Often,,,,,,,Often,,,Often,,Sometimes,,Often,Sometimes,,Often,Sometimes,,Often,Sometimes,,,,15,30,25,15,15,0,"Enough to code it again from scratch, albeit 
it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,,Sometimes,,,Often,,,,,,,Often,Sometimes,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,80000,EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Canada,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,DBA/Database Engineer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,20,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Government,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,IT Department,census data,incomplete data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,90000,CAD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Genetic & Evolutionary Algorithms,R,University/Non-profit research group websites,"Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,Textbook",,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,6 to 10 years,Researcher,University courses,35,10,20,30,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Traditional Workstation,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SAS Base,Tableau,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,Sometimes,,,,Most of the time,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,Other",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Newsletters,Personal Projects",Very useful,,,,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Business Analyst,Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,,,,Rarely,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",Most of the time,,Sometimes,Sometimes,,,Often,,,,,,,,,Sometimes,,,,Sometimes,Often,,,Sometimes,,,,,Sometimes,,,,,10,20,20,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Sometimes,,,,,Most of the time,,,,Sometimes,,Often,,,10-25% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,100000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,48,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,IBM Watson / Waton Analytics,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,10,30,10,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM SPSS Statistics,Java,MATLAB/Octave,NoSQL,R",,,,,,,,,Rarely,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,,Sometimes,,,,,,,,Often,,,,,Often,,Sometimes,,,,Often,Most of the time,Most of the time,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,Often,Sometimes,Often,Often,Sometimes,,,,,,Often,,Often,,Less than 10% of 
projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,55000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,,I don't plan on learning a new tool/technology,Deep learning,Python,Google Search,"Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,10GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,"Bayesian Techniques,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,Sometimes,,,,Often,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,20,35,10,10,25,0,"Enough to code it again from 
scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,135000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Deep learning,R,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Kaggle,Personal Projects,Textbook",Somewhat useful,,Very useful,,Very useful,Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,,University courses,20,0,40,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Mathematica,Minitab,R,SAS Base",,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,Rarely,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Simulation,SVMs,Time Series Analysis",Sometimes,,Most of the time,,,Most of the time,Most of the time,Often,Most of the time,,,,Often,Often,,Often,,Sometimes,,,,,Sometimes,,,,Most of the time,Sometimes,,Often,,,,10,20,0,10,0,60,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data,Other",Often,Often,,,Often,Often,,,Sometimes,,,Often,,,,,Sometimes,,,,Often,,100% of projects,More internal than external,Standalone Team,US census data; State-level surveys; etc,Finding appropriate data to support the development of novel techniques ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"200,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Hong Kong,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Logistic Regression,Neural Networks,SVMs",Often,,,Often,,,,,,,,,,,,Often,,,,Often,,,,,,,,Often,,,,,,40,40,0,20,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,,,,,,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,stock price data,data integrity about the business logic at stock market,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,900000,HKD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Portugal,23,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,40,0,0,30,30,0,Time Series,"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Colombia,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by government,IBM Watson / Waton Analytics,Deep learning,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Github Portfolio,Sort of 
(Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,"GitHub,Government website","Online courses,Personal Projects",,,,,,,,,,,Very useful,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,60,15,0,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10GB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Often,,,,40,30,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization",Often,Often,,,,,,Often,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,HLDI; ISO,Cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,R Projects,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,58000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,R,Uplift Modeling,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Stack Overflow Q&A,Other",,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,Researcher,University courses,20,20,0,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Non-profit,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Julia,NoSQL,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Often,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Text Analytics",Often,,Often,,,Most of the time,Most of the time,Most of the 
time,,,,,,Often,,Most of the time,,Often,,,Most of the time,,,,,,Often,,Often,,,,,80,5,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Most of the time,,,,Sometimes,,,,,,,,Sometimes,,,,,,100% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Other",github,Git,Most of the time,160000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,26,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,GitHub,"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Norway,49,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Spark / MLlib,,Python,,"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very 
useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,3-5 years,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,30,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,,"Arxiv,Blogs,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,Researcher,University courses,25,25,25,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,Rarely,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Often,Often,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,Sometimes,,,Often,Sometimes,Sometimes,,Often,,,,,,Often,Sometimes,,,,25,20,25,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",,,,,Often,Often,,,Sometimes,,,,Sometimes,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Email,Other",Slack,Git,Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,40,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,50,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,30,35,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Somewhat important,Other,Basic laptop (Macbook),Text data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,Sometimes,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,Often,Most of the time,Most of the time,Most of the time,,,,,,Often,,Most of the time,,Often,Most of the time,,Often,,Most of the time,,,,Often,Often,Most of the time,Often,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization",,Sometimes,,,Often,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,400000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by government",Python,Random Forests,Python,"GitHub,Google Search","Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,50,0,0,0,50,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Government,20 to 99 employees,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,"Regression/Logistic Regression,SVMs,Other","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,,,,,,,,,,120000,AUD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Canada,19,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,10,10,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",,Telecommunications,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data",,1GB,"CNNs,Neural Networks,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Most of the time,,Often,,,,,,,,,,Sometimes,,,,Often,Sometimes,,,,,,,Sometimes,,,,,,0,60,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of funds to buy useful 
datasets from external sources,Unavailability of/difficult access to data",Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Often,,76-99% of projects,More external than internal,Standalone Team,imagenet;kaggle,state of art to create model,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,20000000,IDR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,Python,Government website,"Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Scientist,Other",University courses,50,10,30,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Manufacturing,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Regression/Logistic Regression","Amazon Web services,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive 
Modeling,Simulation",Often,Sometimes,Most of the time,Sometimes,,,Most of the time,Often,,Often,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,,,,,Sometimes,,,,,,,80,5,1,4,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Most of the time,,,,,,Sometimes,,,,Often,,,,,Often,,100% of projects,More internal than external,Standalone Team,"DataSus, IBGE, SERASA/Experian","Our clients do not have a properly vision on data science, so they cannot understand where they can get with it.","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Tableau and internal development tools,,Rarely,85000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Time Series Analysis,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,KDnuggets Blog,3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Workstation + Cloud service,,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,,,,,,,,, +Female,India,27,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,< 1 year,,,Necessary,,,Necessary,Necessary,,,,,,,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Business Analyst,Self-taught,60,40,0,0,0,0,Survival Analysis,Logistic Regression,High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,51,Employed full-time,,,Yes,,Engineer,Fine,Employed by non-profit or NGO,Jupyter notebooks,Support Vector Machines (SVM),Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,40,20,40,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,SQL,Tableau,TensorFlow",Often,Often,,,,,,Often,,,,,,,,,Often,,,,,Often,Often,Often,,,Often,,,,Often,Sometimes,Often,,,,,,,,,Often,,,Sometimes,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Text 
Analytics",Often,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,Often,,,,,20,20,15,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Sometimes,Rarely,,,,,,Most of the time,,,,Sometimes,,,76-99% of projects,Approximately half internal and half external,Business Department,SAMHDA; National Archive of Criminal Justice Data; GSS; ECLP; Data.gov; ICPSR; SSEDL,Cleaning Bad Data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Most of the time,"95,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Chile,30,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Weka,Neural Nets,Java,I collect my own data (e.g. web-scraping),"Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,,,Very useful,,Somewhat useful,,,Very useful,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Other,No,Professional degree,,Less than a year,Researcher,University courses,10,0,0,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Very 
Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,70,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Decision Trees,SQL,Other,"College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, 
etc.)",0,50,0,50,0,0,,,A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important +Male,Canada,50,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Google Search,"Arxiv,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,,,,,Very useful,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,15,0,0,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"GANs,Neural Networks,RNNs,SVMs","C/C++,Java,Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,Often,,,"CNNs,Data Visualization,GANs,Neural Networks,RNNs,SVMs",,,,Often,,,Often,,,,Rarely,,,,,,,,,Often,,,,,Sometimes,,,Sometimes,,,,,,65,20,5,5,5,0,Enough to tune the parameters properly,"Explaining data science to others,Maintaining responsible expectations about the potential impact of 
data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,Often,,,,,,,,Often,Most of the time,Often,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Sometimes,75000,CAD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Jupyter notebooks,Time Series Analysis,Python,"GitHub,Google Search","Arxiv,Kaggle,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Hospitality/Entertainment/Sports,10 to 19 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,CNNs,GANs,Markov Logic Networks,RNNs","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Naive Bayes,PCA and Dimensionality Reduction",Often,,,,,,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,,,90,9,1,0,0,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT 
Department,,performance,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Git,,20,CNY,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Other,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Amazon Web services,Text Mining,SQL,University/Non-profit research group websites,"Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,50,25,25,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Internet-based,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Random Forests,Segmentation",Rarely,Often,Most of the time,,,Most of the time,Most of the time,Most of the 
time,,,,,,Most of the time,Sometimes,Sometimes,Sometimes,Often,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,,75,10,5,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,Often,,100% of projects,More internal than external,Central Insights Team,Category of Point Of Sales,Filling missing data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Japan,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,"Engineer,Programmer",Self-taught,70,0,0,0,30,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Gradient Boosting,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Salfrod Systems CART/MARS/TreeNet/RF/SPM,"Ensemble Methods (e.g. boosting, bagging)",R,"Google Search,Other","Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer,Operations Research Practitioner",Self-taught,20,70,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,100 to 499 employees,Decreased significantly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,Minitab,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Naive Bayes,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,Often,Often,Often,,,,,,,,Often,Often,,,,,,,,,,Most of the time,,,,Often,,,,50,10,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Sometimes,,,,,,,Often,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Other,Email,,Other,,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,3 to 5 
years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Python,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Miner,Other","Online courses (coursera, udemy, edx, etc.)",25,20,50,0,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","SAS Base,SAS Enterprise Miner,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact 
of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Sometimes,,Most of the time,Sometimes,,Often,Most of the time,,Often,Sometimes,Most of the time,Sometimes,Often,,,Often,,Most of the time,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,98000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",1-2 years,,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,10,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,GitHub,"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,40,20,10,20,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",High school,CRM/Marketing,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Tableau,TensorFlow",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,Often,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,Sometimes,,Often,,Most of the time,Most of the time,,,,,,Sometimes,,,,,,Rarely,,,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database",Most of the time,Most of the time,,,Often,,,Often,,,,,,,,,,Most of the time,,,,,51-75% of projects,Entirely internal,Standalone Team,,Cleanliness and uniformity of data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,90000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Python,,"Blogs,Conferences,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,Other,University courses,20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft SQL Server Data Mining,Python,R,SAS Base,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,Most of the time,,,,,Most of the time,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,Often,,Most of the time,,,Most of the time,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,50,30,0,5,10,0,Enough to tune the parameters 
properly,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",,Sometimes,,,,,,Sometimes,Most of the time,,Often,,,,Most of the time,,,,,,,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Always,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,Work,15,5,50,10,20,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Rarely,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most 
of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,,,Rarely,,Often,Often,,Most of the time,,,Most of the time,,Often,,Sometimes,,,,20,40,0,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Other,Kaggle,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,DataTau News Aggregator,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Work,30,30,10,10,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Republic of China,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",< 1 year,Necessary,,,Nice to have,Necessary,Necessary,Necessary,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Experience from work in a company related to ML,Yes,Bachelor's degree,A health science,,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Survival Analysis,"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,Not important,,Not important,,,Not important,Somewhat important,,,,, +Female,Turkey,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,United States,37,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,,1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,,DBA/Database Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,United States,50,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,R,GitHub,"Conferences,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,"Data Stories Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Operations Research 
Practitioner,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Chile,29,Employed full-time,,,Yes,,Other,Fine,Employed by government,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Official documentation,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Other",University courses,80,10,0,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Government,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","R,SQL,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Random Forests,Time Series Analysis",,,Sometimes,,,Sometimes,Most of the time,Rarely,Rarely,,,Rarely,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,40,5,5,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Often,,,,Often,,Often,,Often,,Often,,,,,Most of the time,Often,,100% of projects,More internal than external,Other,Macroeconomic series provided by Central Bank; Employment survey data from Government Agency; Internal Revenue Service data,IT Department not taking its job seriously: poorly maintained datamart with nonexistent ETL processes.,Row-oriented relational 
(e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,36000000,CLP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Bayesian Methods,R,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Podcasts,Textbook",Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,,"FlowingData Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,15,0,5,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Always,100GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks",Sometimes,,,Most of the time,,,,Sometimes,,,,Sometimes,,Often,,,,,,Most of the time,,,,,,,,,,,,,,75,10,5,0,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Limitations 
of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Often,,,Most of the time,,,,Sometimes,Most of the time,,,,,,,,,Less than 10% of projects,Entirely internal,Other,Only internal data. Good quality and open datasets are strange.,Collect and Clean it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,ARS,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,24,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Personal Projects",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,,,Somewhat useful,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",Kaggle competitions,15,0,60,5,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Regression/Logistic Regression","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Often,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Most of the time,,,,40,10,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,Rarely,,Sometimes,Often,,,,Rarely,,Most of the time,,Most of the time,,,Sometimes,Most of the time,,76-99% of projects,Entirely internal,Other,None,"Privacy and confidentiality, and getting support for projects. Very little funding is being put into R&D and regular business delivery makes up a large component","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Email,Share Drive/SharePoint",,,Sometimes,"70,000",AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,Other,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,5,30,35,10,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Increased significantly,1-2 years,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Web services,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language 
Processing,PCA and Dimensionality Reduction,Random Forests",,,Often,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,Most of the time,,Sometimes,,Often,,,,,,,,,,,20,20,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,Often,Often,,,,,,,Often,,Sometimes,,,,,,,Most of the time,,10-25% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Never,45000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Stan,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Researcher,Statistician,Other",University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Never,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Mathematica,MATLAB/Octave,Python,R",,,,Often,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,Often,,,,,,,Often,,Sometimes,,,Sometimes,,Often,,,,Most of the time,,,Often,,,,5,25,5,10,5,50,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,Most of the time,,,,,,,,Often,Often,,100% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Most of the time,80000,USD,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Sweden,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Proprietary Algorithms,Matlab,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Company internal community,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,Very useful,,,,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,30,40,0,0,Computer Vision,"Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Retail,20 to 99 employees,Increased significantly,3-5 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Image data,Most of the time,,,"Amazon Web services,C/C++,Jupyter notebooks,Python",,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation",,,Sometimes,,,Often,Most of the time,Often,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,10,30,30,20,10,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,Sometimes,,,,Often,Often,Often,Sometimes,,Sometimes,,76-99% of projects,Entirely 
internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Support Vector Machines (SVM),Python,I collect my own data (e.g. web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Engineer,Statistician",Work,20,20,60,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",High school,Technology,10 to 19 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Most of the time,1TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,TensorFlow",,,,Often,,,,,Often,,,,,,Often,,Rarely,,,,,,Sometimes,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Often,Sometimes,Rarely,,,,,,,Most of the time,,,,Sometimes,Most of the time,Often,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external 
sources,Scaling data science solution up to full database",,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,,150000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Not Useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,"Other,I haven't started working yet",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Conferences,Kaggle",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer",Self-taught,70,10,15,0,5,0,"Adversarial Learning,Computer Vision","Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Other,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Rarely,100MB,"CNNs,Decision Trees,GANs,Neural Networks","Jupyter notebooks,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,"CNNs,GANs,Neural Networks,Segmentation",,,,Most of the time,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,5,55,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Mercurial,Subversion",Sometimes,25000,GBP,Other,5,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Julia,Anomaly Detection,Python,Google Search,"Arxiv,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer",Self-taught,20,10,70,0,0,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,RNNs,Segmentation,Simulation,SVMs",Often,,,Often,,Often,Often,Often,,,,,,Rarely,,,,,,Most of the time,,,Often,,Often,Sometimes,Often,Sometimes,,,,,,50,40,0,0,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision 
makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",,Often,Sometimes,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,168000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Newsletters,Online courses",Very useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,15,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Java,NoSQL,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Often,,,Most of the time,Sometimes,Most of the time,Most of the time,,,,,Often,,Sometimes,,,,,Sometimes,,Most of the time,,,,,,,,,,,10,25,15,5,5,40,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,Often,Often,,,,Most of the time,,,,,,,,,Often,,Sometimes,Often,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. web-scraping),"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher,Other",Self-taught,60,30,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,100GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner",Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Most of the time,,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Sometimes,Often,,,,,,,,Most of the time,,,,,,,Often,,,,,Often,Often,,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues",Most of the time,Sometimes,,,Most of the time,,,,Sometimes,,,,Most of the time,,,,Often,,,,,,51-75% of projects,Entirely internal,Other,,Cleaning the data is insanely time consuming and manual. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Other,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,"Researcher,Other",Work,60,10,20,10,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,"5,000 to 9,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Text data,Relational data",,,"HMMs,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,Rarely,Rarely,,,,70,2,5,13,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,Sometimes,,,,,,,,,Often,,Often,Often,Often,Sometimes,Most of the time,,76-99% of projects,More internal than external,Other,,"Understanding how it was input and what it actually means (eg a Date field could mean many different things: start date, end date, push date, etc). Requires coordinating with those who enter data and those who built the database.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,IBM Watson / Waton Analytics,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Self-taught,50,10,20,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Insurance,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,,,,,,,,,Often,,,,,,,Sometimes,,,Often,Often,,,Sometimes,,,,20,50,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools",,Sometimes,,,Often,,,,Often,,,,Most of the time,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,60000,CAD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SAS JMP,Neural Nets,Python,Other,"Blogs,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Physics,Less than a year,"Data Analyst,Data Miner,Engineer,Researcher",Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Brazil,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Arxiv,Kaggle,Personal Projects,Textbook",Somewhat useful,,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,100MB,"Decision Trees,Random Forests,Other","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Other",,,,,,Most of the time,Most of the time,Sometimes,,Often,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,,50,10,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Often,,Often,Often,,,,,,Often,,,,,,Often,,76-99% of projects,Entirely internal,Other,,Limited and untidy data.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,117000,BRL,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Programmer,Fine,Self-employed,Other,Deep learning,Haskell,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,"Partially Derivative Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Fine arts or performing arts,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,10 to 19 employees,Stayed the same,Don't know,A general-purpose job board,Not very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Never,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Google Cloud Compute,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow,Other",,Most of the time,,Often,Most of the time,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs",Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,Sometimes,Sometimes,,,Most of the time,,,,,Most of the time,,,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,26-50% of projects,Entirely external,Standalone Team,"SEER, open source, web scraping",cleaning,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,40000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Cloudera,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Work,30,0,70,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",Primary/elementary school,Internet-based,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Often,,,,,,,Sometimes,,,,,,Rarely,,Often,,,,,,Rarely,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Often,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,Rarely,,Often,,,,Most of the time,,,,60,20,5,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,60000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Other,Fine,Employed by government,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Podcasts,Textbook",Somewhat useful,Very useful,Very useful,,,,,,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Researcher,Other",University courses,30,15,25,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,"5,000 to 9,999 employees",Increased slightly,Less than one year,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Rarely,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,0,20,20,40,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most 
of the time,,,,Sometimes,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,Often,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,125000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Poorly,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Textbook,Tutoring/mentoring",,Very useful,,,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,0,20,"Computer Vision,Reinforcement learning,Time Series",Neural Networks - CNNs,A master's degree,Insurance,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Rarely,100GB,CNNs,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Neural Networks",,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,40,30,0,0,30,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources",,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,ImageNet,Read,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak)",I don't typically share data,,Git,Sometimes,100000,CNY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Canada,36,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Deep learning,Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Programmer",University courses,0,5,20,75,0,0,Computer Vision,Bayesian Techniques,Primary/elementary school,Academic,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Image data,Sometimes,100MB,"Bayesian Techniques,SVMs","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Segmentation,Simulation",,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,More external than internal,IT Department,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,10,5,20,60,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Academic,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Often,,,Most of the time,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,Often,,,,,,,,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,100% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,2000000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ireland,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,"DataTau News Aggregator,Linear Digressions Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Workstation + Cloud service",11 - 39 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer",University courses,30,20,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Male,India,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter 
notebooks,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Often,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,Often,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,Often,Often,,,Most of the time,,,,,,Most of the time,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Most of the time,,,,,,,Most of the time,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Survival Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,Very useful,Very useful,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",75,20,0,0,5,0,Natural Language Processing,"Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Rarely,10GB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Often,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,Sometimes,,,Often,,,,,,,Often,,Sometimes,,,Most of the time,Often,Often,,,,Sometimes,,,,Most of the time,,,,,55,20,10,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Often,Sometimes,Most of the time,Often,,Often,Most of the time,,,Often,Often,Often,Sometimes,Often,,Sometimes,Often,Often,Most of the time,,10-25% of projects,More external than internal,Other,,"Lack of access to the data itself. Access to useful business data blocked behind slow moving, inaccessible and very conservative CRM software management team.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,73000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,R,Bayesian Methods,SAS,Government website,"Conferences,YouTube Videos",,,,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Researcher,Statistician",Self-taught,20,0,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,SAS Base,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,Sometimes,,,,,Often,Sometimes,Sometimes,,,,,,,Often,,,,,,,,Rarely,,Sometimes,Sometimes,,,Often,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources",,Often,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,,U.S. Government Consumer Expenditure Survey; weather data;,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,,Always,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,"Data Elixir Newsletter,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer",Work,20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,500 to 999 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Cloudera,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow",,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,Often,Often,,,Often,,,Sometimes,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Often,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Often,,,Often,,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others",,Sometimes,,,Often,Sometimes,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,75000,GBP,Other,8,,,,,,,,,,,,,,,,,, +Female,Taiwan,40,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Retail,"5,000 to 9,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1TB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,SVMs","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Most of the time,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,Often,Sometimes,,,Most of the time,Often,,,,,,,,,,,,,,,,Sometimes,,Most of the time,Sometimes,,Often,Often,,,,60,20,5,5,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the 
organization,Organization is small and cannot afford a data science team",Often,Often,,,,,,,Sometimes,,,,,,,Often,,,,,,,26-50% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Very useful,,,,,Very useful,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Necessary,,,Necessary,,Necessary,Unnecessary,Unnecessary,,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Other",2 - 10 hours,,No,Master's degree,Physics,I don't write code to analyze data,Business Analyst,Self-taught,30,30,0,0,40,0,,,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,,Very Important,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Researcher,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher",Self-taught,50,25,0,0,0,25,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,A bachelor's degree,Other,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,PCA and Dimensionality Reduction",,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,20,5,0,20,50,5,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Sometimes,,,Often,,,,Rarely,Sometimes,,Most of the time,,,Sometimes,Sometimes,,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,38500,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Poland,39,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Decision Trees,Python,GitHub,Kaggle,,,,,,,Not Useful,,,,,,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,,,,,Unnecessary,,Unnecessary,,,,,,,,Laptop or Workstation and local IT supported servers,,Master's degree,No,Master's degree,Computer Science,1 to 2 years,Computer Scientist,Self-taught,60,30,10,0,0,0,Time Series,Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,,Somewhat important,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Not Useful,,,,Very useful,,,Somewhat useful,,,,,"Data Elixir Newsletter,FlowingData Blog,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job 
board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important +Male,Brazil,54,"Not employed, but looking for work",,,,,,,,RapidMiner (free version),Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,Very useful,Very useful,,,,,,Very useful,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,Other,University courses,30,20,0,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,Not Useful,,,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,,,,,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,10,0,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,,,,"Amazon Web services,Python,R,SAS Base,SQL,Tableau,TensorFlow,Other",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Rarely,,,,Rarely,,,Sometimes,Rarely,,,Often,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,70,0,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,Often,,,,,Often,,,,,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,Other,N/A,"Having to collect it myself. Also storing it, and standardizing it across different groups.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,N/A,Other,Always,0,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,44,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,Software Developer/Software Engineer,Self-taught,40,40,10,10,0,0,,Other (please specify; separate by semi-colon),I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Not Useful,,,,,Somewhat useful,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,,,,,,"Friends network,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",,Financial,I prefer not to answer,,,,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text 
data,Relational data",Sometimes,,"Decision Trees,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib,SQL",,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"Decision Trees,Natural Language Processing,Random Forests,Text Analytics",,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,Often,,,,,40,20,10,10,0,20,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,,,,,,,Bitbucket,,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,"Blogs,Stack Overflow Q&A,Textbook,Other,Other",,Very useful,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,30,20,10,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,1GB,,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Often,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic 
Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems",,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,R,,"Textbook,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,20,10,40,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,I prefer not to answer,Stayed the same,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Other,"Text data,Relational data",Sometimes,,Other,"Jupyter notebooks,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,,,Entirely internal,Other,,,,,,,,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Not Useful,,,Very useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Other,20,0,0,40,0,40,,,A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,United States,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,41,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Physics,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Friends network,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Data Scientist,Machine Learning Engineer",Self-taught,45,5,45,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,Often,Most of the time,,,,,,,,Often,,Most of the time,,,Sometimes,,Sometimes,,,Often,,Sometimes,,,Sometimes,Sometimes,,,,0,0,100,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,Sometimes,,Often,Often,,,,Rarely,,,,,,,,,,Sometimes,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,120000,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Russia,56,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,25,10,25,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important +Male,Brazil,25,Employed full-time,,,Yes,,Data 
Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university",Jupyter notebooks,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Machine Translation,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Non-profit,"10,000 or more employees",Stayed the same,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation","Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",,,Sometimes,,,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,Often,,,Often,,,,,,,,,,,10,10,30,30,20,0,Enough to 
run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,,4,,,,,,,,,,,,,,,,,, +Male,Canada,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Link Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Statistician",University courses,40,10,30,10,10,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,MATLAB/Octave,NoSQL,Orange,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender 
Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Sometimes,Often,Sometimes,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Often,,Often,,Often,Often,Sometimes,Most of the time,Sometimes,,Sometimes,Sometimes,Sometimes,Often,Sometimes,,,,10,60,5,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Often,Often,,,,,,,,,,,,Sometimes,,,Sometimes,Often,,10-25% of projects,More internal than external,Business Department,"geo, weather",ETL and dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,240000,CAD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Python,Deep learning,Python,GitHub,"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,25,0,0,25,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,Fewer than 10 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Rarely,,Rarely,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,Often,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Often,,Sometimes,,Most of the time,Often,,Most of the time,,,Often,Most of the time,Sometimes,,,,,,30,40,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,,Sometimes,,Often,Most of the time,,,Often,Most of the time,,26-50% of projects,More external than internal,Standalone Team,.,.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,"30,000,000",KRW,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,South Korea,26,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,SQL,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,< 1 year,,,,,,,Necessary,,,,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Russia,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,R,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Conferences,Kaggle,Personal Projects",,Somewhat useful,,,Very useful,,Very useful,,,,,Somewhat useful,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer",Self-taught,40,40,0,0,20,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",Primary/elementary school,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Relational data",Never,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",,,Sometimes,,,Often,Often,Often,Often,,,Most of the time,,Often,,Most of the time,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources",,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,,6,,,,,,,,,,,,,,,,,, +Male,Turkey,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Other,R,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Personal Projects",,,Very useful,,Very useful,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,5,3,15,75,2,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Decreased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Always,1PB,Other,"Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,NoSQL,Spark / MLlib",,Rarely,,Often,Often,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,65000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",25,35,15,0,25,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other 
Clustering,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,,Often,Sometimes,,,Sometimes,,,Sometimes,Most of the time,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Scaling data science solution up to full database,Unavailability of/difficult access to data,Other",Often,,Often,,Most of the time,Often,,,,,,,,,,,,Often,,,Often,Often,76-99% of projects,Entirely internal,IT Department,,It's not reliable. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Never,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,75,20,5,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Text data,Always,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Sometimes,,,,,Sometimes,,Most of the time,,,,,Often,,Sometimes,,,,30,10,20,20,20,0,"Enough to code it again from scratch, albeit it 
may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,,KRW,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Argentina,39,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Personal Projects",Very useful,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,,,,,,"FastML Blog,Jack's Import AI Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Business Analyst,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,50,30,0,5,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests","Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,Most of the time,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests,Text Analytics",Often,,,Most of the time,,Often,Often,Often,,,,,,Sometimes,,,,Sometimes,,,,,Often,,,,,,Often,,,,,64,20,10,5,1,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,Often,Most of the time,,,,,,,,,Often,,,,Sometimes,,Often,,,51-75% of projects,More internal than external,Standalone Team,ImageNet,Scale,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,1000000,ARS,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),Other","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"Data Elixir Newsletter,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Insurance,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,100GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Time Series Analysis",,,,,,Rarely,Most of the time,Sometimes,,,,,,,,Often,,,Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,30,15,5,40,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,Often,,,Often,,,,Often,,,,,,,,Often,,,,Often,,51-75% of projects,More internal than external,Central Insights Team,,Lagged data. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Subversion",Never,74000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Regression,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,High school,Insurance,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Never,1GB,,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,45,15,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT",,Often,,,Most of the time,Often,,,,,,,,,Often,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Sometimes,"64,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,Less than a year,Researcher,Self-taught,20,0,40,10,30,0,,,A bachelor's degree,Academic,10 to 19 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Other,Basic laptop (Macbook),Relational data,Never,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests",,,,,,,Often,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,80,0,0,10,10,0,Enough to run the code / standard library,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,100% of projects,More internal than external,Other,,Cleanliness and not having enough of it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,29000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,Julia,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Textbook",Very useful,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Programmer",University courses,25,0,25,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Pharmaceutical,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Sometimes,1GB,"Decision Trees,Ensemble Methods","Amazon Machine Learning,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,Sometimes,,,Often,,,Rarely,,,,,,,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs",,,,,,,,Sometimes,Often,Sometimes,,Often,,,,Sometimes,,,,,,,Most of the time,,,,,Most of the time,,,,,,10,35,25,10,20,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,We use none,The type of data we analyze is highly variable and can be biased based on the chemicals used to make the measurements we are using.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,132000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,46,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),,,,"Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,"Business Analyst,Software Developer/Software Engineer,Other",Work,40,0,60,0,0,0,,,High school,Other,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,10GB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,25,20,15,25,15,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,Often,,,,,Sometimes,,51-75% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,,Rarely,,,,9,,,,,,,,,,,,,,,,,, +Male,United States,21,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,20,20,10,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)",Decision Trees - Random Forests,A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,,,,"C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,Spark / MLlib,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,Rarely,Often,,,,,,,,,,Often,Often,,,,,,Often,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,50,0,50,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Subversion,Always,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,TensorFlow,Neural Nets,R,Google Search,"Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",University courses,60,5,10,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Orange,Python,R,SQL,Tableau,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,Often,,,Most of the time,Most of the time,Rarely,,,,,,Often,,Often,,Rarely,,,Often,,Rarely,,,,,,Often,Often,,,,59,20,1,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,,,,,,Sometimes,,,,Often,,Often,,Most of the time,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Sometimes,200000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,20,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,IBM SPSS Statistics,Bayesian Methods,R,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United Kingdom,31,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,35,35,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Never,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,Rarely,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Segmentation,Time Series Analysis",,,Sometimes,,,,Often,Often,Often,,,Often,,,,,,,,,,,Often,,,Rarely,,,,Rarely,,,,10,40,0,40,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of 
tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,,,Sometimes,,,,,Most of the time,,,,,Sometimes,Sometimes,Sometimes,Often,,10-25% of projects,More internal than external,Standalone Team,,size,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Never,60000,GBP,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,,,,,"Blogs,Company internal community,Kaggle,Stack Overflow Q&A",,Somewhat useful,,Not Useful,,,Not Useful,,,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,90,0,0,0,0,10,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,R,SQL,Other",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Neural Networks,Text Analytics,Time Series Analysis",,,Most of the time,,,,Often,Rarely,,,,,,,,Most of the time,Sometimes,,,Sometimes,,,,,,,,,Sometimes,Rarely,,,,50,25,20,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"College/University,Company internal community,Kaggle,YouTube Videos",,,Somewhat useful,Not Useful,,,Somewhat useful,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,Less than a year,"Data Analyst,Data Scientist",Self-taught,100,0,0,0,0,0,Reinforcement learning,Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,1-2 years,Necessary,Necessary,Unnecessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,Less than a year,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,60,10,0,30,0,0,Natural Language Processing,Bayesian Techniques,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Italy,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,3 to 5 years,"Computer Scientist,Researcher",Self-taught,50,20,0,30,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Never,1TB,"CNNs,Evolutionary Approaches,Neural Networks,RNNs,SVMs","C/C++,IBM SPSS Statistics,MATLAB/Octave,NoSQL,Python,TensorFlow",,,,Often,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Evolutionary Approaches,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Text Analytics",,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,Sometimes,Often,Most of the time,Often,,,,Most of the time,,Most of the time,,Most of the time,,,,,35,35,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Often,,,,,Often,Often,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,IT Department,benchmark text datasets,that it is unexplored and there are no standards,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,I don't typically share data",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,20000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",15,10,70,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Most of the time,Often,,,,Sometimes,,Sometimes,,Sometimes,,,,,Often,,Often,,,,,Sometimes,,,,,,10,40,0,30,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,More internal 
than external,Standalone Team,Scientific literature,Not enough of it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"102,500",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Canada,36,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Podcasts,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,0,65,0,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Academic,I don't know,,,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,,,"Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,NoSQL,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,,,Often,Most of the 
time,,Sometimes,,,,,,,,,,,45,20,0,35,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Often,,100% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SQL,University/Non-profit research group websites,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Other,University courses,10,1,50,39,0,0,,Logistic Regression,A professional degree,Retail,"10,000 or more employees",Decreased significantly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,,,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Lift Analysis,Logistic Regression",Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,0,1,0,0,99,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Unavailability of/difficult access to data",Often,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than 
external,Other,,Lack of a data dictionary as there is no one who has complete historical knowledge of everything that lives within our data warehouse,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,70000,,Other,6,,,,,,,,,,,,,,,,,, +Male,Canada,55,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,67,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,"FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,University courses,20,0,20,10,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Computer Scientist,Data Scientist,Engineer,Predictive Modeler,Researcher,Other",University courses,60,10,10,10,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,I don't know,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,Other","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow,Other",,,,Rarely,,,,,,,,,,,Rarely,,Most of the time,,,,Sometimes,,,,Rarely,,Sometimes,,,,Most of the time,,Sometimes,,,,,,Sometimes,,Sometimes,Most of the time,,,,Sometimes,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Often,,,,Often,,,Often,,Sometimes,,,,,,,,Often,Most of the time,,,,20,40,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not 
instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Sometimes,,Sometimes,,,,Sometimes,Rarely,,,,Often,Most of the time,,,Often,Most of the time,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,"Moodys Analytics, QuantEcon",,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,83000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Julia,Random Forests,Matlab,Google Search,"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 
years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Most of the time,1GB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",Often,Often,,Sometimes,,,,,Sometimes,,,,,,Often,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Natural Language Processing,Neural Networks,Text Analytics",,,,Often,,,,,,,,,,,,,,,Often,Often,,,,,,,,,Often,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Often,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,110000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,30,70,0,0,0,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Online courses,Textbook",,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,"No Free Hunch Blog,Partially Derivative Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Operations Research Practitioner",Self-taught,35,15,0,15,35,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,1MB,Regression/Logistic Regression,"Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,"Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Rarely,,,Rarely,,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,,,,,,,,,,,,Most of the time,,,76-99% of projects,Approximately half internal and half external,Standalone Team,Federal Procurement Data; Census Data,Cleaning and linking,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Always,175000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,53,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,75,0,0,10,5,,,A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,"Data Elixir Newsletter,DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,45,10,5,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Other,100 to 499 employees,Stayed the same,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"CNNs,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",Rarely,Often,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,Text Analytics",,,,Sometimes,,Most of the time,Often,,Sometimes,,,,Sometimes,Often,,Often,,,Most of the time,Most of the time,Often,,Sometimes,,Most of the time,,Most of the time,,Most of the time,,,,,50,20,10,10,10,0,"Enough to code it again from 
scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,Often,,,Most of the time,,Often,,,,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,glove; fasttext,complicated taxonomies,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,Data Analyst,University courses,30,20,5,35,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,1GB,"CNNs,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Often,,,Sometimes,Sometimes,,Sometimes,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,Segmentation,SVMs",Rarely,,,Sometimes,Rarely,Often,Most of the time,,,,,,,Sometimes,Rarely,Often,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,The lack of a clear 
question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Never,32000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,"Bayesian 
Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,Rarely,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,Sometimes,Sometimes,Often,,Sometimes,,Sometimes,,,,,,Often,Sometimes,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",Sometimes,Often,,,Often,,,Often,,,,,,,,Rarely,,,,,,,51-75% of projects,Entirely internal,Business Department,Black Book; ALG; Adesa; Mannheim; AutoVIN,data integrity across systems,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,115000,CAD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Monte Carlo Methods,R,Other,"Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Very useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,DBA/Database Engineer,University courses,10,0,30,60,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Other,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Always,,"Decision Trees,Regression/Logistic Regression","R,SAS Base,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,Often,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,Rarely,,Often,,,,,,Sometimes,Sometimes,,,,,,,Often,,,,60,10,5,5,20,0,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,Rarely,,,Often,Sometimes,,,Sometimes,Often,,,,Most of the 
time,,,,Most of the time,,,Most of the time,,100% of projects,More internal than external,Standalone Team,social media; Department of buildings,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Dropbox,Other,Never,"60,000",USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Not Useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Not Useful,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,6 to 10 years,"Predictive Modeler,Programmer,Researcher",Self-taught,90,0,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Stayed the same,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Sometimes,,,,Sometimes,,,Often,Sometimes,Sometimes,Rarely,,Sometimes,,Sometimes,,,,Often,,,Most of the time,,,,60,15,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,250000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Other,University courses,0,30,40,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Sometimes,10GB,"Bayesian Techniques,CNNs,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,RapidMiner (commercial version),SAP BusinessObjects Predictive Analytics,TensorFlow",,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,Often,,,Sometimes,,,,,,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Often,Often,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,,Sometimes,Often,,Often,,,,,,,Often,,,,60,10,0,10,20,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,76-99% of projects,More external than internal,Business Department,,Cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,Very useful,,,,,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,60,10,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Government,"5,000 to 9,999 employees",Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100GB,"Bayesian Techniques,Decision Trees,SVMs","IBM Cognos,Jupyter notebooks,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SAS JMP,SQL,Unix shell / awk",,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Rarely,,,,Rarely,,,Sometimes,,Sometimes,,,,,,Rarely,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Naive Bayes,Recommender Systems,SVMs",,Sometimes,Sometimes,,,,Often,Sometimes,,,,,,Rarely,Sometimes,,,Sometimes,,,,,,Sometimes,,,,Rarely,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,Often,,,Often,,,,,,Often,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,45000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,,"Non-Kaggle online communities,Official documentation,Personal Projects",,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,25,0,0,75,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,1MB,Regression/Logistic Regression,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,,,,,,,,,Often,,,,,Often,,,Often,,,,40,40,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,Often,,,,,Most of the time,,,,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Rarely,67000,CAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Text Mining,Python,I collect my own data (e.g. web-scraping),"Arxiv,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher",Self-taught,70,25,5,0,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Telecommunications,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Rarely,100GB,Neural Networks,"C/C++,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,Rarely,,,,,Often,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Neural Networks",,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,70,10,3,12,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Most of the time,,,,Most of the 
time,,,,Often,,Often,,,Sometimes,,,,,,,,,10-25% of projects,Entirely internal,IT Department,Factual Data; SalesForce,Proprietary data comes from hundreds of disparate sources. Maintaining ETL jobs is often infeasible for a small team like this one.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,61000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Survival Analysis,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,University courses,50,17,0,33,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data,Other",Rarely,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter 
notebooks,MATLAB/Octave,Python,SQL",,Often,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,,,Often,,,,,Often,,,,,,,Often,,Often,,,,50,25,1,10,14,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,Often,Sometimes,,Often,Sometimes,,76-99% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Other,Poorly,Employed by non-profit or NGO,Python,Monte Carlo Methods,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,Survival Analysis,Logistic Regression,"Some college/university study, no bachelor's degree",Non-profit,"5,000 to 9,999 employees",Decreased significantly,Don't know,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1TB,"Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,80,5,0,15,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,,,,,,,,,,,Most of the time,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Never,151000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,15+ years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Master's degree,Yes,Master's degree,A humanities discipline,More than 10 years,Other,Self-taught,35,30,30,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Other,,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Japan,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Julia,Text Mining,Python,University/Non-profit research group websites,"Arxiv,Blogs,Friends network,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,,"FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,"Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,35,0,25,40,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival 
Analysis","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",High school,Internet-based,,,,,,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Julia,Jupyter notebooks,Python,R,SAS Base,SQL",Rarely,Sometimes,,Rarely,,,,Rarely,Rarely,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,Rarely,,,,Often,Most of the time,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,Rarely,Often,,,,45,4,1,25,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,Sometimes,,Most of the time,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,51-75% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"6,000,000",JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,,University courses,30,20,20,30,0,0,,,A professional degree,Academic,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,10MB,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Natural Language Processing,Text Analytics",,,Most of the time,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,50,10,0,20,20,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,Airline Tweets,Unstructured and messy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"10,400",,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A master's degree,Mix of fields,20 to 99 employees,,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,,"Text data,Relational data",,,,"IBM SPSS Statistics,R",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,Rarely,Rarely,,,,7,5,5,25,25,33,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,Statcan,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Ukraine,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Non-Kaggle online communities,Online courses,Trade book",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,Very useful,,Very useful,,,,,Very useful,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Often,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",Most of the time,,,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,,20,20,30,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database",,,,Often,,,,,Sometimes,,Often,,,,,,,Often,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,10500,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Textbook",,,Very useful,,,Very useful,Very useful,,,,,,,,Very useful,,,,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Yes,Master's degree,Management information systems,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,28,Employed full-time,,,Yes,,Operations Research Practitioner,Perfectly,Employed by government,Microsoft R Server (Formerly Revolution Analytics),Deep learning,Python,"GitHub,University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Textbook",,,Very useful,,Somewhat 
useful,,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Programmer,Researcher",University courses,15,15,30,40,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Academic,I don't know,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1MB,"Decision Trees,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,C/C++,Julia,NoSQL,Python,R,SQL,Unix shell / awk",,Often,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,Sometimes,,Rarely,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Decision Trees,Evolutionary Approaches,Logistic Regression,Simulation,Text Analytics",,,,,,,,Often,,Sometimes,,,,,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,,,,10,40,25,15,10,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,85200,BRL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Romania,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,30,20,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A professional degree,Telecommunications,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow,Other",Rarely,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,Most of the time,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural 
Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,,,,Most of the time,Most of the time,,,,,Most of the time,,Sometimes,,Often,,,Often,Often,Often,,Often,,Often,,,,Often,,,,,40,25,25,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,Often,,,Most of the time,Sometimes,,,,Most of the time,,Most of the time,,,,,,,100% of projects,More internal than external,Standalone Team,,Junk data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,DVC,"Bitbucket,Git",Always,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",33,33,34,0,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,,Often,,,Most of the time,,,,,,,Most of the time,,,,Often,,,,,,Often,,,,,30,30,30,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,Often,,,Often,,,,,,Often,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Argentina,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer",University courses,40,20,15,15,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Most 
of the time,,,,Often,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,SVMs",Sometimes,,,Sometimes,Sometimes,Most of the time,Often,Often,,,,,,Often,Most of the time,,,,,,Often,,,Sometimes,Sometimes,Often,,Sometimes,,,,,,40,20,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,none,"Data inconsistency, missing data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git",Sometimes,32000,ARS,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",R,Anomaly Detection,SQL,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Not Useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,"DataTau News Aggregator,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,20,5,25,0,0,Natural Language Processing,Logistic Regression,A master's degree,Non-profit,500 to 999 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,Text Analytics",Most of the time,,,,,Sometimes,Most of the time,,,,,,,,,Often,,Rarely,Rarely,,,,,,,Often,,,Often,,,,,40,10,5,30,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,Often,Sometimes,,,,,,Sometimes,,Often,,,,,Often,,,,Often,,Most of the time,76-99% of projects,More internal than external,Other,Census data,Data isn't collected with data science study in mind. Data collection and format vary greatly from year to year. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,38000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Statistician,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Java,Anomaly Detection,SQL,University/Non-profit research group websites,"Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician,Other",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,500 to 999 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Angoss,Python,R,SAS Enterprise Miner,SQL",,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,,,Most of the time,Sometimes,,,,,Sometimes,Most of the time,Often,,Sometimes,,,Often,,Sometimes,,,Sometimes,,,,,,,,50,20,10,5,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,Often,,Less than 10% of projects,More internal than external,IT Department,,Accessing and refining,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Sometimes,150000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Other,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Online courses,Personal Projects,Textbook",Very useful,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,Self-taught,60,30,10,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Sometimes,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Most of the time,Often,Most of the time,Most of the time,,,,,,,Often,,,,,Most of the time,Most of the time,Often,,Sometimes,Often,Most of the time,,,Most of the time,Most of the time,Often,,,,40,50,10,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy 
issues",,,,,Most of the time,Sometimes,,,,,,Often,,Sometimes,,,Most of the time,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,Most academic datasets for testing,It is private data so we need to work with it while not seeing it.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,"89,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Singapore,55,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,10,20,60,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Image data,,10GB,"CNNs,Neural Networks,RNNs","C/C++,Java,Jupyter notebooks,Python,SQL,Tableau,TensorFlow",,,,Sometimes,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,RNNs",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,20,40,0,40,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Rarely,24000,CHF,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Vietnam,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,70,10,0,10,0,10,"Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,100MB,"Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Often,,Often,,Most of the time,Often,Most of the time,,,Most of the time,,Often,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Often,Often,,Often,Sometimes,Sometimes,Sometimes,Often,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",Most of the time,Often,,,Most of the time,,,,Most of the time,Often,,,Often,Sometimes,,,,Often,,,,,100% of projects,More internal than external,Central Insights Team,Social data,Collect data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,18000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,46,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Employed by government",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,Researcher,Self-taught,40,10,10,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Most of the time,10GB,"HMMs,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,Perl,Python,R,Unix shell / awk",,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,Often,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Segmentation",,,Rarely,,,,Often,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,10,30,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to 
data",Sometimes,,,,,,,,Sometimes,,,,,,,,Often,,,,Often,,100% of projects,Entirely internal,Standalone Team,genomics data,acquiring positive/negative controls on which to develop algorithms ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,"Bitbucket,Git",Rarely,63000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Friends network,Newsletters,Official documentation,Personal Projects,Podcasts,Textbook",,,,,,,,,,,,,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Statistician,Other",Work,30,20,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Stan,Unix shell / awk,Other",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,Rarely,,Rarely,,,,,Often,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,,,,,Often,,Sometimes,Often,,Sometimes,,Sometimes,,,,,,Most of the time,Most of the time,,,,10,20,60,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,142000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,edX,Other",GPU accelerated Workstation,40+,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Other,3,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Not Useful,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Psychology,3 to 5 years,Other,Other,10,50,20,0,10,10,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,Often,Sometimes,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,Sometimes,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Most of the time,Often,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,,Most of the time,,,Sometimes,Sometimes,,Most of the time,Often,,100% of projects,Entirely internal,Other,Fannie Mae; Freddie Mac; Teranet; Equifax; Oxford Economics,Availability,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,90000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Scientist,Other",University courses,20,20,0,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Other",,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,Rarely,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,Most of the time,Sometimes,,,Often,,Sometimes,,Often,,,,Sometimes,,,Most of the time,,,,,,,Sometimes,,,,60,15,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Often,,,Most of the time,,,Most of the time,Often,,,,Sometimes,,,,,,Sometimes,,,,51-75% of projects,More internal than external,IT Department,Factset; Mintigo; Intricately;,Lack of clarity on business rules and definitions (i.e. how to define a sale),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,105000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Julia,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Personal Projects,Tutoring/mentoring",Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,Data Scientist,Self-taught,95,5,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,TensorFlow",,Rarely,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Lift Analysis,Natural Language Processing,Neural Networks,Prescriptive Modeling,Simulation,Time Series Analysis",Sometimes,,Often,,,,,,,,,,,,Often,,,,Sometimes,Sometimes,,Often,,,,,Sometimes,,,Often,,,,60,30,10,0,0,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,51-75% of projects,Do not know,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Other",,Git,Rarely,120000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),,Other,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Predictive Modeler",Self-taught,90,10,0,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Other,Always,1TB,Regression/Logistic Regression,"Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Ensemble Methods,Segmentation",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,30,50,10,0,10,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,2000000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Statistician",Self-taught,30,5,30,30,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Financial,"5,000 to 9,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Statistics,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,Often,Often,Often,,Often,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Often,Often,,Often,Often,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with 
IT",Often,Often,,,,Often,,,Most of the time,,,,Often,,Most of the time,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Trade book",,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,,,"Data Elixir Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,University courses,20,30,25,15,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,TensorFlow",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,Sometimes,,,Most of the time,Often,Sometimes,Often,,,Often,,,,Often,,,Often,Sometimes,Most of the time,,Often,,,Most of the time,,,Often,,,,,35,20,15,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,Gathering it because of internal politics.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,110000,BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belarus,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,Java,GitHub,"Company internal community,Kaggle,Personal Projects",,,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",15,70,5,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,R",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation",Often,,Often,,,Often,Often,Often,,,,,,,,Often,,,,,,,Often,,,,Often,,,,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Often,,,,Often,,Often,,,,,,Often,,Often,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,,"20,000",BYN,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Brazil,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,"FastML Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Physics,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,10,0,40,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",,,,Sometimes,,Most of the time,Often,Often,Most of the time,,,Most of the time,,Sometimes,,Often,,Often,,Often,Most of the 
time,,Often,,Sometimes,,,Often,,,,,,10,70,5,5,10,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,Entirely internal,IT Department,kaggle;worldbank,Lack of documentation or some sort of metadata.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,BRL,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Textbook",Somewhat useful,,,,,,Very useful,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,DBA/Database Engineer,Self-taught,80,0,0,0,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,10GB,Random Forests,C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Random Forests",,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,20,40,0,10,0,30,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,10000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Company internal community,Conferences,Official documentation",,,,Very useful,Very useful,,,,,Somewhat useful,,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,6 to 10 years,Other,Other,25,0,25,25,0,25,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Logistic Regression",High school,CRM/Marketing,100 to 499 employees,Stayed the same,6-10 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10PB,"Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,Most of the time,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,Rarely,,,"Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Segmentation",,,,,,,Most of the time,,,,,,,Often,Sometimes,Most of the time,,,Sometimes,,,,,,,Sometimes,,,,,,,,10,5,40,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,"Census (ACS), GIS polygons",Sampling changes due to system health concerns,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,140000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,I haven't started working yet",University courses,5,0,10,80,5,0,"Natural Language Processing,Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Python,Text Mining,Python,Google Search,"College/University,Kaggle,Textbook",,,Very useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Computer Scientist,Data Scientist",Work,30,0,40,30,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python",,Often,,,,,,,,,,,,,Often,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Random Forests",,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,30,30,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,26-50% of projects,Entirely internal,,,Unreliable data; sparsity,,Email,,Subversion,Rarely,"70,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,37,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Textbook",,Somewhat useful,,,Very useful,,Very useful,,,,,,,,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A master's degree,Telecommunications,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,,,,"Google Cloud Compute,NoSQL,Python,R,SQL,Tableau",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Often,Most of the time,,,,20,10,10,30,30,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,France,50,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Text Mining,R,Other,"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,Often,,,,,,Often,,Most of the time,,Sometimes,,,Sometimes,,Often,,,Most of the time,,,,Sometimes,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Sometimes,,,Often,,,,Often,,,,Most of the time,,,,,,,Often,Sometimes,,Less than 10% of projects,More internal than external,Business Department,datasets from data.gouv.fr,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"60,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +A different identity,United States,61,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,0,0,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Anomaly Detection,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Other,Other",,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,0,10,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Netherlands,62,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Oracle Data Mining/ Oracle R Enterprise,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook",,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"Researcher,Other",University courses,0,0,80,20,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,10TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SAS JMP,Statistica (Quest/Dell-formerly Statsoft),Tableau,TensorFlow,TIBCO Spotfire",,,,Often,,,,,,Often,Sometimes,Sometimes,Sometimes,,Most of the time,,,,,Sometimes,Sometimes,,Often,Sometimes,Often,,,Most of the time,,Sometimes,Often,Sometimes,Most of the time,,,,,Often,Sometimes,Often,,,,Sometimes,Sometimes,Sometimes,Often,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,Most of the 
time,,Often,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,,55,5,5,30,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Sometimes,Most of the time,Sometimes,,Sometimes,,,,,,,,Most of the time,,100% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Subversion,Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Flume,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Personal Projects,Textbook",Somewhat useful,Somewhat useful,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Researcher,Other",Work,15,5,20,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Rarely,10GB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Jupyter 
notebooks,MATLAB/Octave,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Sometimes,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",Rarely,,Sometimes,,Rarely,Most of the time,Most of the time,,,,,,,Most of the time,,Most of the time,,Often,Most of the time,Often,Often,,,,Often,,,Often,Most of the time,,,,,15,50,5,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,,,,,,Sometimes,,,Often,,,Rarely,,,76-99% of projects,Entirely internal,Standalone Team,Twitter; yelp; ,Terms of service/inability to distribute,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Web hosted,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Greece,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,University courses,20,20,0,60,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Nigeria,32,Employed full-time,,,Yes,,Business Analyst,,Employed by professional services/consulting firm,Tableau,MARS,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Newsletters,Stack Overflow Q&A",,,,,Very useful,,,Very useful,,,,,,Very useful,,,,,"Data Machina Newsletter,DataTau News Aggregator,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,"Computer Scientist,Data Analyst,Programmer",Work,10,40,50,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Random Forests","Amazon Machine Learning,Oracle Data Mining/ Oracle R Enterprise,Python,Tableau",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Natural Language Processing,Text Analytics,Time Series Analysis",,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,10,10,10,60,10,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Most of the time,Sometimes,,,,,,,Often,,Often,,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Mercurial",,90000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,62,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,"Employed by a company that performs advanced analytics,Self-employed",Python,Deep learning,R,I collect my own data (e.g. web-scraping),"Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher,Statistician,Other",University courses,40,0,30,30,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs",High school,Mix of fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Most of the time,100PB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Mathematica,Python,R,Stan",,Rarely,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Often,,,,,,,,,"Bayesian Techniques,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Most of the time,,,,Often,,,,,,Sometimes,Often,,Most of the time,,Sometimes,,,,,,,,Often,Sometimes,,Rarely,Often,,,,50,30,10,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to 
data",,,,,Most of the time,Most of the time,,,Often,,,,,,,,,,,,Sometimes,,10-25% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,IBM Watson / Waton Analytics,Time Series Analysis,R,I collect my own data (e.g. web-scraping),"College/University,Conferences,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,90,0,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,<1MB,Random Forests,"C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Java,MATLAB/Octave,Microsoft SQL Server Data Mining,R,SQL",,,,Rarely,,,,,,,Often,Often,,,Rarely,,,,,,Rarely,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Logistic Regression,Random Forests,Segmentation",,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,,,,,,,,10,20,0,10,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,,,,,,,Often,,Less than 10% of projects,More internal than 
external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Neural Nets,Python,GitHub,College/University,,,Not Useful,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,Traditional Workstation,0 - 1 hour,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,95,0,0,5,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,Very useful,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,60,0,20,10,10,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,100GB,"GANs,HMMs,Regression/Logistic Regression,RNNs,SVMs","Java,MATLAB/Octave,NoSQL,Python,R,SQL,Other",,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,Rarely,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,Most of the time,,,"HMMs,kNN and Other Clustering,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,Often,Most of the time,,Often,,Often,,,,40,30,15,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",Most of the time,,,,Often,,,,Sometimes,,,,,,,,,Sometimes,,,,,26-50% of projects,More internal than external,IT Department,kaggle data,time processing,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Git,Subversion",Sometimes,180,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,NA,Employed full-time,,,Yes,,Data Analyst,,,,,,,"Arxiv,Blogs,Conferences,Friends network",Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,,,30,30,30,10,0,0,,,,Financial,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Neural Networks,SVMs","IBM Watson / Waton Analytics,Java,Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Rarely,,Sometimes,,Sometimes,,,Rarely,,,Sometimes,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,SVMs,Text Analytics",Often,Most of the time,,,,Often,Most of the time,,,,,,,,,Often,,Often,Most of the time,,,,,,,Often,,Often,Most of the time,,,,,70,20,10,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external 
sources,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,Most of the time,,,,,,,76-99% of projects,,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",,,,,7,,,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,25,25,20,25,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Often,Often,,,Often,,,Rarely,,,,,,,"Decision Trees,Lift Analysis,Logistic 
Regression",,,,,,,,Often,,,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,90,4,3,2,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Sometimes,Rarely,,,,,,Often,,,,,Most of the time,,,10-25% of projects,More internal than external,Other,G5;Environics,multiple database ; different level of aggregation ; not enough infos,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,,Sometimes,85000,CAD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,MATLAB/Octave,Random Forests,Matlab,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,Survival Analysis,Logistic Regression,A bachelor's degree,Academic,I don't know,Increased significantly,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Rarely,1GB,Markov Logic Networks,Amazon Machine Learning,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,30,30,0,0,Enough to run the code / standard library,"Dirty data,Privacy issues,Other",,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,Sometimes,10-25% of projects,,Standalone Team,,,Graph (e.g. 
GraphBase/Neo4j),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,30000,GBP,,6,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by non-profit or NGO,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,75,5,0,10,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Sometimes,Often,,,,,,"A/B 
Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,Often,Often,Sometimes,,Most of the time,,Most of the time,,Most of the time,Often,Often,Most of the time,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Often,Sometimes,,,,Most of the time,,Sometimes,,Often,Often,Most of the time,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,57000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Proprietary Algorithms,Python,Google Search,"Arxiv,College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,Linear Digressions Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Researcher",University courses,0,0,0,100,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Not important +Male,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",45,45,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Russia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Java,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Non-Kaggle online communities,Official documentation",Very useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",Kaggle competitions,50,0,10,10,30,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Rarely,,,,Often,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Text Analytics",Often,,,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,,Rarely,,,Sometimes,,Most of the time,,,Often,Often,,,Sometimes,Often,Often,,,,Often,,,,,45,15,30,10,0,0,Enough to refine and innovate on the 
algorithm,"Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,,,,Most of the time,,Often,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,"Undocumented unexpected changes over time, dead features","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,55000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Conferences,Stack Overflow Q&A,YouTube Videos",,,Very useful,Not Useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",Work,40,0,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM Cognos,Jupyter notebooks,Microsoft SQL Server 
Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Enterprise Miner,SQL,Tableau",,Rarely,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,,,Often,,,Often,,,Often,,Most of the time,,,,,,Rarely,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",Rarely,Rarely,Rarely,,,,Most of the time,Sometimes,,,,,,Sometimes,Most of the time,Most of the time,,,Often,,Sometimes,Most of the time,Often,,,,,Often,Most of the time,Sometimes,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,Sometimes,Most of the time,,,Sometimes,,Sometimes,Sometimes,,,Most of the time,Sometimes,,,Most of the time,Most of the time,Sometimes,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"92,000",,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Hong Kong,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Software Developer/Software Engineer,Other",Other,50,0,0,0,10,40,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,R,SQL",,Sometimes,,,,,,Often,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,"Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,Sometimes,Most of the time,,,,,,,,,,,85,5,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,Most of the time,,,,,,Sometimes,,10-25% of projects,Entirely internal,IT Department,none,Data 
wrangling to create features for the models,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)",I don't typically share data,,"Bitbucket,Subversion",Rarely,1500000,HKD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,Statistica (Quest/Dell-formerly Statsoft),Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Not Useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Not Useful,KDnuggets Blog,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",0,0,0,100,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Canada,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a 
company that performs advanced analytics,Spark / MLlib,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Not Useful,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,Business Analyst,University courses,20,20,15,45,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,CRM/Marketing,"1,000 to 4,999 employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression",,Rarely,,,,Sometimes,Most of the time,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,25,25,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction 
to go in with the available data",,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Network Drives,Other,Rarely,80000,CAD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Other,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,Work,80,5,15,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","Java,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,Often,Often,,,Most of the time,,Often,Often,,,,,Most of the time,,,,Often,,,Most of the time,,Often,,,,,Often,Often,,,,,25,25,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine 
learning",,Often,,,,Often,,,Often,,,Often,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,uci repository,computation power,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,Sometimes,15333,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Stan,,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A health science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Stan,TensorFlow",Often,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,Rarely,Often,Rarely,,,Rarely,,,,,,"A/B Testing,Association Rules,Neural Networks",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to tune the parameters properly,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,Approximately half internal and half external,Standalone 
Team,,,,Email,,,,90000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Perfectly,Self-employed,Amazon Machine Learning,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,Very useful,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,Very useful,,Not Useful,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",Kaggle competitions,30,0,0,0,70,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data,Other",,,,"Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Logistic Regression,Other",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,80,0,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Rarely,385000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,62,"Not employed, but looking for work",,,,,,,,Other,Other,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Newsletters,Official documentation,Online courses,Podcasts,Textbook,Trade book,YouTube Videos,Other",Very useful,Very useful,Very useful,,Somewhat useful,,,Very useful,,Very useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity,Other","Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Canada,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Social Network Analysis,Python,GitHub,"Blogs,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,Not Useful,Very useful,Very useful,,,,"Data Elixir Newsletter,Data Stories Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,30,0,40,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Gradient Boosting,Neural Networks - RNNs",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Rarely,,Neural Networks,"C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Most of the 
time,,,,,,,,,Often,,,Sometimes,,,Most of the time,,,,"Cross-Validation,Data Visualization,Neural Networks,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,20,0,30,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,Most of the time,Rarely,Often,,Most of the time,,51-75% of projects,More external than internal,IT Department,"1000 Genomes, dbSNP, Human Genome Diversity Project, UCSC Public Data, Sanger Catalogue of Somatic Mutations in Cancer (COSMIC), ","Lack of gold standards for sharing, manipulating and interpreting genetic data ","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,65000,CAD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,SQL,Monte Carlo Methods,Python,Government website,"Official documentation,Personal Projects,Tutoring/mentoring",,,,,,,,,,Very useful,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),More than 10 years,Other,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Manufacturing,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Minitab,Python,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,Sometimes,,Often,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Simulation",,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,20,10,10,40,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,,,Sometimes,,,Most of the time,,100% of projects,More internal than external,Other,,,Other,Share Drive/SharePoint,,Git,Sometimes,200000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Time Series Analysis,Python,Government website,"Company internal community,Stack Overflow Q&A,Textbook",,,,Not Useful,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,60,20,0,20,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or 
Workstation and private datacenters","Text data,Relational data",,,,"C/C++,Python,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,40,0,0,20,40,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Often,,,,,,,Often,,,,Often,,Often,,,Often,,,Often,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,96000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Italy,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,10,10,10,65,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Other,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,,,Not Useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,I never declared a major,1 to 2 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,40,10,20,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,TensorFlow",,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,,,Often,Most of the time,,,,,,,Sometimes,,,,Often,,Sometimes,Often,,,Often,,,,,,,,,,30,10,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,,Often,,Most of the 
time,,,,,Often,,,,,,,26-50% of projects,More external than internal,Central Insights Team,,Explore findings.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,12000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,,Kaggle competitions,33,0,34,0,33,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,,Other,"Amazon Web services,Cloudera,DataRobot,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,Sometimes,Rarely,,,Sometimes,,,,,,Often,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,Often,,,,Sometimes,,Often,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Time Series Analysis",,,Sometimes,,,,,Sometimes,,,,Sometimes,,Often,,Often,,Sometimes,,Sometimes,,Often,,,,,,,,Often,,,,0,25,25,50,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,Often,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,Often,,100% of projects,More internal than external,Other,gdelt,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Git,Rarely,74000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Ukraine,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",Self-taught,60,10,0,0,30,0,"Natural Language Processing,Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Israel,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Julia,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Stack Overflow Q&A",Not Useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,60,NA,30,5,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Most of the time,,,,,,Sometimes,Most of the time,,,,,,Sometimes,,Often,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,Often,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Most of the time,Often,,,,,Sometimes,,Sometimes,,,,,,,Often,,,,73,2,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data",Often,Often,,,,Most of the time,,Often,Often,,,,,Sometimes,,,Sometimes,,Often,Sometimes,,,76-99% of projects,More external than internal,Standalone Team,none,joining multiple data sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Amazon S3,Git,Rarely,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed part-time,,,No,Yes,Data Scientist,,Employed by college or university,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Non-Kaggle online communities,YouTube Videos",,Somewhat useful,Very useful,,,,,,Very useful,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,,Some college/university study without earning a bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,,Logistic Regression,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Male,France,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Kaggle,Trade book",Somewhat useful,,,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,20,10,30,30,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R,Spark / MLlib",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Often,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs",,,Sometimes,,Sometimes,,Often,Most of the time,,,,,,,,Often,,Sometimes,,Sometimes,,,Most of the time,Sometimes,,,,Sometimes,,,,,,30,30,10,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues",,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Don't know,36800,EUR,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Other,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Non-Kaggle online communities,Personal Projects,YouTube Videos",,,,,,,,,Somewhat useful,,,Very useful,,,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other",Other,20,10,20,20,NA,30,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Non-profit,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,Often,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,,Often,Rarely,,,,Sometimes,,Rarely,,,,Sometimes,,,Often,,,,35,15,25,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Sometimes,,,Often,Often,,,,,,,Most of the time,,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,FRED2;BLS;Census,sparsity--we often dont have enough observations to do much,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,117000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,52,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,R,Social Network Analysis,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Non-Kaggle online communities",,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Data Miner,Data Scientist,Researcher,Statistician",University courses,20,0,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Decision Trees,HMMs,Markov Logic Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Evolutionary Approaches,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,,Often,,Rarely,,,,,Rarely,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,,Sometimes,Often,,,,30,20,10,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,,,,,,,,Often,,,,,,,76-99% of projects,Entirely external,Standalone Team,,,,Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,"100,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Conferences,Newsletters,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,Very 
useful,,,Very useful,,"Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Management information systems,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,50,5,5,30,10,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,R,Google Search,"Blogs,Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,,,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",University courses,40,20,10,15,15,0,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - 
CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Decreased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Neural Networks,Random Forests,Time Series Analysis",,Often,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,10,40,15,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Sometimes,,Often,Sometimes,,,,,,,,Often,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,,Ambiguity and similarity in units,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Other,Rarely,380000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +A different identity,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,Yes,Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Employed by government",Python,Neural Nets,Python,GitHub,"Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",59,40,1,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,39,Employed 
full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics,Employed by government",TensorFlow,Neural Nets,Python,GitHub,"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Other",Self-taught,75,10,10,5,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,,,,,,Often,,Often,,,Rarely,Sometimes,,Often,Sometimes,,,,,Sometimes,Sometimes,Often,,,,60,15,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Often,,,,,Most of the time,,,,Most of the time,,Most of the time,,Often,,,Often,,,10-25% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,190000,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Philippines,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Web services,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Personal Projects,Stack Overflow Q&A",,Very useful,,,Very useful,,,,,,,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Computer Vision,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts",,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,"Data Elixir Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Operations Research Practitioner,Researcher,Other",University courses,25,25,15,30,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Random Forests","Amazon Machine Learning,Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Sometimes,,Rarely,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics",Sometimes,,,,Often,Often,,,,Often,,,,Often,,,,Sometimes,Often,,Often,,Often,Most of the time,,,Often,,Sometimes,,,,,35,10,30,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty 
data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Most of the time,,,Most of the time,Sometimes,,,,,Sometimes,,Often,,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,Other,,"Inconsistent labels, facts not in evidence in the data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Never,90000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,SAS Enterprise Miner,Anomaly Detection,SAS,University/Non-profit research group websites,"College/University,Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Trade book",,,Somewhat useful,Somewhat useful,Not Useful,,Very useful,,,,Very useful,,,Not Useful,Somewhat useful,Somewhat useful,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler",University courses,10,20,15,40,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Perl,Python,SAS Base,SAS Enterprise Miner,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Often,,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,Most of the time,Most of the time,,,Often,Most of the time,Most of the time,,Most of the time,,,Often,Often,,Often,Often,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",Most of the time,Often,,,,,,Often,,,,,,,Often,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,210000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,50,10,20,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Rarely,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,R,RapidMiner (free version),Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,Sometimes,Most of the time,,,,,,Often,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,Rarely,Sometimes,Most of the time,,,,,,,,,,,,,,Rarely,,Sometimes,Sometimes,,,,Sometimes,Often,Often,,,,20,10,20,20,30,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,Most of the time,Often,,,,Sometimes,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,100000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,20,5,0,70,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",I don't know/not sure,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,R,SQL",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Often,Rarely,Most of the time,,Sometimes,,,,,Most of the time,,Often,,,Often,Sometimes,,,Often,,,,30,20,5,20,25,0,"Enough 
to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,,,,,,Sometimes,Sometimes,,,Often,,76-99% of projects,Do not know,Business Department,,"Understanding the data, cleaning them and extract valuable insight","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"64,000",EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,"Researcher,Other",University courses,30,0,10,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,SVMs","Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,Sometimes,,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,Most of the time,Most of the time,,,,,40,15,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,,,,,Sometimes,,100% of projects,Do not know,Standalone Team,WordNet; FrameNet,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,63000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,15+ years,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,Kaggle competitions,50,0,0,0,50,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Other,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online 
courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Other",Self-taught,50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Other,"Text data,Relational data,Other",Rarely,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,Often,,,Most of the time,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Rarely,Often,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,,Often,Most of the 
time,,Most of the time,Sometimes,Sometimes,Often,Sometimes,Most of the time,Sometimes,Often,,,,75,10,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data,Other",Often,Often,,,,Sometimes,,,,,,,,,Most of the time,,Often,,,,Often,Often,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Subversion",Sometimes,"130,000",,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,30,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,Fewer than 10 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow",,Rarely,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,Often,,,,Most of the time,,Often,,,,,,,Often,,Rarely,,Often,Often,,,,,,,,,Often,,,,30,20,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Rarely,,Sometimes,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,76-99% of projects,Entirely 
internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,180000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,,,,"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,0,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,CRM/Marketing,100 to 499 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,Most of the 
time,,,Sometimes,,Most of the time,,,,,Most of the time,Sometimes,Often,,,,40,20,0,20,20,0,Enough to tune the parameters properly,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Other,,,,,,,,600000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,70,Retired,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Random Forests,SQL,,"Kaggle,Non-Kaggle online communities",,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Business Analyst,Work,0,0,100,0,0,0,,,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Anomaly Detection,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Belgium,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,Other,I don't plan on learning a new ML/DS method,Python,Government website,"Company internal community,Conferences",,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Data Miner,Self-taught,80,0,10,5,0,5,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,,,,,Not at all important,Other,Other,Relational data,Always,10TB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,NoSQL,Python,R,Tableau,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,Rarely,,,,,,,,,,,,Sometimes,,,,Most of the time,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other",,,Rarely,,Sometimes,Most of the time,Most of the time,Sometimes,Rarely,,,Rarely,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Rarely,Rarely,Sometimes,,Sometimes,Rarely,,Sometimes,Sometimes,Most of the time,,,90,5,1,2,2,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT",Sometimes,,,,Often,Sometimes,,,,,,,,,Often,,,,,,,,100% of projects,Entirely internal,Standalone Team,Governemantal data (but not very often),Slow access when data is stored into hadoop,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Other,Network Drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,200000,EUR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Czech Republic,29,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +A different identity,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,1 to 2 years,"Data Scientist,Researcher","Online courses (coursera, 
udemy, edx, etc.)",25,25,20,0,0,30,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Often,,,,,,,,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,,,,,,,,20,10,10,10,50,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,Sometimes,,,,,,Sometimes,,,,,Sometimes,,,Often,Often,,Often,76-99% of projects,More internal than external,Standalone Team,,possibility that there aren't compelling insights to be found,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,115000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,68,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Other,Other,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,Very useful,,,Very useful,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Other",Self-taught,30,50,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Other,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Recommender Systems,Segmentation,Simulation,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,Most of the time,Often,Often,,,,,,Rarely,,,,,,,,,Rarely,,Often,Often,,,Often,,,,10,25,25,20,20,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Always,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Africa,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Other,Python,Google Search,"Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Not Useful,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",5,85,5,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Relational data,Other",,100GB,"CNNs,Neural Networks,Random Forests,RNNs","Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Often,,,,,,"CNNs,Neural Networks,Random Forests,RNNs",,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,Often,,Often,,,,,,,,,75,10,10,5,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,,,Most of the time,,Often,,,76-99% of projects,Entirely internal,Central Insights Team,,It's sheer size - last time I checked we had 6.6bn lines in our one source alone.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git,Subversion",,900000,ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,44,Employed part-time,,,Yes,,Business Analyst,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,1 to 2 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,Bayesian Techniques,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,Rarely,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,50,20,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the 
state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,Often,Often,,,Most of the time,,,Often,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Never,90000,BRL,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other",,Very useful,,,,,Very useful,Somewhat useful,,,,,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,15,0,15,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Very Important +Male,Netherlands,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,R,Text Mining,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,,Necessary,,,Necessary,Necessary,Nice to have,,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Other,27,0,0,0,38,35,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Female,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",,,,"Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Rarely,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,Rarely,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,Most of the time,Often,,100% of projects,Approximately half internal and half external,Other,,No standards for data collection,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Git,Most of the time,110000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",1-2 years,,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,DataCamp,Traditional Workstation,2 - 10 hours,Master's degree,No,Bachelor's degree,A humanities discipline,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Singapore,18,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,Other,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,1 to 2 years,I haven't started working yet,Self-taught,80,0,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs",A doctoral degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,Canada,48,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,6 to 10 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",5,5,50,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Text data,Other",Sometimes,1TB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Rarely,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,Sometimes,,,Sometimes,Often,,,,,,,,,,,Often,Most of the time,Sometimes,,Often,,,,,,,Most of the time,Most of the time,,,,40,10,10,10,20,10,Enough to refine and innovate on the algorithm,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other","Share Drive/SharePoint,Other",FTP,Git,Most of the time,107000,CAD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,NoSQL,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,Somewhat useful,Very useful,Very useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,1 to 2 years,"Data Analyst,Other",Work,50,25,25,0,0,0,,,A master's degree,Other,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,,"IBM Cognos,IBM SPSS Statistics,Python,QlikView,R,SQL,Tableau",,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,Often,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Other",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,50,0,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go 
in with the available data,Unavailability of/difficult access to data,Other",,,,,Most of the time,,,Sometimes,,,,,Often,,,,,,Often,Most of the time,Often,Sometimes,51-75% of projects,More internal than external,Business Department,,Messy data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"70,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Google Search,Government website","Blogs,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,,University courses,10,5,5,80,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Other,Traditional Workstation,Text data,Never,<1MB,Random Forests,"Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation",,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,2,0,0,8,10,80,Enough to tune the parameters properly,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,70000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,68,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,Mathematica,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,10,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,,,,,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Rarely,100MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL",Rarely,Rarely,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,40,40,10,10,0,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,Do not know,Standalone Team,none,context,Graph (e.g. GraphBase/Neo4j),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",35,40,10,0,15,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Relational data",Never,1GB,"CNNs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Often,Rarely,Rarely,Often,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,,Rarely,,Rarely,,Often,,,Sometimes,,Rarely,Rarely,,,,,60,5,0,30,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,150000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,Very useful,,Somewhat useful,,"FastML Blog,FlowingData Blog,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,50,3,5,2,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important +Male,United States,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",Other,20,0,40,40,0,0,"Adversarial Learning,Computer Vision","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,Other,University courses,10,10,20,50,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Amazon Web services,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Random Forests,Simulation,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,Rarely,,,Often,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Always,,,,5,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts",,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,10,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",,1TB,"Decision Trees,Evolutionary Approaches,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,Perl,Python,Tableau,Unix shell / awk,Other",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,Often,Most of the time,,,,,,,,,,,,,,Often,,,Often,Most of the time,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,,Often,,,,,,,,,,Often,Often,,,,65,0,15,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,Sometimes,,,Most of the time,,,Often,,,Most of the time,,,26-50% of projects,Approximately half internal and half external,Other,,"messy, non tabular","Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,110000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,500 to 999 employees,Decreased slightly,Don't know,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,Regression/Logistic Regression,"Java,MATLAB/Octave,R",,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",Most of the time,,,,,,,,,,,,,,,Often,,,,Often,Sometimes,,,,,,,,,,,,,10,70,0,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate 
findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Sometimes,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,,,Most of the time,,,None,More internal than external,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Bitbucket,Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Not Useful,"DataTau News Aggregator,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your 
business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,TIBCO Spotfire,Unix shell / awk",Rarely,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,Sometimes,,Often,Often,,,,,Rarely,Rarely,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation",,,Sometimes,,,,Often,Often,Sometimes,,,Most of the time,,Most of the time,,Rarely,,,Often,Rarely,,,Often,,,Most of the time,,,,,,,,40,15,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,,,,,Sometimes,,,Most of the time,,,,,,,,Rarely,,Often,,Often,,100% of projects,More internal than external,Standalone Team,Cable Provider Set-top Box Data; Experian demographics; Census projections,Size and clear documentation/dictionaries,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,88000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,GitHub,"Kaggle,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,,,,Necessary,Necessary,,Necessary,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,30,30,10,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,Very Important,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Always,1GB,"Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,Orange,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Segmentation",,,,,,Often,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,,,,,,45,30,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Other",,,,,Often,,,,,,,Often,,Often,,,,,,,,Most of the time,26-50% of projects,Entirely internal,Standalone Team,n/a,verify data integrity,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Other,Never,310000000,VND,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Jupyter notebooks,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Very useful,,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer",Self-taught,40,0,30,10,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,Often,,Sometimes,,,,,,,,,,,60,20,9,10,1,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Most of the time,,,,Most of the 
time,,,,Rarely,,,,,,,Rarely,,,76-99% of projects,Do not know,Business Department,cansim,cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Other,Never,83001,CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Tableau,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,10,10,0,80,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Male,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,34,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Text Mining,SQL,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Programmer",Self-taught,40,40,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Telecommunications,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,Python,SQL",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Segmentation",Most of the time,,,,,,Most of the time,Often,,,,,,,,Often,,,,Sometimes,,,,,,Sometimes,,,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,,Often,,,Often,,,,,,Often,,Often,,,Often,Often,,76-99% of projects,More external than internal,Business Department,no,big volumes,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,480000,UAH,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,Somewhat useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,0,10,20,70,0,0,Computer Vision,"Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Support Vector Machines (SVM),,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,Less than a year,Engineer,University courses,10,30,10,30,20,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,,"Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,10,90,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,,Often,,,,,,Often,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,75000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by non-profit or NGO,Python,Deep learning,Python,"Government website,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Friends network,Textbook,Trade book",Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,0,50,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Government,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Perl,Python,SQL,Unix shell / awk,Other",,Sometimes,,,Sometimes,,,,Sometimes,,,,,Rarely,Most of the time,,Often,,,,,,,,,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the 
time,Sometimes,,,"CNNs,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Sometimes,,Most of the time,,,Most of the time,,,,,Sometimes,,Often,,Often,Most of the time,Sometimes,Sometimes,,Often,,Sometimes,,,Often,Most of the time,,,,,55,25,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,Often,,,Sometimes,,,,,,Often,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,wikipedia; pretrained models;,getting access to enough data to run experiments,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Most of the time,,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,35,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,"Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Taiwan,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Other,Other,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Friends network,Stack Overflow Q&A,Other,Other",Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,Computer Scientist,Self-taught,65,0,35,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),Primary/elementary school,Academic,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,Don't know,<1MB,"Regression/Logistic Regression,Other","C/C++,Microsoft Excel Data Mining",,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,PCA and Dimensionality Reduction,Simulation",,,,,,,,,,,,,,Often,,,,,,,Often,,,,,,Sometimes,,,,,,,0,100,0,0,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Most of the time,Often,,,,,,Most of the time,,,,,Often,,10-25% of projects,More external than internal,IT Department,UCI Machine Learning Repository,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,"Business Analyst,Operations Research Practitioner",Self-taught,20,30,10,30,10,0,Computer Vision,Logistic Regression,High school,Internet-based,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Regression/Logistic Regression","NoSQL,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,GANs",Sometimes,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,20,20,30,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,22000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Netherlands,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,I never declared a major,,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,New Zealand,33,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,Talking Machines Podcast",1-2 years,Unnecessary,Unnecessary,Nice to have,,Necessary,Nice to have,Nice to have,Unnecessary,,Nice to have,,,,"Coursera,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,No,Master's degree,Other,1 to 2 years,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Evolutionary Approaches,Neural Networks - GANs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Female,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Anomaly Detection,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher",Other,25,25,0,0,0,50,Other (please specify; separate by semi-colon),Logistic Regression,High school,Government,I don't know,,Don't know,Some other 
way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,,,Regression/Logistic Regression,"Python,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,Rarely,,,Often,,,Rarely,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,0,0,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Most of the time,,,,,,,,,,,Often,,,26-50% of projects,Entirely internal,Other,None,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,62000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,Speech 
Recognition,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,23,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",Work,10,50,25,15,0,0,Survival Analysis,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Technology,100 to 499 employees,Stayed the same,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Random Forests","MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Decision Trees,Naive Bayes",,,Often,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,20,10,0,20,50,0,Enough to explain the algorithm to someone non-technical,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,51-75% of projects,More 
internal than external,Central Insights Team,Country's GDP; Board of Ed; Collegeboard,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Mercurial",Rarely,,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Switzerland,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher",University courses,20,10,20,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,Often,,,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,,,Often,,Often,,,,,,,,,,,10,20,40,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of 
management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,"80,000",CHF,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,32,"Not employed, but looking for work",,,,,,,,Mathematica,Neural Nets,Java,"GitHub,Google Search","College/University,Friends network,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,,,,,,,,Very useful,,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Biology,3 to 5 years,I haven't started working yet,University courses,0,0,0,100,0,0,Reinforcement learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Italy,40,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Regression,R,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a 
company related to ML,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Kaggle competitions,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Neural Nets,Python,Google Search,"Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased significantly,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,100MB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,,,,,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,15,20,15,50,0,0,"Enough to code it again from scratch, albeit 
it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,100% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,95000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Finland,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"Data Elixir Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,Work,30,15,30,5,20,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,Academic,,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian 
Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,Often,,,Often,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,45,30,0,20,5,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"28,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Chile,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Other,Python,Google Search,"Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer",University courses,0,0,50,0,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,6-10 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Other",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",,,,,,Most of the time,,Rarely,Sometimes,,,Most of the time,,Rarely,,Rarely,,,Sometimes,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,30,20,10,5,10,25,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Rarely,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,global economic indicators,inconsistent use of core aspects of the data model that the organization believes is trustworthy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,245000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by government",I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,50,0,30,0,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - CNNs",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Never,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,SVMs","Java,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Often,,Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Often,Often,,,,,Sometimes,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Privacy issues,Unavailability of/difficult access to data",Often,Sometimes,,Often,,,,,,,,,,,,,Often,,,,Most of the 
time,,51-75% of projects,Entirely internal,IT Department,Imagenet pre trained weights;Glove pre trained ,Availability of hardware to run,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,150000,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,United States,60,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,DataRobot,Support Vector Machines (SVM),Java,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,Very useful,"Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Work,30,30,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow,TIBCO Spotfire,Other",Often,,,Rarely,,,,,Sometimes,,,,,,Sometimes,,Often,,Sometimes,Sometimes,Sometimes,,,Rarely,,,Often,,,,Sometimes,,Most of the time,,Sometimes,,,,,,Sometimes,,,,Rarely,Sometimes,Sometimes,,Often,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Naive Bayes,Random Forests,Text Analytics",Often,,Most of the time,,,Often,,Often,,,,,,,,,,Most of the time,,,,,Often,,,,,,Often,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it 
may run slowly","Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Often,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Business Department,,Feature Engineering,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Subversion",,150000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Software Developer/Software Engineer,Self-taught,50,0,0,0,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,100MB,"Bayesian Techniques,CNNs,Ensemble 
Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Other,Other",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,Often,Most of the time,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,Most of the time,,,,,Sometimes,,,,,,,Often,Most of the time,Sometimes,,Sometimes,,,,,,Often,Sometimes,,,,30,10,15,15,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,Sometimes,Most of the time,,,,,Sometimes,,,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,60000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,GitHub,"Arxiv,Blogs,Personal Projects,Podcasts,Textbook",Somewhat useful,Very useful,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,Self-taught,80,10,10,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Insurance,20 to 99 employees,Increased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Bayesian Techniques,Random Forests","Jupyter notebooks,Python,R,SAS Base,SQL,Stan",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Prescriptive Modeling,Random Forests",,,Sometimes,,,Often,,,,,,,,,,Often,,,,,,Most of the time,Sometimes,,,,,,,,,,,10,50,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,,Often,,,Sometimes,,Most of the time,,,,,,,,,,,,Less than 10% of projects,Entirely external,Business Department,Health Insurance Claim Data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Bitbucket,Sometimes,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Jupyter notebooks,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Not Useful,,,Not Useful,Not Useful,,Very useful,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,10,20,0,60,10,0,"Computer Vision,Time Series,Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Manufacturing,,,,,Somewhat important,Other,Basic laptop (Macbook),Text data,Rarely,,Other,"Microsoft Excel Data Mining,Perl,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,Rarely,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,80,0,10,0,10,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Stan,Neural Nets,R,University/Non-profit research group websites,"Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Data Scientist,Researcher",Work,37.5,0,37.5,25,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft Azure Machine Learning,R",,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,Rarely,,,,Most of the time,Sometimes,,,,,,Often,,,,,Often,Most of the time,Most of the time,,,,25,10,5,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,Often,,,,,,,,,Sometimes,,,100% of projects,More internal than external,Standalone Team,NYState hospital,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,87500,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Other,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,10,30,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Nigeria,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,R,GitHub,"Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Gaming Laptop 
(Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,Less than a year,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",50,30,10,0,0,10,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",University courses,30,0,25,45,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,"5,000 to 9,999 employees",Stayed the same,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT 
supported servers",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests",Sometimes,,,,,,Often,Sometimes,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,80,10,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,Often,,Sometimes,Sometimes,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Git,Mercurial",Sometimes,92600,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Colombia,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,GitHub,"Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,Not Useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Engineer,Programmer,Researcher",University courses,35,40,12,10,2,1,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / 
awk",,,,,,,,,Rarely,,Often,Often,,,,,Sometimes,,,,Rarely,Sometimes,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Most of the time,,,,Rarely,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Other",,Sometimes,Sometimes,,Sometimes,,Often,,Often,,,,,Most of the time,,Often,,,,Sometimes,Sometimes,,Most of the time,Most of the time,,,,Most of the time,,,Sometimes,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",,,,Sometimes,Often,,,,Rarely,,Often,,,,Often,,Most of the time,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,"cleansing, cleaning and understanding","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,25000000,COP,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,61,"Not employed, but looking for work",,,,,,,,R,Regression,R,Government website,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,Researcher,University courses,50,20,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Academic,500 to 999 employees,Increased slightly,Don't know,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,IBM SPSS Statistics,KNIME (free version),Python,R,RapidMiner (free version),SAS Enterprise Miner,Spark / MLlib,Tableau",,Sometimes,,,Rarely,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,Often,,Most of the time,,Rarely,,,,Most of the time,,Rarely,,,,Most of the time,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Most of the time,,,Rarely,,Rarely,Most of the time,Most of the time,,,,,,,Often,Most of the time,,,Often,Often,Most of the time,,Often,,,Most of the time,,,,Often,,,,20,40,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Sometimes,,,,Sometimes,,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Sometimes,"250,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,54,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,24,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Anomaly Detection,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,,Mathematics or statistics,1 to 2 years,,University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,28,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,University/Non-profit research group websites,"Blogs,College/University,Friends network,Kaggle,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,,"Basic laptop (Macbook),Workstation + Cloud service,Other",,Master's degree,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher,Statistician",University courses,40,10,0,40,0,10,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines 
(SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher",Self-taught,15,30,20,15,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,10 to 19 employees,Decreased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Often,Most of the time,,,,,,,Most of the time,,,Most of the time,Often,Often,,Most of the time,Sometimes,Sometimes,,,Often,Most of the time,Often,,,,25,40,20,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",Often,,,Often,Often,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,"twitter,facebook,youtube,instagram",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Git,Sometimes,95000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,France,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,KDnuggets Blog,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",40,10,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,United States,47,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Support Vector Machines (SVM),R,,"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private 
datacenters,Traditional Workstation",Relational data,Sometimes,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,Often,,,Often,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,70,5,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,,,,,,Often,,,Often,Often,Often,,,Often,,,Often,Often,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Italy,48,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,,,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,,,"Perl,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,50,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Never,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Spain,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Bayesian Methods,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Data Scientist,Other",Self-taught,20,20,50,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,NoSQL,Python,SQL,Unix shell / awk,Other",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,,,,Sometimes,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,,,,Often,Often,,,,,,Sometimes,,Often,,,,,Rarely,,Sometimes,,,Often,,Rarely,Sometimes,Often,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,Often,,Sometimes,,,,,,,,,,,,,,Sometimes,,,Often,100% of projects,More internal than external,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,"Git,Other",Always,"110,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Other,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",Often,Rarely,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,,,,Often,,,,40,10,10,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go 
in with the available data,Unavailability of/difficult access to data,Other",,Sometimes,,,Sometimes,,,,,,,,,,,,,,Most of the time,Often,Sometimes,Most of the time,76-99% of projects,More internal than external,Central Insights Team,Healthcare procedural data,Slow queries from Oracle database,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data,Share Drive/SharePoint",,Other,Sometimes,117000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,35,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,Taiwan,22,"Not 
employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,No,Master's degree,Management information systems,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher",University courses,50,0,5,30,10,5,"Computer Vision,Reinforcement learning,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by non-profit or NGO",TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,Somewhat useful,,,,Somewhat useful,Not Useful,Not Useful,,,,Very useful,"Data Elixir Newsletter,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",80,10,5,0,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,500 to 999 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Image data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Stan,Tableau,TensorFlow",Rarely,,,,,,,,,,,,,,,,Often,,,,,Rarely,Sometimes,,,,,,,,Often,,Often,,,,,Rarely,,,,Often,Rarely,,Rarely,Often,,,,,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Simulation,Time Series Analysis",,,,Sometimes,,,Most of the time,Most of the time,,,,,,Most of the time,,Often,,,Often,Often,Often,Sometimes,Often,,Rarely,,Rarely,,,Sometimes,,,,80,10,1,5,4,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,Sometimes,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,,old managers,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,143000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,62,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,I prefer not to answer,Stayed the same,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data",Rarely,10TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic 
Regression,SVMs","Amazon Web services,Minitab,Perl,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,Rarely,,,,20,15,5,20,40,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,image data sets ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,85000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,I haven't started working yet,University courses,10,20,10,50,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,Most of the time,Often,,,,,,,Often,,Most of the time,,,,,Often,,,,,,,,Sometimes,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,120000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos",Very useful,,Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Business Analyst,University courses,15,0,0,80,5,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Academic,"10,000 or more employees",Increased significantly,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Rarely,1MB,Bayesian Techniques,"Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Most of the time,,,,Often,,,,,,,,,,,,,,Rarely,,,,,,Often,,,Sometimes,,,,5,80,0,15,0,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,None,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,15000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Tableau,Regression,R,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Personal Projects,YouTube Videos",,,,,,,,,,,,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,25,25,50,0,0,0,Time Series,,High school,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,Decision Trees,"IBM Cognos,Microsoft Excel Data Mining,R,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Often,,Rarely,,,,,"Data Visualization,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,25,10,10,25,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,Often,,Often,,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,200000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,TensorFlow,Social Network Analysis,Python,Google Search,"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,A humanities discipline,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",0,80,10,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Not important +Female,Chile,26,Employed full-time,,,Yes,,Computer Scientist,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher",University courses,25,15,20,35,5,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,Academic,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Most of the time,100MB,"Decision Trees,Evolutionary Approaches,Neural Networks","Python,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches",,,,,,Sometimes,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,10,20,40,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,None,Make the client understand the importance oficina clean data,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,"Bitbucket,Git",Rarely,"15,000,000",CLP,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,85,0,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,"Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,50,10,0,15,25,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,60000,CAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Kaggle competitions,50,30,0,0,20,0,Time Series,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Other","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Amazon Web services,Cluster Analysis,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,20,20,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,10GB,"CNNs,Ensemble Methods,HMMs,Neural Networks,Regression/Logistic Regression,Other","Microsoft SQL Server Data Mining,Python,RapidMiner (free version),Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Sometimes,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,Sometimes,Often,,,,Sometimes,Most of the time,,,,Sometimes,Most of the time,,Most of the time,,Often,Rarely,,Often,,Often,Most of the time,,,,,40,30,0,20,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team",,Sometimes,,,Most of the 
time,,,,,,,,,,,Often,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Microsoft Azure Machine Learning,Other,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Kaggle,Textbook,Trade book",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,DBA/Database Engineer,University courses,10,10,40,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,Most of the time,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,,Often,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,,,,50,10,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,Most of the time,,,Often,,,,,,,Often,,,,Most of the time,Often,,51-75% of projects,More external than internal,Standalone Team,"Census and other government data, such as NOAA",dirty data and a silo mentality ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,120000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,15,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,100MB,"Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Other",,Most of the time,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Rarely,,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,SVMs",,,,,,Sometimes,Rarely,Rarely,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,40,25,15,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Often,Sometimes,,,,,,,Often,,,,,,,Often,Rarely,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,117000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,60,0,20,0,20,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Always,100GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests","C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests",,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,30,40,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely internal,Other,VirusTotal,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Internal flask app,Git,Most of the time,165000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",70,10,5,15,0,0,Natural Language Processing,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Other,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,Often,,,,,50,10,10,20,10,0,Enough to tune the parameters properly,"Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,100% of projects,Do not know,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,86000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Newsletters,Online courses,Personal Projects",,,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,,,FlowingData Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,Time Series,Logistic Regression,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Python,Support Vector Machines (SVM),Python,Google Search,Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",University courses,5,20,15,60,0,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,Insurance,500 to 999 employees,Increased 
slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,,1GB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,50,0,0,30,20,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Often,26-50% of projects,More internal than external,Central Insights Team,"weather data, earthquake data",,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"70,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Textbook",Very useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,10MB,"CNNs,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,Most of the 
time,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,,Often,,,,,,"RNNs,Segmentation",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,10,50,10,10,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,Often,,,,Often,,,,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,No,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,61000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Julia,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Other",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,,Very useful,,,,,,,"Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,40,30,20,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,100 to 499 employees,Increased slightly,Less than one year,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Relational data,Other",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,Often,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Often,Often,Sometimes,Most of the time,Most of the time,Rarely,Sometimes,,Rarely,Sometimes,Rarely,Sometimes,,Sometimes,,,Often,Often,Sometimes,,Sometimes,Rarely,Often,Rarely,,,Often,Most of the time,,,,70,10,5,5,10,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult 
access to data",,,,,,Often,,,Often,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,"TIMIT, OpenSLR, WSJ, Some I can remember.",It's unlabeled,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,AWS S3,Git,Always,185000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Tableau,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Very useful,,Very useful,Very useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Kaggle competitions,40,0,0,0,30,30,Survival Analysis,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,"Engineer,Researcher",University courses,20,50,0,20,0,10,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,,Very useful,,,,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Business Analyst,University courses,20,5,15,60,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,,"Amazon Web services,Impala,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other,Other",,Rarely,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,Rarely,Most of the time,,,,,,Often,Most of the time,Sometimes,,"A/B Testing,Association Rules,CNNs,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,SVMs",Often,Often,,Sometimes,,,,Often,Most of the time,,,,,,Often,Rarely,Rarely,,,Often,Most of the time,,,,,,,Sometimes,,,,,,60,15,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,,,Often,,,Sometimes,,,,,,,,,,Often,Often,,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,63000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,25,15,0,60,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data",,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Evolutionary Approaches,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,,Most of the time,Often,,Often,Often,,Sometimes,,,,Often,Often,,Often,,Often,,Most of the time,Most of the time,,,,Often,Often,Often,Sometimes,,Most of the time,,,,25,65,0,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Mexico,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,"Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,30,20,25,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,Fewer than 10 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM Watson / Waton Analytics,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Often,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Rarely,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,Sometimes,,Often,,,,Sometimes,Often,,Often,,,,,Sometimes,Most of the 
time,,,,,50,10,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,Often,,Often,51-75% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,264000,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Survival Analysis,Scala,"Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Very useful,,,,,,,Very useful,Not Useful,Very useful,Very useful,,,,"DataTau News Aggregator,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",5-10 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,,"Data Analyst,Data Scientist,Predictive Modeler",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important +Male,Other,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Text Mining,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business Analyst,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,CRM/Marketing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Never,10MB,,"IBM SPSS Statistics,Microsoft Excel Data Mining,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Often,Often,,,,30,0,0,50,20,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Often,,,Most of the time,,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,15,15,10,35,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,SAS Enterprise Miner,SAS JMP",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,Sometimes,Sometimes,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs,Time Series Analysis",Often,,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,Sometimes,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,,,,,,,,,,,,,,,,,,,,More external than internal,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Cluster Analysis,Matlab,I collect my own data (e.g. 
web-scraping),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Not Useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,80,10,0,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,100MB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Spark / MLlib,SQL",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Logistic Regression,Recommender Systems,Segmentation",,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,80,8,2,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,Sometimes,Often,,Often,,Often,Often,,Most of the time,Sometimes,,,Often,,,None,More external than internal,IT Department,,It's incredibly dirty.,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,123000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Sweden,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Bayesian Methods,R,"Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Somewhat useful,,,,,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Other,2 - 10 hours,Other,No,Bachelor's degree,Mathematics or statistics,,"Other,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Spain,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Link Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Friends network,Kaggle,Official documentation,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,50,25,0,15,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,Most of the time,Most of the time,Often,Sometimes,Rarely,,Rarely,,Often,,Most of the time,,Sometimes,,Rarely,Often,,Often,,,Sometimes,Sometimes,,,Rarely,,,,50,10,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,Sometimes,Most of the time,,,,,,Often,,Sometimes,Sometimes,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,30000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Colombia,37,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Electrical Engineering,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Russia,24,"Not employed, but looking for work",,,,,,,,Java,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,65,0,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,,Very Important,Somewhat important,Somewhat important +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Work,20,0,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,,,,,,Rarely,,Sometimes,,,,,,Often,,,,,,Often,,,,Rarely,,,,Sometimes,,Most of the time,,,,,Sometimes,Rarely,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,Sometimes,Often,,Often,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,Sometimes,Sometimes,,,,30,20,15,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science 
talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,,,Often,,,,Most of the time,,,Often,,100% of projects,More external than internal,Other,Census;Weather;Consumer Buying Power;Competitive Marketing Spend;Location of stores;Gas prices,"It has missing data, incorrect data, and mis-coded data. Some of the data also has high multicollinearity, making it difficult to tease things apart in the models.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"120,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Colombia,36,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Data Analyst,Predictive Modeler,Statistician,Other",Self-taught,45,0,45,9,1,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Anomaly Detection,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Trade book",,Somewhat useful,,,Somewhat useful,,,Very useful,,Very useful,Very useful,,,Very useful,,Somewhat useful,,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Researcher",Work,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,20 to 99 employees,Increased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics",Most of the time,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Most of the time,,,Often,,Most of the time,Most of the time,Sometimes,,,,Most of the time,,Often,,,,,50,10,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,Often,,,,,Sometimes,,,76-99% of projects,Entirely internal,Standalone Team,NA,Not having control or say over how the data is originally collected or stored.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"80,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Other,"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Predictive Modeler,Researcher",Self-taught,80,19,0,0,1,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Bayesian Techniques,Ensemble Methods,Random Forests,SVMs","R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,"Ensemble Methods,Naive Bayes,Random Forests,SVMs",,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Often,,,,,,,,Sometimes,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Other,chEMBL,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,26,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Java,"Google Search,University/Non-profit research group websites","Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,0,0,0,90,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important +Male,Other,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,Less than a year,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,CRM/Marketing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,Sometimes,Often,Often,,,,,,,,Often,,,,,,,Often,,,Most of the time,,,Rarely,,,,,40,10,10,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Often,,,,,,,Often,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,,Sometimes,2400000,KZT,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,40,10,0,50,0,0,,Decision Trees - Random Forests,A bachelor's degree,Academic,I don't know,,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Image data,Never,,"Ensemble Methods,Random Forests","Amazon Web services,C/C++,Java,SQL",,Sometimes,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Random Forests",,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,30,30,35,0,5,0,Enough to tune the parameters properly,"Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,,Most of the time,,None,Do not know,,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Email,,,Sometimes,"40,000",USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,Other,Self-taught,50,10,10,0,30,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Internet-based,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text 
Analytics",Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,,,,45,30,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,,,Sometimes,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,50000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Canada,19,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,3 to 5 years,I haven't started working yet,Self-taught,50,50,0,0,0,0,Time Series,Logistic Regression,A master's degree,Academic,10 to 19 employees,Decreased significantly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,1MB,Regression/Logistic Regression,"C/C++,IBM SPSS Statistics,Java,MATLAB/Octave,Python,R,SAS Base,SQL",,,,Rarely,,,,,,,,Most of the time,,,Often,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,,Sometimes,,,,Rarely,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,25,50,0,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Often,,Often,Most of the time,,,Sometimes,,,,Most of the time,,,Often,,,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,0,CAD,I am not currently employed,3,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Researcher",Self-taught,30,10,20,30,10,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Female,Taiwan,18,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Textbook,YouTube 
Videos",Somewhat useful,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,Very useful,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,"Researcher,Software Developer/Software Engineer",Self-taught,60,0,40,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,100GB,"Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,Python",,,,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Simulation,Time Series Analysis",,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,Most of the time,,,,,,Often,,Sometimes,Most of the time,Most of the time,,100% of projects,More internal than external,IT Department,,Security concerns surround PII data and legal considerations surround certain data sets.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,"74,000",,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by government,Python,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),"Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Management information systems,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,"Bayesian Techniques,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,Other,33,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,C/C++/C#,Google Search,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,,1-2 years,,,Necessary,,Necessary,Necessary,Necessary,,,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,5,0,5,90,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,No,Yes,Other,Fine,Employed by company that makes advanced analytic software,Microsoft R Server (Formerly Revolution Analytics),Deep learning,SQL,Government website,"Company internal community,Online courses",,,,Not Useful,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important +Female,United States,20,"Not employed, but looking for work",,,,,,,,R,Deep learning,Java,Government website,"College/University,Non-Kaggle online communities",,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,Self-taught,100,0,0,0,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic 
Regression,Markov Logic Networks",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Female,Russia,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,,,,,,,,,,,,,,edX,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Data Analyst,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Computer Vision,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Colombia,41,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Genetic & Evolutionary Algorithms,Matlab,GitHub,"College/University,Company internal community,Non-Kaggle online communities,Personal Projects,Trade book,Tutoring/mentoring",,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Other (Separate different answers with semicolon),< 1 
year,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Online Courses and Certifications,No,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,Brazil,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,DataRobot,Cluster Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Personal Projects,YouTube Videos",,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,80,10,5,0,0,5,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Markov Logic Networks,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Markov Logic Networks,SVMs","Amazon Machine Learning,Java,Jupyter notebooks,Python,SQL,Other",Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Often,,,"A/B Testing,Collaborative Filtering,Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,Often,Often,Most of the time,,,,80,5,5,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database",Most of the time,Often,,,,Often,,,Most of the time,,,,,,,,,Most of the time,,,,,None,Entirely internal,IT Department,,"Size, computer power. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,"aws s3, ftp","Bitbucket,Git",Rarely,"100,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Amazon Web services,Neural Nets,Python,University/Non-profit research group websites,"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Management information systems,I don't write code to analyze data,"Computer Scientist,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,5,0,5,90,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,United States,34,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,20,0,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Germany,57,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Online courses",Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Researcher,Other",Self-taught,10,5,80,5,0,0,Natural Language Processing,"Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",High school,Government,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Most of the time,100GB,"Neural Networks,Other","C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,Rarely,,,Often,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Natural Language 
Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,10,30,10,0,30,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,Other",Most of the time,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,Most of the time,Less than 10% of projects,More internal than external,IT Department,parallel corpora such as http://opus.lingfil.uu.se/; ,"We are expected to cover 24 EU languages and some more, but parallel text corpora (beyond our own datasets) are typically not available or not big enough for many of these languages","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,shared disks in a LAN,"Git,Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Cloudera,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Often,,,Most of the time,Often,Often,,Sometimes,Often,,Most of the time,,Most of the time,,Sometimes,Most of the time,Most of the time,,,Most of the time,,Often,Most of the time,,Often,Most of the time,Sometimes,,,,55,25,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need 
to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,Often,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,text mining,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,360000,CNY,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,,Self-taught,60,10,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Military/Security,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Rarely,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,Sometimes,,Rarely,Sometimes,,Sometimes,,,,Often,,,Most of the time,,,,25,20,5,30,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,55000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Most of the time,10GB,"Ensemble Methods,Regression/Logistic Regression,Other","C/C++,Java,MATLAB/Octave,Microsoft Excel Data Mining,Python",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Recommender Systems,Simulation,SVMs,Time Series Analysis",,,Most of the time,,,Often,Most of the time,Often,Often,,,Often,,Most of the time,,Most of the time,,,,,,,,Often,,,Most of the time,Sometimes,,Most of the time,,,,20,40,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of 
the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,Most of the time,,,Most of the time,Sometimes,Often,Most of the time,,Most of the time,,,Sometimes,,,Often,Often,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,United States,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Self-employed",Jupyter notebooks,Genetic & Evolutionary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Operations Research Practitioner,Predictive Modeler,Researcher",Work,20,0,80,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Always,1TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Perl,Python,R,SAS Base,SQL,Tableau,Unix shell / awk",,Sometimes,,Often,,,,,Sometimes,Often,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,Most of the time,,Most of the time,Often,,,,Often,Often,,Often,,,,,Often,,,,Often,,,Sometimes,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,Often,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,Often,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,40,30,5,5,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Inability to integrate findings into organization's decision-making process,Privacy 
issues",,,,,,,,Sometimes,,,,,,,,,Often,,,,,,10-25% of projects,More internal than external,Business Department,Assessment data,Bias,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,"140,000",USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,40,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,6-10 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10GB,"CNNs,SVMs","Amazon Web services,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Often,,,,,,,,,,,,Often,,,Often,Often,Rarely,,,,,,,Sometimes,,,,,,5,5,5,10,0,75,Enough to explain 
the algorithm to someone non-technical,"Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Often,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,ImageNet,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,"122,000",CAD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,49,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by government,Amazon Web services,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,No Free Hunch Blog,5-10 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Other,No,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Iran,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,Textbook,,,,,,,,,,,,,,,Very useful,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Engineer,Self-taught,50,50,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Other,Link Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,5,0,25,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Often,,,,,,,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,Often,,,,Often,,Often,,,,,,,,Often,Often,,,Often,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,,Sometimes,,,Often,,Often,,,,,,,,,Often,,,,,,Often,Often,,,,60,15,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Often,,Most of the time,Often,,,Often,,,,,Often,,,Most of the time,,,Most of the time,,,10-25% of projects,More external than internal,IT Department,,"Data Privacy Limitations, Data cleanliness","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Git,Subversion",Always,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Researcher,Statistician,Other",University courses,10,10,20,60,0,0,"Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,R,SAS Base,SQL",,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",Sometimes,,,,,Most of the time,Most of the time,Often,Often,,,Sometimes,,,Often,Most of the time,,,,,Sometimes,,Sometimes,,,,Often,,Most of the time,,,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,Sometimes,,,,,,Often,,Often,Often,,,51-75% of projects,Approximately half internal and half external,Business Department,"data purchased from 
merchants, social media, etc.",Combining data from different databases that represent the same information but have been coded differently,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,94600,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,20,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,"Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,1TB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,Often,,,Often,,,,Often,,,,15,45,15,20,5,0,Enough to explain the algorithm to someone 
non-technical,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,Often,,,,Often,,51-75% of projects,Approximately half internal and half external,IT Department,"kaggle, udacity, public EPA or gov data",Privacy issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,"70,000",USD,Other,9,,,,,,,,,,,,,,,,,, +Male,Other,46,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Time Series Analysis,Matlab,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Podcasts,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,,,Very useful,,Very useful,,,Very useful,"Data Machina Newsletter,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,20,40,20,5,15,"Time Series,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Most of the time,100MB,"GANs,Neural Networks,RNNs,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Minitab,SAS JMP",,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,"CNNs,GANs,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Often,,,,,,,Often,,,,,,,,,Most of the time,Often,,,,Often,,,Often,,,,,,10,65,5,5,15,0,Enough to refine and innovate on the algorithm,"Data Science results not used by 
business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,Sometimes,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Subversion,Sometimes,15000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",SAS Enterprise Miner,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,"Siraj Raval YouTube Channel,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Engineer,Work,25,0,50,25,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1TB,"Decision Trees,Neural Networks","Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,Sometimes,,,,,,,,,,"Decision Trees,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,,,,,Often,,,,,,,,,,,,Often,Often,,,Often,,,,,,,,,,35,10,10,35,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,Often,,,,,Often,Rarely,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",Never,98000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Julia,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites,Other","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,,University courses,30,5,20,45,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Other","Amazon Web services,Microsoft R Server (Formerly Revolution Analytics),Perl,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,Most of the time,,,,,,,,,Rarely,,,Sometimes,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,,,,Most of the time,,,Sometimes,,Most of the time,,Rarely,,,,Most of the time,,,Most of the time,,,,1,6,0,3,10,80,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,Privacy issues",Sometimes,,Often,,Often,,,,,,,,,,Often,,Sometimes,,,,,,100% of projects,More external than internal,Standalone Team,"IBM/Truven Marketscan, Optum Humedica, TCGA",Data processing,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,210000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,Other,Python,"GitHub,University/Non-profit research group websites","Arxiv,Conferences,Personal Projects",Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,80,0,20,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Military/Security,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Video data",Most of the time,1GB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Python,Other,Other,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,Rarely,Often,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Time Series Analysis",,,Sometimes,Often,,Often,Most of the time,,Often,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,Most of the time,,,,,Most of the time,Most of the time,Sometimes,,Often,,,,30,20,30,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database",Often,,,,Most of the time,,,,Often,,,,Often,,,,Often,Sometimes,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,250000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Greece,26,"Not employed, but looking for work",,,,,,,,R,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Friends network,Official documentation,Online courses",,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,"FastML Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX",,0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,0,0,0,70,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,25,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,R,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle",,,,,Somewhat useful,,Very useful,,,,,,,,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,Data Scientist,University courses,20,5,70,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Naive Bayes",Sometimes,Sometimes,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,55,10,0,15,20,0,Enough to tune the parameters properly,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,,,,,,Often,,Sometimes,,,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,42000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",University courses,0,50,0,0,50,0,,Neural Networks - CNNs,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,SQL,Bayesian Methods,R,I collect my own data (e.g. 
web-scraping),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,Less than a year,,Work,25,0,25,50,0,0,,,"Some college/university study, no bachelor's degree",,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,,,,"C/C++,R",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",Sometimes,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,,,,,Never,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,United States,45,Employed part-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Other,Anomaly Detection,R,Google Search,"Blogs,Conferences,Kaggle,Newsletters,Textbook",,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,"Researcher,Statistician",Self-taught,50,10,0,0,0,40,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,100 to 499 employees,Increased slightly,1-2 
years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,R,SAS Base,SAS JMP,Stan",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,Rarely,,,Rarely,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,Often,Sometimes,,,Often,Most of the time,Often,Often,,,,,Most of the time,,Most of the time,,Sometimes,,,Most of the time,,Most of the time,,,,Sometimes,Sometimes,,Often,,,,40,20,5,20,10,5,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Often,,,Most of the time,,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,Often,,,100% of projects,Approximately half internal and half external,Standalone Team,Human Microbiome Project (HMP); short-read database; IMMPORT,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,"80,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by college or university,R,Neural Nets,Other,I collect my own data (e.g. web-scraping),"College/University,Friends network,Kaggle,Online courses,Other",,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,,Nice to have,Necessary,,,,edX,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,A social science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,80,10,0,0,10,Other (please specify; separate by semi-colon),,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Association Rules,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"Data Stories Podcast,Linear Digressions Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,20,15,50,0,5,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,Often,Most of the time,,,,,Most of the time,Sometimes,,,Often,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Text Analytics",Sometimes,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,60,15,10,5,10,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",,,,Sometimes,,,,,Often,,Sometimes,,Often,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Other,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Other,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Female,Brazil,51,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Social Network Analysis,Julia,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Official documentation,Personal Projects,Textbook",,,Very useful,,Very useful,,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,"FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Researcher,Statistician",Self-taught,90,0,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs,SVMs,Other","C/C++,Java,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Orange,Python,R,RapidMiner (free version),SQL,TensorFlow",,,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,Most of the time,,Often,,,,Often,,Most of the time,,Most of the time,,Often,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,,,,,Most of the time,,Often,,,Often,Often,,100% of projects,More internal than external,Other,public datasets: economic index,Relate bases that were not previously prepared,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,30330,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,10,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text 
Analytics",,Often,Often,,,,Often,Often,,,,,,Often,,Often,,Often,Often,,Often,,Often,,,,,Often,Often,,,,,25,40,15,10,10,0,Enough to tune the parameters properly,"Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Most of the time,,,,Often,,Often,,,Often,,,100% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Never,55000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,80,0,10,10,0,0,"Survival Analysis,Time Series","Evolutionary Approaches,Logistic Regression",High school,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Other,Traditional Workstation,Text data,,10GB,"Evolutionary Approaches,Regression/Logistic Regression","C/C++,Mathematica,Perl,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,Often,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,Often,,,,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,,Sometimes,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Often,,,,Sometimes,,Often,,,,,,,,,,,,100% of projects,Do not 
know,Other,None,Visualizing the data and filtering out data that is not useful.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,26000,CAD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"I collect my own data (e.g. web-scraping),Other","Arxiv,Blogs,Non-Kaggle online communities,Personal Projects",Somewhat useful,Very useful,,,,,,,Very useful,,,Somewhat useful,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,Engineer,Other,40,0,30,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Somewhat important,Other,Other,Other,Most of the time,10TB,"Decision Trees,Ensemble Methods","C/C++,Cloudera,Impala,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,Spark / MLlib,Unix shell / awk,Other",,,,Sometimes,Often,,,,,,,,,Rarely,,,Often,,,Rarely,Rarely,,Rarely,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,Sometimes,Most of the time,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Time Series Analysis",,Sometimes,,,,Most of the time,Sometimes,Often,,,,,,,,,,,,,,Often,Most of the time,,,,,,,Sometimes,,,,50,10,30,10,0,0,Enough to refine 
and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Often,,,,,,,Sometimes,,,,,,,Most of the time,Often,Most of the time,Most of the time,,Less than 10% of projects,More internal than external,Business Department,various open source threat intelligence feeds,that labels are largely unreliable and class imbalance is hugely out of proportion,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,120000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,"Data Elixir Newsletter,Data Machina Newsletter,DataTau News Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,1 to 2 years,,University courses,10,0,25,65,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Rarely,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,Sometimes,Most of the time,Most 
of the time,Sometimes,Most of the time,,,Most of the time,,Sometimes,,Often,,Sometimes,Often,Rarely,Often,,Most of the time,Rarely,,Often,,Sometimes,Often,Sometimes,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Rarely,,Often,Often,,,,Sometimes,,,Rarely,Rarely,Sometimes,,,,,,Rarely,,,76-99% of projects,Entirely internal,Other,Too many to list,Consistency among data sources,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",S3,Git,Rarely,95000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,SQL,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",10,30,60,0,0,0,,,A professional degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,R,SQL",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",Often,,,,,Often,Most of the time,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Often,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,90000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,39,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",30,30,15,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",,,"Decision Trees,Gradient Boosted Machines,Random Forests","Amazon Web services,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Rarely,Often,,,,,Most of the time,Most of the time,,,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Most of the time,,,,Sometimes,Most of the time,Often,Often,Sometimes,,,Sometimes,Sometimes,Often,Often,Most of the time,,,Often,,Sometimes,,Often,Often,,Most of the time,,,,,,,,70,15,3,2,10,0,"Enough to code it again from scratch, albeit it may run 
slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Sometimes,Sometimes,Most of the time,Most of the time,,Most of the time,Sometimes,,,,,Most of the time,Most of the time,,,,,Sometimes,Rarely,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,730000,ARS,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,Yes,Statistician,Perfectly,Employed by government,SQL,Deep learning,Python,,"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,,Necessary,,Necessary,Necessary,Necessary,,,Necessary,Necessary,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Time Series,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,Other,"College/University,Friends network,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",,A doctoral degree,Pharmaceutical,20 to 99 employees,Increased slightly,3-5 years,Some 
other way,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Video data,Relational data",Most of the time,1GB,,"Amazon Web services,MATLAB/Octave,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Often,,,,,,,Often,,,,,,,,,Most of the time,,,,25,30,20,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,100% of projects,Do not know,Other,,Dimensionality Reduction,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Rarely,120000,SZL,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,RapidMiner (commercial version),Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,,,,,"DataTau News Aggregator,No Free Hunch Blog,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,"Information technology, networking, or system administration",,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Female,United States,42,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Text Mining,R,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher,Statistician",University courses,5,10,0,80,5,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,16-20,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important +Male,United States,57,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,NoSQL,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,Somewhat useful,,,,Very useful,,,,,Very useful,Very useful,,Very useful,,Very useful,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,20,0,0,20,,,A master's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,114000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Podcasts,YouTube Videos",,,,,,,Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,20,0,10,0,Natural Language Processing,"Logistic Regression,Neural Networks - RNNs",,Pharmaceutical,"10,000 or more employees",Stayed the same,Don't know,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,,Regression/Logistic Regression,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,50,0,50,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other,Other",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"DataTau News Aggregator,FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Other",University courses,20,20,25,30,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data",Sometimes,100GB,"CNNs,Ensemble Methods,Neural Networks","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Sometimes,,,,,,Rarely,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Most of the time,,,,Sometimes,,Most of the time,Sometimes,,,"CNNs,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,Most of the time,,,Most of the time,,Often,,,,,Rarely,,Often,,,Most of the time,Sometimes,Sometimes,,,,,,,,Often,,,,,45,10,15,10,10,10,"Enough to code it again from scratch, albeit it may run slowly","The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,10-25% of 
projects,Entirely internal,Standalone Team,DDSM data (can be used for commercial products),Cleaning medical text reports.,Other,Share Drive/SharePoint,,Git,Always,110000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed part-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",Julia,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,30,40,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,I don't know,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Sometimes,100GB,"CNNs,Ensemble Methods,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Julia,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Neural 
Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,Often,Most of the time,,,Most of the time,,Often,,,,Most of the time,Sometimes,,Often,,,,Most of the time,Often,,,,Sometimes,Most of the time,,Often,,Most of the time,,,,15,20,50,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Other,Company Developed Platform,,"Bitbucket,Git",Most of the time,12000,EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,South Africa,24,Employed part-time,,,Yes,,Data Analyst,,,Google Cloud Compute,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Statistician",Self-taught,50,5,15,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Don't know,<1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,SAS Base,TensorFlow",,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,,Rarely,,,,,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Segmentation,Simulation,Time Series Analysis",Most of the time,Often,Often,,,,Most of the time,,,,,,,Sometimes,Often,Often,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Often,,,,45,15,10,10,20,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Often,,,Often,Often,Often,,Often,,Often,Often,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,200000,ZAR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by government,TensorFlow,Neural Nets,Scala,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",0,10,5,85,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Evolutionary Approaches,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,Brazil,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,Google Cloud Compute,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",University courses,0,25,25,50,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Other","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks,Neural Networks,Recommender Systems,Simulation,Time Series Analysis",Sometimes,,Rarely,,,Rarely,Sometimes,,,,,,,,,Rarely,Rarely,,,Sometimes,,,,Sometimes,,,Often,,,Often,,,,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,Often,,,,,,Rarely,,,Often,Most of the time,,Rarely,,,,,51-75% of projects,Entirely 
internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,,,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Not Useful,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,Researcher,University courses,50,20,0,20,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data,Relational data",Rarely,1GB,"CNNs,Neural Networks,RNNs","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Often,,,,,,"CNNs,Cross-Validation,Data 
Visualization,Logistic Regression,Neural Networks,RNNs",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,Often,,,,Most of the time,,,,,Rarely,,,,,,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,Often,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,100% of projects,More internal than external,Central Insights Team,MNIST; Kaggle,Collecting enough input,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"35,000",USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Argentina,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Google Search,"Blogs,College/University,Personal Projects,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,25,25,50,0,0,0,"Computer Vision,Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - GANs",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100MB,"Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Amazon Web services,Java,Python",,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Evolutionary Approaches,Neural Networks,Time Series Analysis",,Often,,,,,Often,Often,,Often,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Often,,Sometimes,,,,,,,,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,500000,ARS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,43,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Time Series Analysis,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Other",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Physics,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",10,70,5,0,15,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,Germany,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Physics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,1-2 years,,Nice to have,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,0,10,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,,,,Other,,0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,Canada,24,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,R,Deep learning,Python,University/Non-profit research group websites,"College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,,,"Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop 
(Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",Text data,Always,,,"Java,Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,15,45,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,Sometimes,,,,,,,,Often,,Most of the time,,Sometimes,,,,,100% of projects,Do not know,Other,,The acquisition needs to be standardize in order to be collect by different clinician but use by one team of researchers.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,,CAD,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,India,23,Employed part-time,,,No,Yes,Data Analyst,Perfectly,Employed by college or university,IBM SPSS Statistics,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,20,20,10,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,R,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,25,10,0,65,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Academic,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Cognos,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL,Tableau",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,Sometimes,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,,,,,Often,,,,,,,,,,,50,8,2,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Most of the time,,,,,,Often,,,Often,Sometimes,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Rarely,"40,000",USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text 
Analytics",Sometimes,,Sometimes,Sometimes,Often,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Often,,Most of the time,,Sometimes,Often,Often,,,Most of the time,Most of the time,,,,,Often,,,,,30,25,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full database,Other",,,,Sometimes,,,,,,,,,,,,,Sometimes,Often,,,,Most of the time,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,"170,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,39,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Python,Bayesian Methods,R,Government website,"Blogs,Conferences,Newsletters,Official documentation,Online courses,Textbook",,Not Useful,,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,Other,University courses,5,10,10,75,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,10 to 19 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic 
Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,Often,,,,Sometimes,Most of the time,Sometimes,,,,,,Often,,Often,,,,,,Most of the time,,,,Most of the time,,,,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process",Often,Sometimes,,,Most of the time,Often,,Sometimes,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,"census data, survey data","surprisingly, most companies (including Fortune 100 ones) do not have a ""header file"" or some sort of guide for their databases; we don't even know what data is available unless we dig deep ourselves (I'm in consulting and I am talking about client data)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,210000,CAD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Romania,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Scala,Google Search,"College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,25,5,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,A bachelor's degree,Financial,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,10GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Java,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,80,10,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Sometimes,Often,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Never,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,33,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,Very useful,,,,Somewhat useful,,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Other,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,65,0,0,10,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,South Africa,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by government,NoSQL,Rule Induction,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,Textbook",,,Very useful,,,,,,,,Very useful,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, 
Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer",University courses,50,0,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",A bachelor's degree,Government,500 to 999 employees,Decreased slightly,1-2 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,,,,,,Often,,Often,,Sometimes,,,,Rarely,Sometimes,,,,,Most of the time,,,,,Most of the time,,,Rarely,,,,,,,"Association Rules,Cross-Validation,kNN and Other Clustering,Recommender Systems",,Most of the time,,,,Most of the time,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,35,25,5,30,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,Sometimes,Most of the time,,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,Most of the time,Most of the time,,10-25% of projects,More external than 
internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Sometimes,30000,ZAR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Argentina,37,Employed full-time,,,Yes,,Statistician,Poorly,Employed by government,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Statistician,Self-taught,40,10,0,20,30,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Python,SQL,Tableau,TensorFlow",,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics",,,Sometimes,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,Often,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,,,,40,10,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,Often,Sometimes,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,,,,,76-99% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Never,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important +Male,Brazil,36,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,C/C++,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Trade book,YouTube Videos",Very useful,,,,Very useful,,Very useful,,,Very useful,,,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,35,5,20,30,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,Java,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Most of the time,Often,Often,Often,Often,,Often,,Often,,Often,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,Sometimes,Most of the time,Most of the time,,,,50,25,0,25,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Other,public datasets,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,"30,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,5,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Telecommunications,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100MB,"CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs,Text Analytics",,,,Sometimes,,Most of the time,Often,,,,,Most of the time,,,,Most of the time,,,Often,Sometimes,Rarely,,Often,,Sometimes,,,,Most of the time,,,,,60,29,5,5,1,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Most of the time,,,Sometimes,,,,,,Sometimes,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,clients data,poor text data to work caused by problems in transcription of low quality audios,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Other",git,Git,Sometimes,150000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,67,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,48,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",Not Useful,,Very useful,,,,Very useful,,Very useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,10,10,0,0,Recommendation Engines,"Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",High school,Academic,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,<1MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,DataRobot,Java,R",,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Recommender Systems",,Often,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,60,30,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues",,Often,,,,,,,Often,,,,,,,,Often,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,kaggle,learn new technologies,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,6000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,NoSQL,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,20,0,40,30,10,0,Time Series,Evolutionary Approaches,High school,Academic,I don't know,Increased slightly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,Evolutionary Approaches,"C/C++,Jupyter notebooks,Mathematica,Python",,,,Often,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,Often,,Often,,,Often,,,,,,,Most of the time,,,,10,10,30,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Other",Dropbox,"Bitbucket,Git,Other",Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,95,0,0,0,5,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Other,Other,Text data,Never,,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,0,0,0,2,97,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,None,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Always,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,Minitab,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Conferences,Friends network,Personal Projects,Tutoring/mentoring",,,Very useful,,Very useful,Very useful,,,,,,Very useful,,,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,30,10,30,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Miner,Fine,Employed by company that makes advanced 
analytic software,IBM SPSS Modeler,"Ensemble Methods (e.g. boosting, bagging)",SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Very useful,Not Useful,,,Very useful,Very useful,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer",Self-taught,40,0,40,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",Rarely,,,,,,,,,Often,Most of the time,Most of the time,Sometimes,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,Most of the time,,,Rarely,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN 
and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,Often,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,Often,Sometimes,Sometimes,Often,Most of the time,Most of the time,,,Most of the time,Rarely,Sometimes,Most of the time,Most of the time,,,,70,15,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,Sometimes,,,Rarely,,,,Sometimes,,,,,,,Most of the time,,,100% of projects,Entirely internal,Business Department,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Always,220000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Python,Bayesian Methods,R,"Google Search,Government website","College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,,University courses,10,10,0,80,0,0,,,"Some college/university study, no bachelor's degree",Non-profit,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,<1MB,,"Perl,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,Rarely,,,Sometimes,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,10,0,35,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,Often,,Most of the time,Rarely,,,,,Most of the time,,Often,,,,Most of the time,,,26-50% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,0,USD,,6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Podcasts,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,6 to 10 years,Other,University courses,60,0,0,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Don't know,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","QlikView,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,Often,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Random Forests,Segmentation",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,60,15,0,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,51-75% of projects,More internal than external,Standalone Team,,siloed knowledge of disparate datasets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,74000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,SQL,Text Mining,R,Government website,"College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,Not Useful,,,Somewhat useful,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Psychology,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,Other,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,Regression/Logistic Regression,"Python,R,SAP BusinessObjects Predictive Analytics,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,Often,,,,,,,,,Rarely,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,35,30,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Often,,,,100% of projects,More internal than external,IT Department,,small datasets; dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Subversion",Rarely,18000,GTQ,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",5,15,25,50,5,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,"1,000 to 4,999 employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Other",Sometimes,1TB,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests,RNNs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,Rarely,,Most of the time,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Most of the time,,Often,,Most of the time,Most of the 
time,,,,,,,,,,,Most of the time,Often,,Most of the time,,Sometimes,,,,,Most of the time,,,,75,20,2.5,2.5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,Most of the time,,,,,Often,,,Sometimes,,,,Often,,Less than 10% of projects,More internal than external,Standalone Team,None,Data was never used for analytics before. Cleaning takes extensive amount of time.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Rarely,"70,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Random Forests,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,20,0,0,0,80,Other (please specify; separate by semi-colon),Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,Indonesia,21,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Factor Analysis,Other,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Management information systems,3 to 5 years,Business Analyst,Self-taught,40,10,30,10,10,0,"Survival Analysis,Time 
Series",Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,R,Deep learning,Matlab,University/Non-profit research group websites,"Arxiv,YouTube Videos",Somewhat useful,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,60,0,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Support Vector Machines (SVMs),A bachelor's degree,Academic,20 to 99 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,100MB,SVMs,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,Often,,Sometimes,,,,50,40,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Often,,,,,,,,,,Often,,,,None,More internal than external,Standalone Team,,,,,,,Rarely,"49,200",BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,Data Analyst,University courses,65,0,30,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Often,,,,Often,Often,,Often,,,Sometimes,,,Sometimes,Sometimes,,,,40,20,5,5,30,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Privacy 
issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,Often,,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,NA,"Understanding it, because there are so many different sources",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,53000,GBP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Malaysia,33,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,Other,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,1GB,Other,"C/C++,MATLAB/Octave,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Perl,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,Rarely,,Rarely,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,40,20,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of 
significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Often,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,Often,,26-50% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,,,Other,"Official documentation,Personal Projects,Tutoring/mentoring",,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,55,10,5,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Increased slightly,6-10 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests",,,,,,,Often,Sometimes,,Often,,,,,,Sometimes,,,,Often,Often,,Sometimes,,,,,,,,,,,50,20,0,15,15,0,Enough to run the code / standard library,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,,,Often,,,,,,,,,76-99% of projects,More external than internal,Other,,Visualizing the analysis in the best way possible.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Shared access server,Bitbucket,Sometimes,34000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,NA,Retired,,,Yes,,Researcher,,Employed by college or university,Amazon Web services,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,Kaggle competitions,20,0,0,20,60,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,SAP BusinessObjects Predictive Analytics,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,"Data Analyst,Programmer",Work,40,0,50,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Mexico,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",5,25,30,10,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,1PB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,Most of the time,,,,,,Often,,Sometimes,,,,,,,Rarely,,,Rarely,,,,Often,,Most of the time,,,,,,,,Rarely,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",Sometimes,,,,,Rarely,,Rarely,Rarely,,,Rarely,,,Rarely,Sometimes,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,40,2,38,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,Often,,,,,,,Sometimes,,,,Sometimes,Often,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other",Company Developed Platform,,Git,Always,100000,SGD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler",University courses,40,10,30,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Other,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","Cloudera,DataRobot,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,Most of the time,Often,Often,,Often,,,,,Most of the time,,,Often,,Rarely,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Sometimes,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,Sometimes,,Often,,Most of the time,Sometimes,,,,Often,,,Often,Often,,,Sometimes,,,Sometimes,Often,Often,,,,,Sometimes,Often,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,Sometimes,,Often,Often,,,,,,,,,,Most of the time,,,,51-75% of projects,Approximately half internal and half external,Other,"eqifax, experian",timelines on getting data integration projects completed,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Most of the time,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Friends network,Kaggle,Online courses,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Necessary,,,,DataCamp,Other,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working yet,Kaggle competitions,20,0,0,0,70,10,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Mexico,46,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Engineer,University courses,15,30,30,20,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning 
(Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Neural Networks","Amazon Web services,C/C++,NoSQL,Python,R,SQL",,Often,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks",,,,,,Often,Often,,,,,,,Sometimes,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,,20,10,20,10,20,20,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",Often,,,,,,,,,,,Often,,Often,,,,Sometimes,,,,,26-50% of projects,More internal than external,Standalone Team,NLP corpus,Cleaning data and size,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git,Other",Sometimes,180000,MXN,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,Not Useful,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Engineer",Self-taught,45,25,0,0,30,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,,,,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,Amazon Web services,Anomaly Detection,R,"Google Search,Government website","Blogs,Conferences,Newsletters,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,"Statistician,Other",Self-taught,30,0,50,0,0,20,,,A bachelor's degree,Government,"5,000 to 9,999 employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Important,Other,,Other,Most of the time,,,"R,SAS Base,SAS JMP,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,Often,,,,,,,,Rarely,,,,"Data Visualization,Time Series Analysis,Other",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,0,50,0,5,0,45,Enough to refine and innovate on the algorithm,Inability to integrate findings into organization's decision-making process,,,,,,,,Sometimes,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,Survey of Construction,"Understanding assumptions, lack of metadata","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,103639,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,Very useful,Somewhat useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,60,20,0,10,0,"Natural Language Processing,Unsupervised Learning",Decision Trees - Random Forests,A bachelor's degree,Hospitality/Entertainment/Sports,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests",,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,40,10,10,30,10,0,Enough to run the code / standard library,"Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,,,Often,,,Often,,,,,,Most of the time,,,100% of projects,Entirely internal,Standalone Team,,Do not know whether the collection of data is useful or not. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,60000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Hungary,32,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects",Very useful,,,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Jack's Import AI Newsletter",5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Physics,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,0,20,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,Russia,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Factor Analysis,R,University/Non-profit research group websites,"Conferences,Online courses",,,,,Very useful,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some 
college/university study without earning a bachelor's degree,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Egypt,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Online courses",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Miner,Data Scientist,Other",Work,0,60,30,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,Rarely,,Rarely,Most of the time,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,Rarely,,,,Often,Most of the time,Often,Sometimes,,,,,,,Most of the time,,,Often,,Often,,Often,,,Most of the time,,Often,Sometimes,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Sometimes,Often,,Often,Often,,Sometimes,,,Often,Often,,,,,Often,Sometimes,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,300000,EGP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Other","College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,Data Scientist,University courses,30,5,5,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Rarely,,Rarely,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,,Most of the time,Sometimes,,,,,,Often,,Often,,Often,Often,,Sometimes,Most of the time,Often,,,Most of the time,,Sometimes,Most of the time,Most of the time,,,,30,20,10,35,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,,Rarely,Often,,,Often,Sometimes,Sometimes,Most of the time,,Often,76-99% of projects,More external than internal,IT Department,CMS Open Payments; Humedica; Truven; Optum; Premier; Thin,patient privacy concerns,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,85000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Kaggle,Personal Projects,YouTube Videos",Very useful,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,40+,Master's degree,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",Kaggle competitions,50,0,0,10,0,40,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,United States,77,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,< 1 year,,,,,,,,,,,,,,,,2 - 10 hours,,No,Some college/university study without earning a bachelor's degree,A social science,,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,"Not employed, but looking for work",,,,,,,,Mathematica,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,Self-taught,35,20,0,45,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Netherlands,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,Often,Sometimes,Rarely,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,,Often,Most of the time,Often,Rarely,,,,,Often,Sometimes,Often,,Rarely,Often,Rarely,Sometimes,,Sometimes,,,,Often,Rarely,Often,Sometimes,,,,20,20,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",Sometimes,Often,,,,,,Often,Sometimes,,,,Sometimes,,Often,,,,,,,,76-99% of projects,Entirely internal,IT Department,"Some demogrpahic infomration, but very little in general.","The enormous amount of databases and features of those, drowning in column names/definitions of fields.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,70000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,SAS Base,,SAS,,Other,,,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Data Miner,Statistician",University courses,40,0,40,20,0,0,Other (please specify; separate by semi-colon),"Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Financial,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,1MB,Regression/Logistic Regression,"SAS Base,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,"A/B Testing,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation",Most of the time,,,,,,Most of the time,Often,,,,,,,,,,,,,Sometimes,Often,,,,Most of the time,Most of the time,,,,,,,40,20,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Often,76-99% of 
projects,Approximately half internal and half external,Standalone Team,credit bureau data ,na,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,,Sometimes,222,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,,Somewhat useful,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,10,30,10,0,"Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Neural Networks","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Naive Bayes,Natural Language Processing,Neural Networks,RNNs",,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,Often,Often,,,,,Often,,,,,,,,,30,10,10,10,10,30,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,Often,,,,,,,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Commercial Data Platform,,Bitbucket,Sometimes,13000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,Canada,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",60,25,10,0,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,Often,Rarely,,Rarely,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Time Series Analysis",,,,Often,,Most of the time,Often,Often,Most of the time,,,Most of the time,,Sometimes,,Sometimes,,,Often,Often,Often,,Often,,Sometimes,Sometimes,,,,Sometimes,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,Often,,Most of the time,Sometimes,,,Often,Most of the time,,,Often,Sometimes,,Most of the time,Often,Often,,Often,Most of the time,,51-75% of projects,More internal than external,Business Department,,data representation and comparability ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,"70,000",CAD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by company that makes advanced analytic software,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Official documentation",Somewhat useful,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Programmer,Statistician",University courses,0,10,10,30,50,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,Sometimes,Most of the time,,,,Often,,,,,Often,Often,,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,,Rarely,Sometimes,Often,,Sometimes,Often,,,Most of the time,,Most of the time,,Often,,,,,Often,,Sometimes,Sometimes,,Often,Rarely,Rarely,Rarely,Rarely,,,,25,10,10,5,50,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not 
used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Often,,,,Most of the time,,,Sometimes,,,,,,Sometimes,Rarely,,,,,,,26-50% of projects,More internal than external,IT Department,,sparse data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,75000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,C/C++,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Researcher,Other",Self-taught,20,0,10,70,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Very important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + 
Cloud service (AWS, Azure, GCE ...),Other",Text data,Rarely,1GB,"Bayesian Techniques,Decision Trees","Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Text Analytics",Rarely,,Rarely,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,20,10,0,60,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Often,,,Often,,,Sometimes,,,,,,,,,,,Most of the time,,,100% of projects,Approximately half internal and half external,Other,26 students x 1 project / week with all unique open source data sets,Scraping data,Column-oriented relational (e.g. KDB/MariaDB),Other,Slack,Other,Never,"48,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Other,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,Not Useful,Somewhat useful,,Somewhat useful,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Partially Derivative Podcast",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,Other","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,35,0,0,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,Very useful,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,1-2 years,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"Data Stories Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,25,10,10,50,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,"10,000 or more employees",,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Excel Data Mining,Minitab,NoSQL,Python,QlikView,R,SQL,Tableau",,Rarely,,,,,,,Rarely,Sometimes,,,Rarely,,,,Sometimes,,,,,,Rarely,,,Rarely,Rarely,,,,Often,Rarely,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Rarely,Often,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,Often,Rarely,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not 
instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Often,Often,,Most of the time,Sometimes,,,Sometimes,,Sometimes,,,,Most of the time,,,Sometimes,,,Most of the time,,100% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Sometimes,"95,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Portugal,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Government website,"Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,Very useful,,,,,,Very useful,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,25,40,5,15,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",Most of the time,1TB,"Bayesian 
Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Perl,Python,R,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,Often,,,Often,Most of the time,,Sometimes,,,,,Sometimes,,,,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics",Sometimes,,Sometimes,,,,,,,,,,,,Often,Often,,,Most of the time,,,,Often,,,Often,,,Most of the time,,,,,35,30,10,15,10,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,Often,,Sometimes,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,"climate,social",realiability,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,87000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,,,,Necessary,,Necessary,,,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,60,0,30,10,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,Most of the time,,Often,,Most of the time,,,,,Sometimes,Often,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Often,,,,Rarely,Often,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,,Often,,,Most of the time,,Most of the time,,Most of the time,,Often,,,Often,Often,Often,,,,60,15,15,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Sometimes,116000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,Very useful,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,35,15,0,10,40,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Cloudera,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,,Rarely,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive 
Modeling,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,,Most of the time,,,Most of the time,,Sometimes,,Often,,,Often,Often,,Often,Most of the time,,Often,,,,Often,Most of the time,,,,60,10,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,90000,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SQL,Other,"Friends network,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",More than 10 years,Data Analyst,Self-taught,90,0,0,10,0,0,,,A master's degree,Insurance,"5,000 to 9,999 employees",Stayed the same,3-5 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,40,0,0,10,50,0,,"Company politics / Lack of management/financial support for a 
data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",Most of the time,,,,Sometimes,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,10-25% of projects,Entirely internal,Other,None,Coordinating with the IT team.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,38000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Portugal,44,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,KNIME (commercial version),Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,20,70,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Sometimes,Sometimes,,Sometimes,,Often,,Often,,,Sometimes,,Often,,Often,,,,,,,Sometimes,,,,30,30,20,20,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,,,,Often,Often,,,,,,,,,,,,,51-75% of projects,More internal than external,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,Somewhat useful,Not Useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,15,10,0,50,25,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SAS Base,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,Sometimes,,,,Often,,,,Often,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Most of the time,Often,,,,,Often,,Often,,Most of the 
time,Often,,Often,,Often,Sometimes,,,,,Often,Often,,,,50,25,15,10,0,0,Enough to tune the parameters properly,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,25000,,,8,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed part-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,60,20,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat 
important,Very Important,Somewhat important,Very Important +Male,Czech Republic,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler",Self-taught,30,50,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A master's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau,Other",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Often,,,,,Rarely,,,Often,,,,,,,,,,,Most of the time,,,Often,,,,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Neural Networks,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,Often,,,,20,20,5,25,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Often,,,,Sometimes,,,,,,,,Often,,Often,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Uplift Modeling,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Researcher",Self-taught,50,0,0,0,0,50,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,Rarely,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,Lift Analysis,Naive Bayes,Random Forests,Segmentation,SVMs",Often,Often,Often,,,Often,Most of the time,Most of the time,Often,,,Sometimes,,,Sometimes,,,Often,,,,,Most of the time,,,Often,,Sometimes,,,,,,40,20,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Sometimes,,,,Often,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,credit bureau data;marketing data,entity resolution,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,180000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,"Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,I don't know,Increased slightly,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Microsoft Excel Data 
Mining,NoSQL,Python,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,Rarely,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Often,Rarely,,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,60,20,0,20,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,,,,,,83000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,20,15,15,50,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important 
+Male,United States,30,Employed full-time,,,Yes,,Statistician,Fine,"Employed by college or university,Employed by non-profit or NGO",Amazon Machine Learning,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,Data Analyst,Kaggle competitions,40,20,10,10,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",No education,Non-profit,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Other,Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests",,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,20,10,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data",,Sometimes,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,no,clean dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,60000,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,India,27,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Somewhat important,Very Important,,Somewhat important,Very Important,Somewhat important,,,,,,Very Important,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,50,0,25,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Often,Often,Often,Sometimes,,Often,,Sometimes,Often,Often,,,Most of the time,Often,Sometimes,Sometimes,Most of the time,,,,,Often,Often,Sometimes,,,,15,30,20,15,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with 
the available data",Most of the time,,,,Most of the time,Often,,,,,Sometimes,,,,Often,,Sometimes,,,Often,,,76-99% of projects,More internal than external,Standalone Team,Twitter; Facebook,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,72000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,27,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,29,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,Tableau,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Data Analyst,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,70,10,1,19,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Rarely,100MB,"Decision Trees,Ensemble Methods","Microsoft Azure Machine Learning,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,,Often,Often,,,,,,,Often,,,,,Often,,Often,,,,,,,,,,,80,10,3,2,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Most 
of the time,,,,,,,,,,,Most of the time,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,19200,EUR,Has decreased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,95,5,0,0,0,0,,,A master's degree,Mix of fields,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,,Relational data,,,,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,80,0,0,15,5,0,,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,96000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,C/C++,Social Network Analysis,C/C++/C#,Government website,"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,10,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,<1MB,Neural Networks,"C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Neural Networks,Prescriptive Modeling,Text Analytics",,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,Sometimes,,,,,60,20,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Most of the time,,Often,Most of the time,,Often,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,,Rarely,120000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,62,Employed full-time,,,Yes,,Programmer,Poorly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Programmer,Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","KNIME (free version),Python,R,RapidMiner (commercial version),SQL,Stan",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Most of the time,Rarely,,,,,,,,Most of the time,Sometimes,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs,Text Analytics",,Sometimes,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,Often,Often,,Sometimes,,Sometimes,,,Often,,,,,Sometimes,Sometimes,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining 
data science to others,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",Often,Often,,,Often,Sometimes,,Often,,,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,127000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Software Developer/Software Engineer",University courses,20,0,0,70,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM Cognos,Python,R,SQL",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,SVMs",,,,,,Often,,Often,,,,,,Rarely,,,,,,Sometimes,,,Most of the 
time,,,,,Often,,,,,,40,10,10,35,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,,,,Sometimes,,,,,,Most of the time,,,,Sometimes,,,,76-99% of projects,Entirely internal,Business Department,None,Data cleanliness ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,RapidMiner (commercial version),Social Network Analysis,Matlab,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Workstation + Cloud service,Other",2 - 10 hours,Master's degree,No,Master's degree,Management information systems,1 to 2 years,"Computer Scientist,Engineer",University courses,10,10,0,80,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Germany,58,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,Google Search,Other,,,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Other,University courses,50,0,0,50,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,,10MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,RapidMiner (commercial version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,None,Do not know,Other,,,Other,Email,,Git,Always,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,70,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A doctoral degree,Other,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Regression/Logistic Regression,Other","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,Most of the time,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,60,10,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,,Often,,Often,,,,Often,,Sometimes,,,Often,,,Less than 10% of projects,Entirely internal,Other,None,"The data creators are poorly trained or lack the time to record accurately. Data is often missing on incorrectly recorded (data contradicts itself, e.g. an entity is classified as with two mutually exclusive categorical values).",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,150000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,100GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,Often,,,,,,,,,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,,Sometimes,,,,Often,,,,Most of the time,,,,,,,Often,,,,,,,Most of the time,,,,5,45,5,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Most of the time,,Most of the time,Often,,,,,,,,Most of the time,,,26-50% of projects,More external than internal,IT Department,Company level data,"Data cleaning, Non-reliable source, Lack of variables","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Julia,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Official documentation,Online courses,Stack Overflow Q&A,Trade book,YouTube Videos,Other",,,,,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,Very useful,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Operations Research Practitioner,Self-taught,70,30,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Orange,Python,R,SQL,Stan,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,,,,Often,Sometimes,,,,,,Rarely,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics,Other",,Sometimes,Often,,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Often,,,Rarely,,Often,,,,,,Often,Sometimes,Rarely,,Often,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",,,,Often,Most of the 
time,Sometimes,,Often,Often,,,,,,Often,,,,,,,,76-99% of projects,Entirely internal,Business Department,Census; Western PA Regional Data Center; ,Data cleaning. Understanding the process for collecting and recording data. Matching collected data to time intervals. Developing viable proxies for missing data elements.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,90000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Very useful,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",University courses,40,5,10,40,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data 
Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,Often,,,,,,,,,,20,30,15,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues",Most of the time,,,Sometimes,Often,Sometimes,,,,,,,,,Most of the time,,Most of the time,,,,,,76-99% of projects,Entirely internal,Other,"MIDT, Aerocivil Traffic",Depuration,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,4000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,0,30,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Other,Basic laptop (Macbook),Text data,Sometimes,10MB,Other,"Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,10,40,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Other",,,,,,,,,,,,Often,,,,,,,,,,Often,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Email,,Other,Most of the time,"40,000",UGX,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,"DataCamp,edX,Other",GPU accelerated Workstation,40+,,No,Some college/university study without earning a bachelor's degree,A health science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Data Analyst,Programmer,Researcher",Self-taught,85,0,0,0,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Rarely,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,Sometimes,,,Most of the time,Often,Often,,,,,,,Often,,,,Sometimes,,,Often,,,Often,,,,,,,,20,40,5,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,,,Sometimes,Often,,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,none,Hard to collect clinical data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,90000,CAD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Decision Trees,R,Google Search,"Blogs,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,1 to 2 years,Other,University courses,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Never,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","DataRobot,Jupyter notebooks,Python,R,SQL",,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,,,Most of the time,,,,Often,,,,Most of the time,,,,,,,Often,,,,,,,,,,,95,2,0,2,1,0,Enough to run the code / standard library,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Often,Most of the time,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Never,100MB,Regression/Logistic Regression,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation",,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,60,10,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the 
potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,,Sometimes,,Often,Often,,,,,Often,Often,,,,,Most of the time,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"89,000",,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. web-scraping),Other","Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,Somewhat useful,,,,Very useful,,Very useful,Somewhat useful,,,,"Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Business Analyst,University courses,60,0,10,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,10GB,Regression/Logistic Regression,"Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Sometimes,,,Most of the time,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,65,10,0,10,0,15,Enough to tune the parameters properly,"Company 
politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,Sometimes,Often,,,,,,,,,Most of the time,Often,Most of the time,,,10-25% of projects,Entirely internal,Other,None,Data Quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Sometimes,"85,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,Python,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Friends network,Stack Overflow Q&A",,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,5-10 years,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,A social science,6 to 10 years,Business Analyst,Self-taught,100,0,0,0,0,0,,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Malaysia,42,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,Very useful,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,20,30,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,,,,Sometimes,,,,Sometimes,,,,,,,,Often,,Often,,Sometimes,Rarely,Most of the time,,,,Sometimes,,,,Most of the time,Most of the time,Most of the time,Sometimes,Most of the time,,,,,,Sometimes,Most of the time,,,Most of the time,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,Often,Sometimes,Most of the time,Most of the time,Most of the time,Often,,,,,Sometimes,Rarely,Often,,Often,Often,Sometimes,Often,,,,Sometimes,Sometimes,Sometimes,Sometimes,Most of the time,Often,,,,40,20,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the 
organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Sometimes,Often,Most of the time,,,Often,,,Sometimes,,Often,,,Most of the time,Most of the time,,Most of the time,Most of the time,,51-75% of projects,More external than internal,Other,none at the moment,Dirty Data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion,Other",Sometimes,169000,MYR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +A different identity,United States,68,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,Other,Neural Nets,Other,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Friends network,Personal Projects,Other,Other",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,Researcher,Other,10,10,80,0,0,0,Other (please specify; separate by semi-colon),Evolutionary Approaches,A master's degree,Other,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,<1MB,Neural Networks,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,20,10,0,50,20,0,Enough to refine and innovate on the algorithm,Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central 
Insights Team,,,Graph (e.g. GraphBase/Neo4j),Share Drive/SharePoint,,Git,Most of the time,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",75,20,0,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Evolutionary Approaches",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,University/Non-profit research group websites,"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to 
have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,31,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Very useful,"FastML Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,20,0,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",High school,Academic,I don't know,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,,10TB,"Bayesian Techniques,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,Unix shell / awk",,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Markov Logic Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,Most of the time,,,Often,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,Most of the time,,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,Most of the time,Most of the time,,10-25% of projects,More 
external than internal,Standalone Team,DbGap;1000GP;GEO,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,15000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,17,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Textbook",Very useful,Very useful,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Other,Self-taught,60,5,35,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs",A professional degree,Academic,,,,,"N/A, I did not receive any formal education",Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Other",,,"CNNs,Ensemble Methods,Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Ensemble Methods,Neural Networks,RNNs,Simulation",,,,Sometimes,,,Most of the time,,Sometimes,,,,,,,,,,,Most of the time,,,,,Sometimes,,Often,,,,,,,15,70,0,15,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Often,26-50% of projects,Do not know,Other,,We collect it ourselves and the sample size is not always large enough,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,Git,Sometimes,,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,,,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods","Orange,Python,R,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,Often,,,,,,,,,Often,Sometimes,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Segmentation",,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,30,10,5,25,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the 
organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,,,,,Often,,,,,,,,,,Often,,,,51-75% of projects,Entirely internal,Other,Twitter sentiment data,"We only get it in current snapshots, We don't have access to historical data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,60000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,I don't plan on learning a new ML/DS method,Python,Government website,"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,"Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,50,10,25,15,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,,<1MB,"Bayesian Techniques,Regression/Logistic Regression,Other","C/C++,Jupyter notebooks,Python,R",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction",,,Sometimes,,,Sometimes,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,10,90,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,Census,Privacy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Always,90000,USD,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,32,"Not employed, but looking for work",,,,,,,,Amazon Web services,MARS,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",5-10 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Biology,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,50,25,NA,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,KNIME (free version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,Other,Self-taught,50,40,0,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,Most of the time,,,Rarely,Often,,,Sometimes,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,Sometimes,,,Most of the time,Most of the time,,,,,Sometimes,,Often,,Often,,Sometimes,Sometimes,Often,Often,,Often,,,,Sometimes,,Sometimes,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Often,,,,,,,,Sometimes,Most 
of the time,,Often,,Sometimes,Sometimes,,,Often,,51-75% of projects,Entirely internal,Central Insights Team,"credit bureau data, google analytics",get raw data from original source,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Sometimes,68000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,Very useful,"Jack's Import AI Newsletter,Linear Digressions Podcast,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,United States,29,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by professional services/consulting firm,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,Very useful,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer",Self-taught,80,0,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,Other,"Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,Rarely,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,,,,Sometimes,Most of the time,Often,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,10,10,20,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Often,Sometimes,,,,,Often,,,,,,,,Most of the time,,,,76-99% of projects,Entirely internal,Standalone Team,,Poorly transcribed phone call text,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,"Git,Subversion",,70000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by non-profit or NGO,KNIME (free version),Deep learning,SQL,I collect my own data (e.g. 
web-scraping),Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Non-profit,20 to 99 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,Text data,Never,,Other,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Rarely,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,60,0,0,20,0,20,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Do not know,Other,None,,,Email,,Other,Never,150000,NGN,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by company that makes advanced analytic software,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,25,25,30,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,1GB,"CNNs,Neural Networks","Java,Other,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,"CNNs,kNN and Other Clustering,Neural Networks,RNNs,Text Analytics",,,,Most of the time,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,Sometimes,,,,Often,,,,,50,10,20,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,,,,,,Most of the time,,,Often,,26-50% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,"Git,Subversion",Rarely,,,,4,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,15,5,5,75,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,10GB,"Bayesian Techniques,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,Often,,,,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,Often,,,,,,,,Often,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Most of the time,,Often,,,Sometimes,Often,Sometimes,Rarely,,,,,,Often,Sometimes,,,Rarely,,Rarely,,Sometimes,,,,Rarely,,,Often,,,,70,5,0,5,20,0,"Enough to code it 
again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,93000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,0,45,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,United States,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Google Search,"Official documentation,Online courses,Personal 
Projects,Stack Overflow Q&A",,,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Other,University courses,30,10,0,60,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",High school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,Regression/Logistic Regression,"Amazon Web services,MATLAB/Octave,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,Most of the time,,Sometimes,,,Sometimes,,,,,,,,,Often,,,,10,45,30,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Sometimes,Sometimes,Often,,,,,,,,,Sometimes,,,,,Often,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,97000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Statistician",University courses,20,5,15,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text 
Analytics",Often,,Sometimes,,,Sometimes,Often,,,,,Rarely,,Rarely,,Often,,,Sometimes,,Sometimes,,Sometimes,,,,Often,Sometimes,Sometimes,,,,,49,10,1,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Often,,,Most of the time,,,,,,,,,,,,,,,Sometimes,Most of the time,,51-75% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Programmer,Software Developer/Software Engineer",University courses,20,20,10,30,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,500 to 999 employees,,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,Rarely,,,,Rarely,Rarely,,Sometimes,,,,,,,Rarely,,Most of the time,,,,,,,,Rarely,Most of the time,,,Rarely,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,QlikView,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,,50,10,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Sometimes,Most of the time,,,,,,,,,,,Sometimes,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Rarely,150000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,70,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Text Mining,R,"Government website,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,500 to 999 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted 
Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Most of the time,Often,,,,Often,,Sometimes,,Often,,,,,Often,,Often,,,,,,,,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Sometimes,Often,,Often,Often,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,IT Department,,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Predictive Modeler",Self-taught,70,5,10,15,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)",Logistic Regression,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1TB,,"IBM Watson / Waton Analytics,Java,Microsoft Excel Data Mining,NoSQL,Python",,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Prescriptive Modeling,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,Most of the time,,,,,50,10,10,20,10,0,,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Other,github,Git,Rarely,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,,,,"FastML Blog,FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,Data Scientist,Work,40,10,40,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Sometimes,Most of the time,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,Sometimes,,,,Sometimes,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,50,1,29,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Often,,,,,,,,,Sometimes,,,,,,Often,,,76-99% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,185000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,38,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects",Very useful,,,,,,Very useful,,,,,Very useful,,,,,,,No Free Hunch Blog,3-5 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Korea,30,Employed full-time,,,Yes,,Computer Scientist,,,TensorFlow,Deep learning,R,GitHub,"Kaggle,Online courses,Textbook,Other",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",Self-taught,20,60,10,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Markov Logic Networks",,Military/Security,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1TB,"CNNs,HMMs,Neural Networks","C/C++,NoSQL,Python,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of 
the time,,,,,,,,,,,Often,,,,,,,,,,"Naive Bayes,Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,Often,Often,Sometimes,,,,,,,,,,,,,,10,10,40,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,90000,SGD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,I prefer not to answer,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Not important,Somewhat important +Male,Other,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Stack Overflow Q&A,Textbook,Other",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,0,5,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Ukraine,34,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Other,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Official documentation,Online courses,Podcasts,Textbook",,Somewhat useful,,,,,,,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Master's degree,,,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Machine Translation,Natural Language Processing","Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Stan,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,"Researcher,Statistician",University courses,20,30,40,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,500 to 999 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,DataRobot,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,SAS Base,SQL,Tableau",Rarely,Sometimes,,,,Often,,,,,,,,,,,Sometimes,,,,,,Often,,,,Rarely,,,Rarely,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Often,,,,Often,Often,Often,,,Often,,Often,Often,Most of the time,,Sometimes,,,,,Often,,,Often,Often,,Often,Most of the time,,,,20,20,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,,,,Most of the time,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,68000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,Google Search,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",University courses,30,10,30,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,Rarely,,Sometimes,,,,,,,,Sometimes,,Sometimes,,Often,Often,,Often,Often,,,,Often,,Most of the time,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,Data cleaning & pre-processing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Conferences,Newsletters,Online courses,Textbook",,Somewhat useful,,,Very useful,,,Somewhat useful,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX,Udacity",Traditional Workstation,2 - 10 hours,Master's degree,No,Professional degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Brazil,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Researcher",University courses,10,10,20,60,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",Primary/elementary school,Mix of fields,10 to 19 employees,Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Google Cloud Compute,Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow",,,,,,,,Often,Sometimes,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Most of the time,Often,,Sometimes,Most of the time,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,Often,,,,Most of the time,Most of the time,Often,Most of the time,Often,,,,,,,,,,Often,Often,Often,Often,,,Often,,,,Most of the time,,,,40,30,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate 
findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,Often,,,,,,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,IT Department,N,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Most of the time,40000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Python,Genetic & Evolutionary Algorithms,R,,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,Statistician,University courses,40,0,0,60,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization",,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,40,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Do not know,,,,,I don't typically share data,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Statistician",Work,35,25,35,3,2,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Never,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,"Decision Trees,Neural Networks,Random Forests,SVMs",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,20,35,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,51-75% of projects,Entirely internal,Other,,,,I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,,,A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Sometimes,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,,,,,,Often,,Sometimes,,,Often,,,Sometimes,Sometimes,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Most of the time,,Often,,,,,,,,Most of the time,Sometimes,,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,55000,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed part-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"GitHub,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,Other,University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Academic,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Rarely,10MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","NoSQL,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,Often,,,,,,,,,Most of the time,Often,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,,Often,Most of the time,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,Standalone Team,N/A,ETL,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,"28,000",USD,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Java,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Researcher",University courses,15,3,30,50,2,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Decreased slightly,More than 10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Java,MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Often,Often,Most of the time,,,,Sometimes,Often,,Often,,Rarely,,,Sometimes,,Often,,,Often,,,,Sometimes,,,,35,25,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,The fact that it is very dirty and often organized by the clients over time in an ad-hoc fashion. This often results in not being able to trust the data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Rarely,100000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Researcher,Self-taught,40,15,35,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",I don't know/not sure,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,1MB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Time Series Analysis",,,,,,,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,Often,,,,12,35,5,13,35,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,Often,,,Often,Often,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,35000,EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,Google Search,"Blogs,College/University,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,0,10,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Segmentation",Sometimes,Sometimes,,,,Often,Most of the time,Sometimes,,,,Sometimes,,,,Often,,,,Rarely,,,Sometimes,,,Sometimes,,,,,,,,80,5,0,15,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,Other,Naics Data base from https://www.naics.com/data-append-services-enhancement/,It is messy and take time to clean.,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,63500,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,,"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs",A doctoral degree,Financial,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests",,,,,,Most of the time,Most of the time,Rarely,,,,Rarely,,,,,,,,Rarely,,,Rarely,,,,,,,,,,,10,40,20,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process",,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Never,200000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",I don't plan on learning a new tool/technology,Deep learning,Python,Government website,"Arxiv,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",Self-taught,50,40,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Never,100GB,"Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk,Other",,Often,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,Often,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,39,1,50,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,Most of the time,,,Sometimes,,,,,,Sometimes,,,100% of projects,More internal than external,Other,https://disc.gsfc.nasa.gov/,Acquiring domain expertise to aid feature engineering,Other,Other,AWS filesystem,Git,Sometimes,"120,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Other,35,50,0,15,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector 
Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,Not Useful,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,10,10,10,70,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,20 to 99 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,,,,,"Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Tableau",Often,Often,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,,Rarely,,,,Often,Rarely,,,,,,,,Rarely,,,,,,,Rarely,,,,,,Sometimes,Rarely,,,,10,0,0,10,80,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,Most of the time,Most of the time,Sometimes,,Sometimes,Most of the time,,Most of the time,,,Often,,,,Sometimes,,Most of the time,Most of the time,,10-25% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,Personal Projects,,,,,,,,,,,,Not Useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased significantly,Don't know,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,,Other,"Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,25,25,25,25,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Bitbucket,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,53,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by government,TensorFlow,Deep learning,Python,,"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,Very useful,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,,University courses,30,10,20,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Important,Other,Laptop or Workstation and local IT supported servers,Text data,Never,<1MB,Other,"Jupyter notebooks,Microsoft Excel Data Mining,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,kNN and Other Clustering,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,50,0,0,40,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,,Often,,,Most of the time,,,,,,Often,,,Most of the time,,Often,,,,100% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,South Korea,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Link Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,Very useful,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Work,40,10,20,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",No education,Academic,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,KNIME (free version),NoSQL,RapidMiner (free version),SQL,Unix shell / awk",,Rarely,,,,,,,Often,,,,,,Most of the time,,,,Sometimes,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Text Analytics",Rarely,,Sometimes,,,,Most of the time,Often,,,,,,,,,,Sometimes,Often,,,,Sometimes,Often,,Sometimes,,,Most of the time,,,,,10,20,10,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,Most of the time,Sometimes,,,,,,,,,,Often,,,26-50% of projects,More external than internal,Central Insights Team,kaggle; amazon; google; and others,"Configuraiton, extending and understanding the internal execution flow of the open source code, because most often they lack proper documentation :(","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Github; Bitbucket,"Bitbucket,Git",Rarely,10800000,KRW,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Very useful,,,,,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,40,10,50,0,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,Fewer than 10 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",Often,Rarely,Rarely,,Rarely,Often,Most of the time,Often,,,,,,,,Sometimes,,Sometimes,,,Rarely,,Sometimes,,,,Often,,Sometimes,Sometimes,,,,50,5,1,20,24,0,Enough to explain the algorithm to someone non-technical,Did not instrument data useful for scientific analysis and decision-making,,,Sometimes,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,150000,,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Textbook,Tutoring/mentoring",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,Somewhat useful,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,,University courses,30,0,15,50,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,Other","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series Analysis",Often,,Sometimes,Sometimes,Often,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,,Most of the time,Sometimes,Often,,,Often,,,,,Most of the time,Sometimes,,,,20,20,20,30,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Often,,,,,,,,,,,,Often,,,Often,Often,,100% of projects,More 
internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Microsoft SQL Server Data Mining,Monte Carlo Methods,R,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,R,SQL,Stan,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,Most of the time,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,Often,Often,,,Sometimes,,Sometimes,,,,,,,Often,,,,60,15,2,8,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Sometimes,,,Often,Often,Sometimes,,,Often,,Sometimes,Sometimes,Often,,76-99% of projects,Entirely internal,Standalone Team,,Data quality; difficulty matching disparate data sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,MATLAB/Octave,Text Mining,Matlab,Google Search,"Blogs,Company internal community,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Machine Learning Engineer",Self-taught,50,0,50,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"Bayesian Techniques,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,20,20,0,10,50,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Unavailability of/difficult access to data",,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Other,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,51,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,,Nice to have,,,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,"DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Biology,Less than a year,"Programmer,Other",Other,0,75,0,0,25,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,22,"Not employed, but looking for work",,,,,,,,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,55,0,30,0,10,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,Iran,23,Employed part-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Proprietary Algorithms,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Partially Derivative Podcast,1-2 years,,,,,,,Nice to have,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Kaggle competitions,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,Somewhat important,,,,,,,,,,,Somewhat important,, +A different identity,Germany,19,"Not employed, but looking for 
work",,,,,,,,C/C++,Genetic & Evolutionary Algorithms,Python,Google Search,"Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,,,Very useful,Very useful,,1-2 years,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,,,,Other,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,Yes,I did not complete any formal education past high school,,I don't write code to analyze data,I haven't started working yet,Self-taught,50,20,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Markov Logic Networks,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Arxiv,Blogs,Kaggle,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Self-taught,80,0,20,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Internet-based,10 to 19 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",Often,,Sometimes,Often,,Often,Most of the time,,,,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,Often,Often,,Sometimes,,Often,,,,Often,,,,,10,40,20,20,10,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to 
data",,,,,,,,,,,,Often,Sometimes,,,Sometimes,,,,,Sometimes,,100% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,100000,CAD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,I don't plan on learning a new ML/DS method,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,50,20,0,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,10 to 19 employees,Decreased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,R,SQL,Tableau",,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",Often,,,,,Often,Most of the 
time,Often,,,,,,Rarely,Often,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,40,5,5,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,Often,Sometimes,,100% of projects,Entirely internal,Standalone Team,Experian; Dun and Bradstreet; Census; TomTom;,Understanding what it means because often the client we are working with is not that familiar with it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",FTP,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,70000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,20,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,100 to 499 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Always,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,,,,,,,Often,,,,Often,,,,,,,Often,,,,,,,,,,,15,5,70,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,Often,,Often,,,,,,,Most of the time,,,,,,Most of the time,,100% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Rarely,,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Programmer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Social Network Analysis,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Company internal community,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",1,33,33,33,0,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Retail,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Image data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression",,,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,10,10,10,10,60,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",,,,Sometimes,Often,Often,,Sometimes,Sometimes,,Sometimes,,,,Often,,Sometimes,,,,,,10-25% of projects,More internal than external,Other,,Getting consistent reporting structures. 
,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,50000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,Other,Self-taught,10,5,25,50,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",I prefer not to answer,Academic,"10,000 or more employees",Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,75,0,0,20,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,Sometimes,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Other,Future project with EAB to gain deeper insight to enrolled 
students with a deeper understanding to consumer behavior,"Not sure yet, ask in a year","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Never,"101,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,,Decision Trees - Random Forests,A professional degree,Technology,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Other,Workstation + Cloud service,"Image data,Text data,Relational data",Never,1GB,,"Amazon Web services,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL",,Most of the time,,,,,,,,,,,Rarely,,Often,,Often,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,30,30,0,40,0,0,,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,,,,,,,,,60000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,R,University/Non-profit research group websites,"College/University,Personal 
Projects,Textbook,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,A humanities discipline,1 to 2 years,I haven't started working yet,University courses,10,10,0,75,5,0,"Machine Translation,Speech Recognition,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,SQL,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Somewhat useful,Not Useful,Very useful,Very useful,Very useful,Very useful,,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,6 to 10 years,"Researcher,Statistician",University courses,10,0,10,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,Text data,Most of the time,100MB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,Other",,,,Rarely,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,,,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Rarely,,,,,,,,Often,,,,,Often,,,,,,Often,,Often,Most of the time,,,,5,20,0,15,60,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or 
a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,,,,,,Sometimes,Sometimes,Often,Most of the time,,,,Rarely,Most of the time,Often,Rarely,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,48000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed part-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,MARS,R,,"Friends network,Online courses,Stack Overflow Q&A",,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Other",Work,0,0,100,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,Other,"Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,Often,,,,,Rarely,Rarely,,,Most of the time,,,Rarely,,,,Most of the time,,,"Lift Analysis,Prescriptive Modeling",,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the 
time,,,,,,,,,,,,70,20,5,0,5,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Sometimes,,Often,,,,,,,,,,,,,,Often,,Often,,Less than 10% of projects,More internal than external,Other,geodemographic data; credit data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,150000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Not Useful,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,10,40,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Financial,100 to 499 employees,Increased significantly,More than 10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",,,,"Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM Cognos,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,Sometimes,,,Rarely,Sometimes,Rarely,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Sometimes,,,Most of the time,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression,Segmentation",Often,,,,,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,Most of the time,,,,,,,,20,0,30,5,5,40,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.",Most of the time,Sometimes,Most of the time,,Most of the time,,,Sometimes,Sometimes,,,,Often,Often,Most of the time,,Most of the time,Most of the time,Most of the time,,,,26-50% of projects,Entirely internal,Standalone Team,,"special values, missing data, unreliable delivery, resource constraints for querying central DB","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,107000,CAD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,53,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Regression,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Online courses,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,,,,,Very useful,"FlowingData Blog,KDnuggets Blog",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A humanities discipline,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Egypt,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,NoSQL,Time 
Series Analysis,Java,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10MB,Other,Java,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,100,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,Less than 10% of projects,,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,Sometimes,"265,000",EGP,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Online courses,Personal Projects",,Very useful,,Very useful,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,Other,Self-taught,50,10,30,0,10,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,QlikView,R,SQL,Tableau",,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Logistic Regression,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,,,25,5,0,40,30,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,Often,,Most of the time,,,,,,,,,,,,Sometimes,,Often,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Rarely,"125,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Hungary,42,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,Not Useful,,,,,,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,,"Decision Trees - Random Forests,Logistic Regression",,Technology,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,,"Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Text Analytics",Rarely,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,40,10,10,30,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,,5,,,,,,,,,,,,,,,,,, +Male,Switzerland,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,Python,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Miner,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,0,49,49,2,0,"Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs",A master's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Evolutionary Approaches","Java,MATLAB/Octave,NoSQL,Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,Sometimes,,,Often,Often,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,10,20,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,,,,,,,,,Often,,,Sometimes,Often,,,26-50% of projects,Approximately half internal and half external,IT Department,Swiss Open Government Data,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Government,"5,000 to 9,999 employees",Decreased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,,Often,,,,,,,,,Often,,Often,Most of the time,,Often,,Often,,,,,,Sometimes,,,,,40,30,10,10,10,0,Enough to run the code / standard 
library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Business Department,"Bloomberg,factiva, CapitalIQ, Rueters,",Cost of the data and vision of the management,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,100000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Colombia,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,30,20,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Mathematica,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Often,,,,,,Often,,Often,,Often,Often,Often,,Most of the time,Often,,,,,,Sometimes,Sometimes,,,,30,30,15,20,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,Often,Sometimes,,,Often,,100% of projects,Approximately half internal and half external,IT Department,"Research dataset results, IEEE Datasets , Kaggle",Understand the context and the clean process,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,3500000,COP,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Italy,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,66,Retired,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,R,,Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,,Work,40,0,60,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,"Ensemble Methods (e.g. 
boosting, bagging)",R,University/Non-profit research group websites,"Blogs,College/University,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Not Useful,Very useful,,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,I haven't started working yet,University courses,45,25,5,25,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Mix of fields,Fewer than 10 employees,,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Rarely,,Regression/Logistic Regression,"NoSQL,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,50,5,0,30,15,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Rarely,,,Sometimes,,,,,,,,Often,,51-75% of projects,Approximately half internal and half external,Other,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed part-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,11 - 39 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Biology,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,Germany,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,University courses,30,30,10,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,100TB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,Often,Sometimes,,Often,,,,,,Most of the time,,,,,70,20,5,2.5,2.5,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Often,Most of the time,,,Sometimes,,,,,,Most of the time,,,,100% of projects,More internal than external,Other,none,formating and parsing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Colombia,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses,Personal Projects,Tutoring/mentoring",,,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,10,5,5,80,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,100GB,"Decision Trees,SVMs","C/C++,Microsoft Azure Machine Learning,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering",,Often,,,,Most of the time,Often,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,40,25,10,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,Customer data,Authorization to use it,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Git,Other",Sometimes,55000000,COP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Not very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Rarely,100GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",Rarely,,,,,,,,Often,,,,,,Often,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",Sometimes,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,,30,5,40,5,20,0,Enough to explain the algorithm to someone non-technical,Lack of 
data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Rarely,65000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,63,Retired,,,Yes,,Data Scientist,Fine,Employed by college or university,Java,Genetic & Evolutionary Algorithms,Other,"Google Search,Government website","Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,30,0,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,Yes,Other,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Personal Projects",,Not Useful,,,,,Very useful,,,,,Somewhat useful,,,,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,A social science,1 to 2 years,"Data Analyst,Data Scientist",Kaggle competitions,40,10,0,0,50,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Very Important +Male,Brazil,25,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,,University courses,2.5,10,2.5,85,0,0,Time Series,"Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1MB,"Neural Networks,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,SVMs,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Often,,,,15,15,5,45,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Sometimes,,,,,Often,,Often,,,,,Rarely,,,10-25% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Iran,25,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,R,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,Researcher,Self-taught,20,0,0,80,0,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,Poland,43,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",I don't plan on learning a new tool/technology,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Physics,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A doctoral degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Female,United States,23,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,85,5,5,5,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,Rarely,,,,,,,Sometimes,Most of the time,,,Rarely,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,Most of the time,Sometimes,,,,Often,,Sometimes,,Most of the time,,Often,Most of the time,,Sometimes,Sometimes,Sometimes,,,,,,Often,Often,,,,40,15,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science 
solution up to full database",,,,Sometimes,Often,,,Most of the time,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,100% of projects,More internal than external,Standalone Team,zodiac,bottleneck on data access volume and speed,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,87500,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,C/C++/C#,University/Non-profit research group websites,"Conferences,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,,Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,Laptop or Workstation and private datacenters,Text data,Most of the time,1PB,"Bayesian Techniques,Decision Trees,Neural Networks","C/C++,MATLAB/Octave,Perl,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,Often,,,Sometimes,Most of the time,,,,,,,Sometimes,,,,Sometimes,,Often,Sometimes,,,,,,Most of the time,,,,,,,50,10,10,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,Sometimes,,,Sometimes,,,,Most of the time,,,,,,Often,,,,100% of projects,Approximately half internal and half external,Other,,"Size, Data transfer rates, Lack of documentation","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Other,"Google Drive, scp","Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,167000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Genetic & Evolutionary Algorithms,Python,Government website,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Master's degree,A humanities discipline,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,Spark / MLlib,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,10,5,0,5,0,Natural Language Processing,"Bayesian Techniques,Neural Networks - RNNs",Primary/elementary school,Government,,,,,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,Neural Networks,"Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,"Decision Trees,Logistic Regression,Neural Networks,RNNs",,,,,,,,Often,,,,,,,,Often,,,,Often,,,,,Often,,,,,,,,,15,15,40,15,15,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,Less than 10% of projects,Entirely external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,39000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,I don't plan on learning a new tool/technology,Social Network Analysis,Python,Google Search,"Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,Somewhat useful,,,,,Very useful,,Very useful,,Somewhat useful,,,Very useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,40,0,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Image data,Sometimes,10TB,"Bayesian Techniques,Decision Trees,GANs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,Sometimes,Sometimes,,,Often,Sometimes,Most of the 
time,,,Sometimes,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Often,,,,,Often,,Most of the time,,Most of the time,,,,30,50,10,10,0,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,,,,,,,,Often,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Other,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Engineer,University courses,20,0,0,80,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Orange,Python,R,Other",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,Rarely,,Sometimes,,,,,,,,,,,,,,,,Rarely,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,Often,Often,,,,,,,Often,,Often,,,,Often,,,,,Sometimes,,Often,Rarely,,Often,,,,20,30,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,,,,,Often,,,Often,,,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,Other,,"Data Quality, Format","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"105,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Other,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,,Very useful,,Very useful,,,"DataTau News Aggregator,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Other",Self-taught,70,10,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Never,100MB,Regression/Logistic Regression,"Amazon Web services,NoSQL,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series 
Analysis,Other",,,,,,Sometimes,Often,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,Sometimes,Most of the time,,,60,10,0,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,,Often,Often,,Sometimes,,,,,Most of the time,Often,,51-75% of projects,More external than internal,Other,Wikipedia; Twitter; Facebook; web-scraping; League of Legends; Dota 2,Information retrieval from APIs; storing and managing data with cloud services,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,99300,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Argentina,42,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by college or university,Employed by government",TensorFlow,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,"College/University,Conferences,Official documentation,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Engineer,University courses,80,0,0,20,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Other,Other,Other,Rarely,10GB,"Bayesian 
Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Perl,Python,R,Other",,,,Rarely,,,,,,,,Rarely,,,,,,,,Rarely,Often,,,,,,,,,Sometimes,Often,,Rarely,,,,,,,,,,,,,,,,Often,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Often,Sometimes,Sometimes,Rarely,,,,Sometimes,,Sometimes,,Sometimes,,Often,Often,,Sometimes,,,Often,Often,Often,,Most of the time,,,,30,60,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,,Often,Sometimes,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,Python,Google Search,"College/University,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Not Useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,,University courses,30,15,50,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Python,QlikView,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,Sometimes,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,35,5,15,40,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,Often,Most of the time,,Often,,Sometimes,Sometimes,Often,,Sometimes,,,Sometimes,Often,,100% of projects,Entirely internal,Business Department,"None, all client data.","Federal data is incomplete, inaccurate, and of poor quality; systems are disparate; security","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,70000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Spain,52,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,Very useful,,,,,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Researcher,Statistician",University courses,10,0,20,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Minitab,R,SAS Base,SAS Enterprise Miner,Tableau",,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,,Rarely,Sometimes,,,,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,Often,,Often,Sometimes,,Often,,,,,,Often,Rarely,,,,50,10,10,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,40000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Personal Projects,Other",,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Master's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,"Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Other,University courses,30,25,30,15,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Retail,"10,000 or more employees",Decreased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,SQL,Other",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests,Simulation,Time Series Analysis",,Sometimes,Sometimes,,,Often,Most of the time,Often,,,,,,Often,,,,Often,,,,,Often,,,,Most of the time,,,Most of the time,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,,Most 
of the time,Most of the time,,,,,,Often,,Often,Often,,,,Sometimes,,Most of the time,,76-99% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,Rarely,"125,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,0,15,25,50,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / 
awk",,Often,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,Often,,,,,Often,,Sometimes,,,,"Collaborative Filtering,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,SVMs,Text Analytics",,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,Sometimes,,Sometimes,,,,Often,Often,,,,,35,25,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,Often,Sometimes,,,,,,,Often,,Often,,26-50% of projects,Approximately half internal and half external,Central Insights Team,"UCI ML Repository, Data World, Driven Data, Kaggle, CloudAnalytix",The biggest challenge is of course cleaning of data to make it usable. It is a huge process and takes a chunk of time. But definitely it is one to enjoy as I get a lot of insights when I am cleaning the data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,65000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Friends network,Official documentation,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Other",Work,40,0,60,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests",Most of the time,,Most of the time,,,Often,Most of the time,,Sometimes,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,30,0,10,25,35,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,,Often,,,,,,,,Most of the time,Often,,,,Sometimes,,,,,100% of projects,More internal than external,Other,,"Scale; we run into capacity and run time challenges on a daily basis. Also, errors in human-entered data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Git",Sometimes,93000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Time Series Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,20,10,10,0,Time Series,"Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,3-5 years,Some other way,Important,Other,"GPU accelerated Workstation,Traditional Workstation",Other,,10GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Rarely,,Sometimes,,,Often,,,,,,,,Rarely,,Sometimes,,Sometimes,,,Often,,,,,Rarely,,,,Most of the time,,,,50,5,0,15,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,,,,,,Sometimes,,,,Most of the time,Often,,100% of projects,More internal than external,Other,,Hard to acquire- experiments,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,33000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle",,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,Other,"IBM Cognos,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Orange,Python,R,SQL",,,,,,,,,,Often,,,,,,,Rarely,,,,,,Sometimes,,Sometimes,,,,Rarely,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,10,5,0,20,66,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,,,,,,,,Most of the time,,,,,Often,,100% of projects,Do not know,Business Department,Facebook;instituye of statistics of Spain;Google Analytics,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"40,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Python,Random Forests,R,"GitHub,Google Search,Government website","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,,,A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,6,4,3,7,10,70,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,,,,,,,Sometimes,,Often,,Often,Often,Often,,51-75% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,105000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Somewhat important,,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important +Male,United States,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Other,Traditional 
Workstation,Relational data,,100GB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,,,,Most of the time,Sometimes,,Sometimes,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,Google Search,"College/University,Company internal community,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,"Business Analyst,Researcher,Other",Self-taught,80,5,15,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,"Decision Trees,SVMs","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,Sometimes,Sometimes,,,,,Often,,,Often,,,,60,5,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Scaling data science solution up to full database",Often,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,51-75% of projects,Entirely external,Standalone Team,public data;survey data,Access to all the needed data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,65000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,NoSQL,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. web-scraping),"Blogs,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Work,25,25,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,,10MB,,"Amazon Web services,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Rarely,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,Most of the time,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,50,25,0,0,0,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,None, ,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Git,Never,110000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,,"Company internal community,Kaggle,Textbook,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,,,,,Very useful,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Nice to have,,Necessary,Nice to have,Necessary,,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,10,60,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,United States,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Online courses,Stack Overflow Q&A",Somewhat useful,,Very useful,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,PhD,Yes,Doctoral degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Anomaly Detection,Python,"Government website,University/Non-profit research group websites","College/University,Company internal community,Conferences,Online courses,Stack Overflow Q&A",,,Very useful,Very useful,Very useful,,,,,,Very useful,,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or 
statistics,6 to 10 years,"Data Scientist,Researcher",University courses,20,20,20,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Rarely,Rarely,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,,,,Often,Most of the time,,,,,Most of the time,,,,Often,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,40,30,10,10,10,0,Enough to tune the parameters properly,Limitations of tools,,,,,,,,,,,,,Rarely,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,,"Arxiv,Blogs,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,University courses,15,0,40,40,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Increased slightly,6-10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Always,100GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,Often,,,Often,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Sometimes,Often,,,,,Rarely,,,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,10,10,60,5,15,0,"Enough to code it again from scratch, albeit it may run 
slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",,,,Often,Sometimes,Rarely,,,,Rarely,Rarely,,,Often,Often,,,Often,,,,,Less than 10% of projects,More internal than external,Standalone Team,,"Cleanliness, small size","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,"170,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Colombia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,MATLAB/Octave,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Conferences,Newsletters,Online courses,Textbook",,,,,Somewhat useful,,,Somewhat useful,,,Very useful,,,,Very useful,,,,"O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",70,20,0,10,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,High school,Pharmaceutical,,,,,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Rarely,<1MB,"Neural Networks,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Logistic Regression,Neural Networks",,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,,,,,Often,,,,,,,Often,,,Most of the time,,,,,,,76-99% of projects,More internal than external,Standalone Team,,ninguno,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,24000,USD,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Time Series Analysis,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Kaggle",Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,,University courses,30,0,50,20,0,0,Time Series,Neural Networks - CNNs,High school,Academic,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Other",Sometimes,1GB,"Bayesian Techniques,Neural Networks",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,30,10,20,10,30,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Rarely,160000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),Other","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,,Nice to have,Nice to have,,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,30,0,0,10,0,60,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Female,United States,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle",,Very useful,,,Somewhat useful,,Very useful,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,,Work,0,10,60,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,Often,Often,,,Often,,,,Often,,Sometimes,,Often,Often,,Often,,,,,Sometimes,,,,,,60,30,5,5,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),"College/University,Conferences,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",34,33,0,33,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Basic laptop (Macbook),Text data,Never,100MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Sometimes,Sometimes,Rarely,,,,Rarely,,,,Often,,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,,,,,Rarely,Often,Sometimes,,,,35,20,0,5,40,0,,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,More internal than external,Other,social media data,difficult to understand data without reading it but dataset is too big to read it,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,32000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Not Useful,,Not Useful,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Statistician,Other",Self-taught,80,0,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Video data,Relational data",,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Stan,Unix shell / awk",Often,Often,,Most of the time,,,,,,,,,,,,Often,Often,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Simulation,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,Often,Often,,,Often,Most of the time,Sometimes,,Often,,Sometimes,,,,,Often,,,,Most of the time,Often,,Most of the time,,,,15,35,0,25,25,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,Often,Often,,,,,,,,,,,100% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,R,"GitHub,Google Search,University/Non-profit research group websites",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Time Series Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,Not Useful,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,65,5,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Neural Networks,Random Forests,Regression/Logistic Regression,Other","Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,Often,,,,"Data Visualization,Ensemble Methods,Time Series Analysis",,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,50,5,25,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,Sometimes,,Often,Most of the time,,Often,,,,,,Most of the time,Sometimes,,Sometimes,,,,Often,,100% of projects,Approximately half internal and half 
external,Other,N/A,Permissions,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Sometimes,"85,000",USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,Very useful,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,20,5,30,25,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,SVMs",,,Sometimes,Often,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,Most of the time,Often,,,,Often,Most of the time,,,Most of the time,,,,,,30,20,5,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,"Git,Mercurial",,1200000,TWD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Self-taught,90,5,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,20,10,50,0,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the 
time,Often,Often,Most of the time,,,Often,,Often,,Often,,Sometimes,Most of the time,Often,Most of the time,,Often,Often,,,,Sometimes,Most of the time,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Sometimes,,Often,Often,Often,,Most of the time,,,,,,,Sometimes,,,Sometimes,,,Often,,76-99% of projects,More internal than external,Standalone Team,None,The cleanliness and label quality. Several of our data sources exhibit systemic bias towards certain labels. It is unclear if this is intentional/unintentional or malicious. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"90,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,46,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by government,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,30,20,0,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Government,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,Regression/Logistic Regression,"MATLAB/Octave,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Simulation",,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,10,30,10,50,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,,100% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,Stan,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,25,10,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Stan,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,Rarely,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Random 
Forests,Simulation",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,Often,,,,,,,60,10,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Github; Google Drive; Dropbox,"Bitbucket,Git,Other",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,20,10,50,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational 
data,Other",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,,,,,Often,,Sometimes,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics",,,,,,,Often,Sometimes,,,,,,Sometimes,,Often,,,Most of the time,Most of the time,,,,Sometimes,,,,,Often,,,,,10,30,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,Most of the time,,,Often,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Google drive,Git,Sometimes,45000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Company internal community,Kaggle,Personal Projects,YouTube Videos",,,,Very useful,,,Very useful,,,,,Very useful,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,,Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important +Male,United Kingdom,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,"Data Analyst,Researcher",Self-taught,50,40,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Internet-based,10 to 19 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,Jupyter notebooks,Python,Tableau",,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Segmentation",,,,,,,Often,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,5,0,0,10,50,35,Enough to run the code / standard library,"Lack of data science talent in the organization,Privacy issues",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,51-75% of projects,More internal than external,Other,Mass survey data; UK Census data,Sample representation,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,Git,Sometimes,"80,000",GBP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Arxiv,Blogs,College/University,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,,,Not Useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Programmer,Software Developer/Software Engineer,Statistician",Work,10,0,10,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,SQL,Stan,Unix shell / awk",Sometimes,Sometimes,,Sometimes,,,,Often,,,,,,,Most of the 
time,,Rarely,,,,Often,,,Sometimes,,,Sometimes,,,,Sometimes,,Most of the time,,Rarely,,,Rarely,,,Sometimes,Often,Most of the time,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Most of the time,,,Sometimes,Most of the time,Sometimes,Sometimes,,,Sometimes,Often,Often,,Most of the time,,,,,Often,,Often,,,,Most of the time,,,Most of the time,,,,15,60,0,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Often,,,,,,Often,,,,,,,,,Often,,,Most of the time,,100% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,62000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,Tableau,Other,Other",,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,Sometimes,Most of the time,,Most of the time,,,,Often,,,,Often,Often,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation",,Often,,,Often,,Most of the time,Most of the time,Sometimes,,,Sometimes,,,Often,Most of the time,,,,,Sometimes,Sometimes,Often,Often,,Often,,,,,,,,40,30,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible 
expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,Sometimes,,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,"Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",I don't typically share data,,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,,NA,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,Cluster Analysis,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher",University courses,40,0,0,60,0,0,"Natural Language Processing,Speech Recognition,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,,,,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Other",Rarely,100GB,"CNNs,HMMs,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,Neural Networks,RNNs,Time Series Analysis",,,,Often,,Sometimes,Most of the time,,,,,,Often,,,,,,,Most of the time,,,,,Most of the time,,,,,Often,,,,30,20,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't 
perform advanced analytics,C/C++,Genetic & Evolutionary Algorithms,Python,GitHub,"Arxiv,Textbook",Somewhat useful,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,60,0,30,10,0,0,"Computer Vision,Time Series","Evolutionary Approaches,Neural Networks - RNNs",High school,Other,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Don't know,1GB,"CNNs,Evolutionary Approaches,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Spark / MLlib",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,"CNNs,Data Visualization,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,,Rarely,,,Often,,,Often,,,,,,,,,,Sometimes,Often,,,,,,Often,,,,,,,20,5,70,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Often,,,,Often,,,,,,,,,,,,Often,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform",,Other,Never,600,JPY,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Bayesian Methods,R,Government website,Other,,,,,,,,,,,,,,,,,,,"Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs",High school,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Neural Networks","IBM Cognos,IBM SPSS Statistics,Microsoft SQL Server Data Mining,NoSQL,R",,,,,,,,,,Often,,Often,,,,,,,,,,,,,Most of the time,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,Neural Networks",,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,10,20,30,10,30,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Other",Self-taught,75,25,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,<1MB,,"IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,55,15,0,10,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,Often,,,,,,,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,,Never,55000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,63,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"Conferences,Kaggle,Online courses",,,,,Very useful,,Somewhat useful,,,,Very useful,,,,,,,,,15+ years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer",Self-taught,20,50,30,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,India,21,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,41,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,University/Non-profit research group websites,"Conferences,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,edX,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of 
(Explain more),Master's degree,Electrical Engineering,I don't write code to analyze data,Other,University courses,10,40,5,40,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Kaggle,Official documentation,Online courses,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,40,10,0,50,0,0,Computer Vision,"Bayesian Techniques,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Image data,Most of the time,1GB,"Neural Networks,SVMs","C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Neural Networks,Segmentation,SVMs",,,,,,Rarely,,,,,,,,Often,,,,,,Often,,,,,,Often,,Often,,,,,,5,75,10,10,0,0,Enough to refine and innovate on the algorithm,Limitations of 
tools,,,,,,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,100000,PLN,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,10,50,25,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","Amazon Web services,Java,Jupyter notebooks,Python,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",Sometimes,,,,Sometimes,Often,,,,,,,,,Sometimes,Often,,Sometimes,,,Often,,,Sometimes,,,,,,Sometimes,,,,20,30,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process",Often,,,,Often,,,Often,,,,,,,,,,,,,,,10-25% of 
projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,YouTube Videos,Other",,,,,,,Very useful,,,,,,,,,,,Very useful,Data Machina Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,India,23,"Not employed, but looking for work",,,,,,,,Other,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Necessary,Nice to have,,Necessary,Necessary,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,40,0,0,40,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Not Useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,30,10,10,0,20,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important +Male,United States,38,"Not employed, but looking for work",,,,,,,,R,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,Tutoring/mentoring,Other",,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,A humanities discipline,Less than a year,I haven't started working yet,Self-taught,50,0,0,0,0,50,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,44,Employed full-time,,,Yes,,Engineer,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Government website,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A humanities discipline,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,35,10,35,10,0,"Natural Language Processing,Supervised Machine Learning 
(Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",High school,Technology,100 to 499 employees,Increased significantly,More than 10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,KNIME (free version),Python,R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,Most of the time,,Often,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,,Often,Sometimes,,,,,Sometimes,Sometimes,,,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,Often,,,Most of the time,,,,,35,0,0,20,45,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,dirty data; variation in the text data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,"98,000",USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,I collect my own data (e.g. web-scraping),Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Statistician",University courses,50,0,0,50,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web 
services,NoSQL,Perl,Python,QlikView,R",Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Rarely,Most of the time,Rarely,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,,Sometimes,Most of the time,,Often,Often,Sometimes,,Sometimes,Rarely,Sometimes,Sometimes,Most of the time,Often,Often,Sometimes,Often,Most of the time,Often,Sometimes,Often,,Often,Most of the time,Most of the time,Often,Most of the time,,,,25,25,5,10,35,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,250000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,Government website,"College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,,,,,< 1 year,,Nice to have,Nice to have,,Necessary,Necessary,Necessary,Nice to have,,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,,,University courses,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,Netherlands,50,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,I don't plan on learning a new ML/DS method,Python,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,,No,Some college/university study without earning a bachelor's degree,Computer Science,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Stan,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,"DataTau News Aggregator,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,1 to 2 years,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",10,20,10,50,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter 
notebooks,Python,R,SQL",,Often,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Segmentation",,,,,,Often,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Often,,,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,Often,Often,Sometimes,,Sometimes,Often,,,,,Sometimes,Most of the time,,Most of the time,,,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,The data is located in 100s of databases. ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,70000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Somewhat useful,,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,20,20,20,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased significantly,,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100TB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Perl,Python,QlikView,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SAS Base,SAS 
Enterprise Miner,Spark / MLlib,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau,Unix shell / awk",,Most of the time,,Sometimes,Sometimes,,Sometimes,,Often,Rarely,Often,Sometimes,Sometimes,Rarely,Often,Rarely,Often,,Rarely,,Rarely,Sometimes,Sometimes,,Sometimes,Rarely,,,,Rarely,Sometimes,Rarely,Most of the time,,Often,,Sometimes,Sometimes,Sometimes,,Most of the time,Most of the time,,Sometimes,Often,,,Sometimes,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,,,Most of the time,Often,Most of the time,Often,Sometimes,Most of the time,,,,Often,Sometimes,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Often,Sometimes,,Most of the time,Most of the time,Often,Sometimes,Most of the time,,,,35,20,5,10,20,10,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Often,,Most of the time,,Most of the time,,,,Most of the time,,Often,,Most of the time,,76-99% of projects,Entirely internal,Central Insights Team,"Open datasets: international bodies (I.e. UN, world bank etc..) states & cities open database.",Finding complete datasets & access right.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"170,000.00",,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,50,20,0,30,0,0,Time Series,"Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Colombia,23,"Not employed, but looking for work",,,,,,,,Julia,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook",,,Very useful,,,,Very useful,,,,,,,,Very useful,,,,No Free Hunch Blog,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Physics,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,Deep learning,Python,Google Search,"Blogs,Company internal community,Personal Projects,Stack Overflow Q&A",,Very useful,,Very useful,,,,,,,,Very useful,,Very useful,,,,,"Data Stories Podcast,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer,Statistician",Work,40,0,40,20,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Other",Most of the time,100MB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,Most of the time,,,Often,Often,,,,,,,Often,,Often,,Often,,Often,Often,,,,Often,Most of the time,,Often,,Most of the time,,,,20,30,20,10,10,10,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,none,understand the data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Rarely,"170,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,15,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),FlowingData Blog,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,Norway,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,R,"GitHub,University/Non-profit research group websites","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Logistic Regression,Neural Networks - CNNs",High school,Telecommunications,"5,000 to 9,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Statistics,Python,R,SQL,TensorFlow,TIBCO Spotfire",,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Rarely,Rarely,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,50,10,5,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Often,,Often,,,,,,,Sometimes,,,,,Often,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Sometimes,835000,NOK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,,,,"IBM Cognos,R,SQL",,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,40,0,0,20,40,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Other,Sometimes,60000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,38,Employed part-time,,,Yes,,Data Analyst,Poorly,Employed by government,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other",University courses,8,10,5,75,2,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Government,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression",,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,25,10,10,10,45,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,Often,,Sometimes,Sometimes,,,,,Often,Often,,Most of the 
time,,Sometimes,Often,Most of the time,,51-75% of projects,More internal than external,Central Insights Team,,Getting it - privacy is important and considered a mayor risk.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,65000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,Somewhat useful,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,80,0,0,0,20,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,United States,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,R,Google Search,"Blogs,Conferences,Personal Projects,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Miner,Other",Self-taught,50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,Sometimes,,,,,Most of the time,Often,,,,,,,Often,Most of the time,,,,,,,,,,Sometimes,Often,,,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Sometimes,,,,Often,,,,,,,,,,,,,Often,,Sometimes,Sometimes,,100% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,rmarkdown; shiny,Bitbucket,Always,100000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Switzerland,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",University courses,15,80,0,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Insurance,"5,000 to 9,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10GB,Random Forests,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,TIBCO Spotfire,Unix shell / awk",,,,,Often,,,,Often,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,,Sometimes,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Text Analytics",,,,,,Sometimes,Most of the 
time,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,Often,,Often,,Most of the time,,Most of the time,,Often,,Often,,,,10-25% of projects,More internal than external,Business Department,weather;geocoding;population info,access them since store in SAP or many other system,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,share disk,Git,Sometimes,"130,000",CHF,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,,,,,,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,,Very Important,Very Important,,Very Important,Not important,Somewhat important,,Somewhat important,Somewhat important +Female,United States,26,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,30,20,20,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Retail,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Often,Most of the time,,Most of the time,Most of the time,Often,,Most of the time,Most of the time,,Often,Often,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone 
non-technical,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,,Sometimes,,100% of projects,More internal than external,Central Insights Team,Dunns Industry Data,Could expand the current data universe to include other relevant valuable data.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,88000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Podcasts",,Somewhat useful,,,,,,,,,,Very useful,Not Useful,,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,I haven't started working yet,Self-taught,20,10,50,20,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,,Academic,Fewer than 10 employees,Stayed the same,More than 10 years,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,MATLAB/Octave,R",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction",,,Often,,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,20,10,30,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,30000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Tableau,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Often,,,,,,Most of the time,Most of the time,,,,,,,Often,Most of the time,,,,,Sometimes,,Rarely,,,Often,,,Sometimes,,,,,30,10,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the 
potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Often,,,,Most of the time,Sometimes,,Rarely,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,26-50% of projects,More internal than external,Central Insights Team,Cortera; Emailage; Yelp; Yellow pages,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,160000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Amazon Web services,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,Data Analyst,University courses,50,5,20,15,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Most of the time,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic 
Regression,Segmentation",,,,,,Sometimes,Often,Often,Often,,,Most of the time,,,Most of the time,Often,,,,,,,,,,Sometimes,,,,,,,,50,10,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Most of the time,,,,,,,,Often,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased slightly,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,100MB,"Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Often,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,SVMs,Text 
Analytics",,,,,,Often,Often,,,,,,,Sometimes,,Often,,Often,Often,Sometimes,,,,,,,,Sometimes,Often,,,,,30,10,50,5,5,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,None;,Labeling raw data.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Always,54080,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,43,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Not Useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,85,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,75,5,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"Hadoop/Hive/Pig,Impala,Java,Microsoft Excel Data Mining,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,60,0,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,,Sometimes,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,76-99% of projects,Approximately half internal and half external,IT 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,140000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Canada,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,,"FlowingData Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Germany,38,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,Talking Machines Podcast",5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher",University courses,15,40,0,15,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,50,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Government website,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,5-10 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic 
laptop (Macbook),2 - 10 hours,Other,Sort of (Explain more),Doctoral degree,Mathematics or statistics,,"Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,,Not important,Somewhat important +Male,United States,56,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Predictive Modeler",Work,5,5,65,25,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM Cognos,KNIME (free version),Mathematica,Python,R,RapidMiner (free version),SQL,Tableau",,Sometimes,,,,,,,,Rarely,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,Rarely,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,Sometimes,,Sometimes,,Often,Sometimes,,,Often,Sometimes,,,Sometimes,,,,50,15,5,5,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Most of the time,,Most of the 
time,Sometimes,,Sometimes,,,,,,,,,,,,Often,Sometimes,,100% of projects,More internal than external,Business Department,,"Data hygiene, scrubbing. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,205000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Researcher",University courses,5,25,25,25,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,Fewer than 10 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,Often,,,,Often,,,,,Rarely,Rarely,,Often,,,,,,,,,,Rarely,,,,Most of the 
time,,Rarely,,,,,,,,Often,Often,,,,Rarely,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,SVMs",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,,Often,Often,,,,,,,,,,,,Often,,,,,,30,15,30,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Often,Rarely,Often,Most of the time,Sometimes,,Often,,,,,,,Often,,,,Often,,Rarely,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,ftp,"Git,Other",Always,"104,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Portugal,42,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,"Predictive Modeler,Researcher",Self-taught,30,40,20,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","MATLAB/Octave,R,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,,,Most of the time,Most of the time,,,Most of the time,Often,,Most of the time,,Most of the time,Most of the time,,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"40,000",EUR,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Other",Self-taught,20,0,70,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft Excel Data Mining,QlikView,R,RapidMiner (commercial version),SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,Rarely,Most of the time,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,Often,,Sometimes,Rarely,Rarely,Sometimes,Sometimes,,Often,,Sometimes,Sometimes,Sometimes,,,,25,25,10,25,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,,Most of the 
time,Often,,,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,Often,Often,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,250000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,Very useful,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,40,20,30,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,Most of the time,,,,Often,,,,,Often,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Most of the time,Rarely,,,Sometimes,Often,Most of the time,Often,Sometimes,,,,,Often,,Often,,,Sometimes,,Sometimes,,Often,Sometimes,,Often,,,Rarely,Rarely,,,,50,5,25,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy 
issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Sometimes,Most of the time,,,,Sometimes,Sometimes,,,,,,,,Rarely,Often,,Sometimes,,,51-75% of projects,More internal than external,Business Department,,reliability on delivery,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint,Other",ftp,Git,Never,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,Researcher,Work,20,0,70,10,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Never,10GB,"CNNs,Neural Networks,RNNs","C/C++,Java,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,,Sometimes,,,,Often,,,,,,Often,Most of the time,,,Most of the time,Often,Most of the time,,,Sometimes,Often,,,,50,20,0,30,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,Limitations of tools",,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Other",Other,FTP/WebDAV/SCP servers,"Git,Subversion",Most of the time,120000,BRL,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Czech Republic,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Deep learning,R,University/Non-profit research group websites,"Arxiv,College/University,Friends network,Stack Overflow Q&A",Very useful,,Very useful,,,Very useful,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician",Self-taught,70,0,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,,,Bayesian Techniques,"Julia,Mathematica,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Time Series Analysis",,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,0,80,0,20,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,,I don't typically share data,,,Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,67,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Matlab,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,University 
courses,5,5,80,10,0,0,Machine Translation,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,<1MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,Often,,,,,,,,,,,,,,,,,Often,,Often,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Logistic Regression",,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,40,20,20,20,0,0,Enough to explain the algorithm to someone non-technical,Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,26-50% of projects,Entirely internal,Business Department,physiobank,inconsistency,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Always,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Russia,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,PhD,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Statistician",Self-taught,50,30,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,United States,51,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Other",Other,50,40,10,0,0,0,,,"Some college/university study, no bachelor's degree",CRM/Marketing,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,Tableau,Unix shell / awk",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,"A/B Testing,Lift Analysis,Logistic Regression,Prescriptive Modeling,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,,,,20,0,0,40,40,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Most of the time,,,,,Often,,,,Most of the time,Sometimes,,,Most of the time,Most of the time,,,100% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,140000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,,Matlab,Other,"Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,0,0,5,35,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Other,Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,,,,Often,,,,40,10,10,40,0,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the 
time,,,,,,,,,,,,,,,Most of the time,Most of the time,,10-25% of projects,Entirely external,Standalone Team,,,,Email,,,Never,63000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Statistician,I haven't started working yet",University courses,10,60,0,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,29,"Not employed, but looking 
for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Other",,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,,,,,"DataTau News Aggregator,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity,Other",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Engineer,Self-taught,50,20,20,0,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important +Female,Other,31,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,R,Monte Carlo Methods,Stata,"Government website,I collect my own data (e.g. 
web-scraping)",College/University,,,Very useful,,,,,,,,,,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,Researcher,University courses,10,10,20,60,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression",A master's degree,Non-profit,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,Bayesian Techniques,"IBM SPSS Statistics,Microsoft Excel Data Mining,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,10,10,0,50,30,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,Other,,,Column-oriented relational (e.g. KDB/MariaDB),Email,,,,24000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Not Useful,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,Other,Self-taught,100,0,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,A bachelor's degree,Insurance,I don't know,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Random Forests,Segmentation,Simulation",,,Sometimes,,,,Often,Sometimes,,,,,,Sometimes,Often,,,,,,,,Sometimes,,,Most of the time,Sometimes,,,,,,,50,10,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",Often,,,,Most of the time,,,,Most of the time,,,,,,Often,Most of the time,,,,,,,51-75% of projects,More internal than external,Other,"Census, Weather, Crime",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Rarely,150000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,KNIME (free version),Monte Carlo Methods,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Statistician",Self-taught,30,20,10,30,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,Sometimes,Often,,,,Often,Often,,,,,,,,Most of the time,,,,,,Often,,,,,Often,,,Most of the time,,,,10,50,10,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the 
time,Often,,,,,,,Often,,Most of the time,Most of the time,,Most of the time,,,,Often,Often,,,51-75% of projects,More internal than external,Other,financial data; banking data; hospital data,unexhausted data collection ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"48,000",GHS,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Portugal,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Online courses,Textbook,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),40+,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important +Female,United States,29,Employed full-time,,,Yes,,Researcher,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Predictive Modeler",Work,30,5,50,5,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Sometimes,Rarely,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,,,Rarely,Often,Sometimes,Sometimes,,Sometimes,,,,Often,,,Often,,,,20,10,45,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go 
in with the available data",,,,Sometimes,Most of the time,,,,,,Often,,,,,,,Sometimes,Sometimes,Often,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",s3,Git,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,R,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important 
+Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by non-profit or NGO,Employed by government",Python,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Podcasts",,Somewhat useful,Very useful,Not Useful,,Very useful,Very useful,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,40,0,0,30,30,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,"Decision Trees,Random Forests","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,50,20,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Other,Sometimes,115000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Regression,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Personal Projects,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,6 to 10 years,"Business Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,5,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,76-99% of projects,More internal than 
external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,Other,Python,,"Arxiv,Personal Projects,Other",Very useful,,,,,,,,,,,Very useful,,,,,,,"FastML Blog,Jack's Import AI Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Other,Kaggle competitions,30,10,30,0,30,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Python,R,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Cross-Validation,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making 
process",,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Other,,,,,,Git,Most of the time,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",Java,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Not Useful,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,Researcher,Self-taught,30,10,30,25,5,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random 
Forests,Regression/Logistic Regression,RNNs","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,Rarely,Most of the time,Most of the time,Often,Sometimes,,,Often,,Often,,Sometimes,,,Often,Often,Often,,Sometimes,,Often,,,Rarely,Often,Often,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,Sometimes,,,,,,Often,,,,Often,,,100% of projects,More internal than external,Standalone Team,,explaining why variables must be encoded a certain way,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",generic cloud file sharing,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,75000,CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Other,Other,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,6 to 10 years,"Business Analyst,Data Analyst",Work,60,20,10,5,5,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Stan,Tableau",,Often,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Segmentation,Simulation,Time Series Analysis",Rarely,,Rarely,,,,Often,,,Sometimes,,,,Sometimes,,Often,,Sometimes,,,,,,,,Rarely,Rarely,,,Most of the time,,,,20,20,45,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Often,,,,,Rarely,,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Standalone Team,weather;cinema;economic;holidays;events;,cleaning and merging the numerous data sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint,Other",Shared drive,Git,Sometimes,60000,GBP,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Canada,40,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,,"Business Analyst,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,Other,31,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Java,Anomaly Detection,Java,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Linear Digressions Podcast,O'Reilly Data Newsletter",5-10 years,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to 
have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Master's degree,Computer Science,I don't write code to analyze data,Researcher,University courses,20,10,NA,60,10,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Russia,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by non-profit or NGO,Orange,Deep learning,Matlab,GitHub,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,NA,NA,10,0,"Computer Vision,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Don't know,An external recruiter or headhunter,Not at all important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Never,,"Decision Trees,Regression/Logistic Regression","MATLAB/Octave,Microsoft Excel Data Mining,Orange,Python",,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Most of the 
time,,,,,,,Often,,,,,,,,,,,90,5,1,1,3,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Rarely,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,None,Do not know,Other,Don't have it ,Develop software which will find how many and kinds of defects on material surface ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Sometimes,25000,RUB,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Scientist,Other",Self-taught,10,60,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SAS Enterprise 
Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,Rarely,,Sometimes,,,,,Often,Sometimes,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,Often,Most of the time,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Often,,,,60,15,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,,,,Sometimes,,,,,Sometimes,Often,,Sometimes,,,Most of the time,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,Weather; Credit Card Compilers,Ad-hoc classification hierarchies; externally maintained datasets (excel workbooks),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,97500,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. 
web-scraping),Blogs,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Statistician",University courses,30,0,40,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","NoSQL,Orange,Perl,Python,R,SAS Base,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Often,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Random Forests",,,Often,,,,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,15,30,10,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Often,,,Often,Often,,,,,,,,,Most of the time,Most of the time,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,,I don't typically share data,,Git,Sometimes,56000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Mexico,31,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important +Male,United States,34,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,GitHub,"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,,Somewhat useful,,3-5 years,,Necessary,Necessary,,,Necessary,Nice to have,Nice to have,,Nice to have,,,,,Traditional Workstation,2 - 10 hours,PhD,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Researcher",University courses,20,0,0,80,0,0,Time Series,"Evolutionary Approaches,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family 
members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,United States,71,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,15,40,5,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, 
GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Time Series Analysis",,Sometimes,,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,,Often,,,Often,Often,Sometimes,,Often,Often,,,Often,,,Most of the time,,,,40,15,10,25,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,Often,,Sometimes,Sometimes,,,,Often,,Sometimes,,,Often,,,,,,,,,100% of projects,More external than internal,Standalone Team,,"Source data is mostly out of ERP software, customers aren't uniform in using the tool as it should be. This leads to dirty data. Second of all, ERP customers aren't recipient for machine learning algorithms to prescribe decisions.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Git",Sometimes,"90,000",USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Conferences,Official documentation,Personal Projects",,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,40,0,20,40,0,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,Often,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Rarely,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,Often,,,,20,40,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,,160000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Canada,29,Employed part-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Neural Nets,Python,GitHub,"Arxiv,Blogs,Conferences,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,0,10,20,70,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by 
government",Julia,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Not Useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,25,30,5,0,10,Natural Language Processing,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",High school,Government,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation",Text data,Sometimes,10GB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,Rarely,,,,Often,,Often,,,Rarely,Rarely,,,,,,Sometimes,,,Rarely,Often,,Often,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics",,Rarely,Sometimes,Sometimes,,Sometimes,Often,,Often,Rarely,,,Rarely,,,Often,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,Rarely,Rarely,,Often,Sometimes,Often,,,,,50,30,5,10,5,0,"Enough to code it again from scratch, albeit it may 
run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,,,Often,,Rarely,,,,,,,Sometimes,,Often,,10-25% of projects,More external than internal,IT Department,sensitive data,noisy; preprocessing; small,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",server,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"83,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Researcher,Statistician",University courses,0,60,30,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,Often,,,Sometimes,,,,,,,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,SVMs",,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More external than internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Git,Rarely,1000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Computer Science,,"Computer Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,47,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Not Useful,,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),5-10 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important +Female,India,48,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Survival Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,,"Jack's Import AI Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Unnecessary,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,3 to 5 years,"Data Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",60,20,0,0,0,20,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,33,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Company internal community,Friends network,Kaggle,Official documentation,Online courses",,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Statistician",Work,30,10,60,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Increased significantly,6-10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Association Rules,CNNs,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Time Series Analysis",Rarely,Rarely,,Often,,,,Often,Most of the time,,,,,Often,,Often,,,,,Often,,Most of the time,Sometimes,,,Most of the time,Often,,Sometimes,,,,60,20,5,5,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,Sometimes,,Most of the time,,,Sometimes,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,Often,Sometimes,,10-25% of projects,More internal than 
external,Other,"eurostat data, quandl, world bank, google open datasets","transformations, explanation of the data in details","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,25000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed part-time,,,No,Yes,Other,Perfectly,Self-employed,TensorFlow,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,80,5,5,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat 
important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Company internal community,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,Very useful,Somewhat useful,,,,,,Very useful,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,,University courses,20,10,40,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Insurance,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R,RapidMiner (free version),SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,Often,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,,,Often,,Often,,Often,,,,Often,,Often,Often,,,,,Often,,Often,,,,40,20,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,,Often,,Most of the time,,,Most of the time,Most of the time,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,75000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,,,Very useful,,Very useful,,,,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,20,60,20,0,0,0,Enough to tune the parameters properly,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,"Git,Subversion",Sometimes,1600000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,Very useful,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Electrical Engineering,,"Computer Scientist,Engineer,Researcher",Work,NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important +Male,United States,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",University courses,20,0,60,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Perl,Python,R,SAS Base,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,Rarely,,,,Sometimes,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",Often,,Sometimes,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Most of the time,,Sometimes,Rarely,,,,Sometimes,,,,,Sometimes,Often,,,,,30,10,0,20,10,30,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,Often,,,,,,,,,,,,Most of the time,,Most of the time,Sometimes,,Most of the time,76-99% of projects,More internal than external,Other,"I work in biological/medical research, so I frequently use public-use knowledgebases, such as: the LANL HIV sequence database; CATNAP; GenBank; KEGG; BioMart; IEDB; etc.",Comprehension and provenance. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",Labkey Server; Dropbox (alas),"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,"120,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Greece,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Monte Carlo Methods,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Tutoring/mentoring",Very useful,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Video data",Sometimes,1TB,"CNNs,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,Other",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Collaborative Filtering,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,,Most of the time,Sometimes,,,Sometimes,Often,,,,,Often,,,,,,Often,Most of the time,,,,Often,Often,,Often,,,,,,0,50,30,10,10,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,,,,Often,,,,Often,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,25000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,,Government website,"Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,3-5 years,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Other,University courses,40,10,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,R,University/Non-profit research group websites,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Programmer,Researcher",University 
courses,10,10,20,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Other,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests",,,,,,Rarely,Most of the time,,Sometimes,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,Sometimes,,Often,Often,,,,,,,,Most of the time,,Sometimes,,Most of the time,,76-99% of projects,More internal than external,Business Department,,dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data,Share Drive/SharePoint",,Bitbucket,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses",,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,University courses,20,20,10,20,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Increased slightly,Don't know,A tech-specific job board,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Mathematica,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis",,,Often,,,Often,Often,Often,Often,,,,,Often,,Often,,,,,Often,Often,Often,,,,Often,Often,,Sometimes,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data 
science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,57,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,KNIME (commercial version),Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,Friends network,Kaggle,Online courses,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,"Coursera,Other",GPU accelerated Workstation,11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Other,I don't write code to analyze data,"Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",30,15,0,0,50,5,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,,Somewhat important,Somewhat important +Male,United States,69,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,,"Kaggle,Non-Kaggle online communities",,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,,,,,Nice to have,Nice to have,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,,,Very Important,,,,Very Important,Very Important,,,Very Important,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,KNIME (free version),Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle",,,,Somewhat useful,,,Very useful,,,,,,,,,,,,"FlowingData Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,University courses,70,0,0,10,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,Minitab,R",,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,Often,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Often,Most of the time,,,,,,,,,,,,,,Often,,Sometimes,,,,,,Sometimes,,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,,,,,,,,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Researcher,University courses,30,0,20,40,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,Rarely,,,,Sometimes,,Often,,Sometimes,Often,Sometimes,Sometimes,,Often,,,Rarely,,Sometimes,Often,Sometimes,,,,10,30,40,15,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science 
projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,Most of the time,Sometimes,,76-99% of projects,Approximately half internal and half external,Standalone Team,,pdfs,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,26200,GBP,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,Other,Other",,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,40,20,20,10,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Manufacturing,20 to 99 employees,Increased slightly,6-10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Neural Networks,Random Forests,SVMs","C/C++,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,Rarely,,,,,,Often,,Often,,,,,,,,,Often,,,Sometimes,Rarely,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Often,Often,,,,,,,,,,,Often,Often,,Sometimes,,,,,Most of the time,,Often,,,,50,15,10,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Often,,,,,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,Manual labeling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Sometimes,180000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by college or university,Jupyter notebooks,Text Mining,Python,Google Search,"Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Very useful,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,More than 10 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Rarely,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs",Sometimes,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,,,,30,10,0,40,20,0,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack 
of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,Less than 10% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",,42000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Deep learning,,"GitHub,Google Search","Conferences,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,,University courses,30,30,30,10,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Python,R,Spark / 
MLlib,SQL,Tableau,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,Rarely,Rarely,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,Sometimes,,,Rarely,,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Sometimes,,Sometimes,,Rarely,Often,,Often,Often,,,,,Often,Sometimes,Often,,,Often,,Sometimes,,Often,Sometimes,,Sometimes,,,Sometimes,,,,,30,20,10,10,30,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Rarely,,,Sometimes,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,"Coursera,edX",Other,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,30,20,20,10,5,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,47,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,Official documentation,Online courses,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,"10,000 or more employees",,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,,,,Sometimes,Most of the time,Most of the time,,,Most of the time,,,Sometimes,Most of the time,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,37,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,,5-10 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Other,Yes,Doctoral degree,A social science,,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Ukraine,18,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Genetic & Evolutionary Algorithms,C/C++/C#,University/Non-profit research group websites,"Blogs,College/University,Podcasts,Trade book,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,"Emergent/Future Newsletter (Algorithmia),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,20,20,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Belarus,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,France,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Machine Learning,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",Rarely,,,,,,,,,,,,Sometimes,,Rarely,,Most of the time,,,,,Rarely,Often,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Often,Often,,,,,,,,,,Most of the time,,Often,Often,Often,,,,,,Most of the time,Most of the time,,,,55,15,10,10,10,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team",,,,,Often,Sometimes,,,,,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Google Search,"Non-Kaggle online communities,Online courses",,,,,,,,,Somewhat useful,,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,"Computer Scientist,Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,31,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,YouTube Videos",,,,,,Very useful,Very useful,,Very useful,,,Very useful,,,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Decreased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Don't know,10GB,Regression/Logistic Regression,"Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Recommender Systems,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Most of the time,,,,10,50,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Business Department,,Falta de capacidad de almacenamiento para grandes cantidades de registros. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Always,2400000,BSD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,,,,,,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,,,,,,,,,,, +Male,Argentina,27,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Java,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,40,0,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Neural Networks","MATLAB/Octave,Python,QlikView,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,Rarely,,,,,,"Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Simulation,Time Series Analysis",,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,Often,,,Most of the time,,,,75,10,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,Sometimes,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,Data flow is a disaster,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Rarely,15000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,C/C++,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,75,5,5,10,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"5,000 to 9,999 employees",Increased significantly,3-5 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,10TB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,Rarely,,Rarely,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,Often,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",Sometimes,,,Often,,Often,,Sometimes,Sometimes,,,Sometimes,,,,,,,,Sometimes,Rarely,,Sometimes,,Rarely,,,,,,,,,30,30,25,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Linear Digressions Podcast,5-10 years,,Necessary,,,Necessary,,,,Necessary,,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Doctoral degree,Computer Science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Other,Other,R,University/Non-profit research group websites,"Arxiv,Conferences,Other",Very useful,,,,Very useful,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Machine Learning Engineer,Other",Other,20,0,0,30,0,50,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Mix of fields,"10,000 or more employees",Decreased significantly,1-2 years,An external recruiter or headhunter,Important,Other,"Laptop or Workstation and local IT supported servers,Other",Relational data,Rarely,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","MATLAB/Octave,Python,R,RapidMiner (free version),SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,,Rarely,,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other,Other,Other",Sometimes,,Often,,Rarely,Most of the time,Most of the time,Often,Most of the time,Often,,Often,,Most of the time,Sometimes,Often,Rarely,Sometimes,Rarely,Often,Often,,Often,Sometimes,Sometimes,Often,Most of the time,Sometimes,Rarely,Most of the time,Most of the time,Often,Most of the time,10,30,0,10,25,25,Enough to refine and innovate on the algorithm,"Lack of funds to 
buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Sometimes,,Sometimes,,,Often,,Often,,Sometimes,,,,51-75% of projects,Do not know,Other,Only UCI repository and others to test new algorithms,"It is not centralized, and some data is still sitting in separate Excel files or paper files. There are no standard IDs used for our students across all systems, so some matching must be done by hand.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Most of the time,101000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,More than 10 years,Data Analyst,Self-taught,25,25,25,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning",,,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Orange,Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,Often,,,,,,Rarely,,Most of the time,,Most of the time,,,,Sometimes,,,,,Most of the time,,,Rarely,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,10,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,140000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,NoSQL,Proprietary Algorithms,R,Google Search,"Arxiv,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Other",Self-taught,33,0,34,33,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Decreased significantly,More than 10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","C/C++,Java,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Stan",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,Sometimes,Sometimes,,,Often,,,,,Often,,Sometimes,,,,,Sometimes,,Most of the time,,,,5,20,20,5,50,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in 
deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,,Often,Often,Sometimes,,,,,Most of the time,,,,Sometimes,Often,,,,,,Sometimes,,Less than 10% of projects,Entirely internal,Other,,poor development practices,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Never,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Belgium,38,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,"I collect my own data (e.g. web-scraping),Other","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",10,10,70,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,CRM/Marketing,10 to 19 employees,Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Rarely,,,,,,,,,,"Bayesian 
Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,Sometimes,,,Often,Sometimes,Often,Sometimes,,,Often,,,,Sometimes,,Sometimes,Often,,Sometimes,,Often,,,,Sometimes,,Often,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Often,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,26000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Bayesian Methods,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,25,25,40,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic 
Regression","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SAS Base,SQL,Tableau,TIBCO Spotfire",,Rarely,,,,,,,Sometimes,,,Rarely,,,,,Most of the time,,,,,,,,Rarely,,,,,,Most of the time,,Rarely,,,,,Rarely,,,,Rarely,,,Sometimes,,Rarely,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Often,,,,Sometimes,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,Often,,,,,,,,20,40,15,10,15,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,76-99% of projects,Entirely internal,Standalone Team,None,Speed,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,132000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Belarus,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher",Self-taught,20,70,5,0,5,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data",Rarely,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,TIBCO Spotfire",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Often,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests",,,,,,Often,,Sometimes,Rarely,,,Sometimes,,,,Sometimes,,,Often,,,,Sometimes,,,,,,,,,,,50,20,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Sometimes,,,,,,,,,,Rarely,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,27000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,27,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Master's degree,Computer Science,,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Female,United States,42,"Not employed, but looking for work",,,,,,,,TensorFlow,Anomaly Detection,Python,Other,"College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Not Useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,,,5-10 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to 
have,Necessary,Necessary,,,"Coursera,DataCamp,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Biology,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,25,25,35,0,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United Kingdom,27,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,Data Stories Podcast,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or statistics,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,40,5,15,40,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Academic,"1,000 to 4,999 employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Always,,Bayesian Techniques,"IBM Cognos,NoSQL,Python,R,SAS Enterprise Miner,SQL",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Sometimes,,,,,,Rarely,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Time Series Analysis",,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,,,15,40,5,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,Sometimes,,,,Often,,,Rarely,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,50000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,40,30,20,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,Often,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA 
and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,Often,Often,Most of the time,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,Sometimes,,,Most of the time,Most of the time,Often,,Often,Often,,,,,Most of the time,Sometimes,,,,50,20,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Often,Most of the time,,,,,,,,,,,Most of the time,,,26-50% of projects,Entirely internal,Other,"ImageNet (for Transfer Learning), Wikipedia (for NLP models training)","Its unstructured (text, image) most of the time. We also are in the scenario of small data, gathering it via crowdsourcing, scrapping or logging of users behaviour.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,Most of the time,210000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,Very useful,,,Very useful,,,,,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,NA,25,25,0,25,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Technology,20 to 99 employees,Increased slightly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Other",Text data,,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Perl,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,Rarely,,,,,,,,,Rarely,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Rarely,Rarely,Rarely,Rarely,,,Rarely,,Rarely,,Rarely,,,,,Rarely,,Rarely,,,,,Rarely,Rarely,Rarely,,,,90,5,0,5,0,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More external than internal,IT Department,Anything I can find,converting file types,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,67000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Other",University courses,5,0,5,90,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Rarely,Rarely,,,,,,Rarely,Rarely,,,,,,,,Most of the time,,,,Rarely,,,,,,Rarely,Often,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,Rarely,Rarely,,Often,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Often,,,,,,Most of the 
time,,,,,,,Often,,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,Most of the time,,,,40,20,5,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Most of the time,,,Often,,,,Often,,Sometimes,Often,,Often,,,,,Often,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,120000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Java,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Conferences,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,0,0,30,70,0,0,"Recommendation 
Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Telecommunications,"5,000 to 9,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,NoSQL,Spark / MLlib,SQL",,,,,Most of the time,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Decision Trees,kNN and Other Clustering,Random Forests",Most of the time,,,,,,,Often,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,30,50,10,0,10,0,Enough to run the code / standard library,"Dirty data,Privacy issues",,,,,Often,,,,,,,,,,,,Often,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Always,1000000,TWD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Google Search,University/Non-profit research group websites","College/University,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,Very useful,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,15,30,40,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk,Other",,Rarely,,,,,,,Often,,,,Rarely,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Sometimes,,,Most of the time,"A/B Testing,Association Rules,Bayesian 
Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,,Often,,Rarely,Sometimes,,,Sometimes,Often,,,Often,Most of the time,,Often,Often,,,,70,5,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,Sometimes,,Most of the time,Often,,Most of the time,Rarely,,Most of the time,,,Often,Most of the time,,,,Sometimes,Most of the time,Most of the time,,100% of projects,Entirely internal,Other,"nltk, spacy, google corpus",Finding it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,120000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,18,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,I did not complete any formal education past high school,,Less than a year,Programmer,Self-taught,70,30,0,0,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,60,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by college or university,TensorFlow,Deep learning,Matlab,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,Researcher,Work,0,0,80,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased significantly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data",Most of the time,100TB,"Bayesian Techniques,CNNs,Ensemble Methods,HMMs,Neural Networks,RNNs,SVMs,Other","C/C++,Mathematica,MATLAB/Octave,NoSQL,Python,R,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Rarely,Sometimes,Often,,Most of the time,Sometimes,,Often,,,,,Most of the time,,,,Sometimes,,Often,Often,,,,Most of the time,Most of the time,,Often,Often,Often,,,,20,30,30,10,10,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,Less than 10% of projects,Entirely internal,Standalone Team,common benchmark sets that reviewers 
require in publications;,"preprocessing/segmentation of image data is not addressed well in current machine learning. Results on curated overly clean standard data sets are too optimistic and unrealistic. Inappropriate rescaling has been performed on many 'standard' data sets (e.g. anamorphic rescaling towards 256x256 pixels). The machine learning community is rather blind for the underlying problems. People without any fundamental knowledge of image processing do terrible things to the data and are surprised if it does not work. CNNs are fantastic but there is no magic and the old adage, garbage-in/garbage-out still holds strongly.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Czech Republic,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Scientist,University courses,30,30,5,35,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the 
time,Most of the time,Sometimes,Sometimes,,,,,Often,,Most of the time,,Often,,Sometimes,Most of the time,,Sometimes,,,,,Sometimes,Sometimes,Most of the time,,,,30,25,25,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,,,,,,,,,,Sometimes,,,Often,,,100% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Stan,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 
years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,Often,Sometimes,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics",,,Most of the time,,,Most of the time,Most of the time,,,,,Most of the time,,Sometimes,,Sometimes,,,Often,,,,,,,,,,Often,,,,,50,20,10,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Often,,,Most of the time,,,,Sometimes,Often,,,,,,Often,,,100% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,150000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Other,Always,1GB,Regression/Logistic Regression,"SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Random Forests",,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,50,30,20,0,0,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Often,Most of the time,Most of the time,,,,Sometimes,,,,,,,,,,Often,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Not Useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,"Partially Derivative Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,30,10,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Not important +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company 
that doesn't perform advanced analytics,Spark / MLlib,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Non-Kaggle online communities,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,10,30,30,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,Jupyter notebooks,Python,QlikView,R,SQL,TensorFlow",,Often,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,Often,Often,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis",Often,,,,,,Often,Often,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,,Often,,,,40,15,5,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,Often,,,,,,,Sometimes,,,Sometimes,,,,Most of the time,,,51-75% of projects,More internal than external,Other,,Lack of consistent data warehousing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,110000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Machine Learning,Social Network Analysis,Julia,University/Non-profit research group websites,"College/University,Conferences",,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,More than 10 years,Other,Work,10,10,80,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,100GB,"Decision Trees,HMMs,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Mathematica,Microsoft R Server (Formerly Revolution Analytics),R",,Sometimes,,,,,,,,,,,,,,Most of the time,Most of the time,,,Rarely,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,HMMs,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,Often,,Often,Often,,,,,,Often,,,,Sometimes,Often,,Rarely,Sometimes,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning",,Often,Sometimes,,,,,,,Sometimes,,Often,,,,,,,,,,,10-25% of projects,More internal than 
external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,"210,000",USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,57,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Company internal community,Official documentation,Online courses,Personal Projects",,Somewhat useful,,Very useful,,,,,,Very useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,5,5,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Minitab,Python,QlikView,R,SQL",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,Sometimes,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Most of the 
time,,,,,,,,Most of the time,,,Most of the time,,Sometimes,,Most of the time,,,,,,Most of the time,Most of the time,,,,70,5,2,10,13,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,,,Most of the time,Often,,,Often,,,Sometimes,,Sometimes,,,,,,,,,100% of projects,More internal than external,IT Department,None currently,Joining normalized tables from different databases to form the necessary data set to answer a business question.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,115000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Tutoring/mentoring",Somewhat useful,Very useful,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,,Somewhat useful,,"DataTau News Aggregator,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular 
Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL,TensorFlow",,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Often,,,,Often,,,,,Often,,Most of the time,,,Most of the time,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,,,,,Often,,,,,Sometimes,Most of the time,,,,,Most of the time,,,51-75% of projects,More internal than external,Business Department,,feature engineering,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,120000,TRY,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Poland,38,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Jupyter notebooks,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer",Self-taught,60,10,0,20,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data,Other",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM SPSS Statistics,Impala,Julia,Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau,Unix shell / awk",,Rarely,,Often,Often,,Sometimes,,Often,,,Rarely,,Rarely,,Rarely,,,,Rarely,Rarely,,,Rarely,Often,,Often,,,,Sometimes,Often,Most of the time,,,,,Rarely,Rarely,,,Rarely,,Rarely,Rarely,,,Often,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,Often,,Sometimes,,Most of the time,Often,Often,Often,,,,,,Often,Often,,,Often,Sometimes,Sometimes,,Often,,,Often,,Sometimes,,Sometimes,,,,50,20,0,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with 
IT,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,Often,Most of the time,Often,,,,,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,census,optimizing data computation to overcome infrastructure limits for existing data volumes,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,I don't typically share data",,Generic cloud file sharing software (Dropbox/Box/etc.),,300000,PLN,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,43,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,Tableau,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Official documentation,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,Not Useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,Less than a year,Other,University courses,0,0,50,50,0,0,"Time Series,Unsupervised Learning",,A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100MB,Other,"Hadoop/Hive/Pig,Python,R,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Most of the 
time,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,Often,,,,20,20,0,35,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Often,,Most of the time,,,,,,,,,Often,Rarely,,100% of projects,Entirely internal,Other,Twitter; Met.ie weather data,Asking the right question that the dataset cound answer.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,30000,GBP,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Researcher",University courses,100,0,0,0,0,0,,,A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,Philippines,22,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,5,5,10,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer",Work,20,0,50,20,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,Often,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,Sometimes,Sometimes,Often,,,Often,,,,Sometimes,,Sometimes,,Sometimes,Often,Sometimes,Often,,,Sometimes,,Often,,Often,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,The lack of a 
clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,Sometimes,,Often,,,,,Often,Most of the time,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint,Other",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,85000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,43,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Other,Python,"Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Traditional Workstation,Workstation + Cloud service",11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Business Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Very Important +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,"FlowingData Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Business Analyst,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,100MB,,"Java,Jupyter notebooks,KNIME (free version),Python,R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,Often,,Sometimes,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Rarely,,,Often,,,,,,,"Data Visualization,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,,,80,0,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Often,Most of the time,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,Often,Sometimes,,76-99% of projects,More internal than external,IT Department,census; weather; energy star; AHRI,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Italy,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Neural Nets,C/C++/C#,University/Non-profit research group websites,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,DataCamp,Traditional Workstation,2 - 10 hours,PhD,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,Julia,Time Series Analysis,R,Government website,"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Podcasts,Textbook",Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Researcher",Self-taught,20,10,60,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SAS Enterprise Miner,SQL,Stan,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,Sometimes,,,Most of the time,Rarely,,Sometimes,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,Most of the time,Sometimes,,Sometimes,,Sometimes,Most of the time,Often,Most of the time,,,Often,,Often,,Often,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,Often,,,Often,,,Often,,Most of the time,Sometimes,Rarely,,,,,,Most of the time,Often,Often,,,76-99% of projects,More internal than external,IT Department,Esri,Data Cleansing,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,,100000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,SAS Base,Random Forests,SAS,University/Non-profit research group websites,"College/University,Conferences,Friends network,Online courses,Personal Projects,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,,,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,0,10,40,50,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,Government,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Oracle Data Mining/ Oracle R Enterprise,Other",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Rarely,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,,20,20,0,25,35,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific 
analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"70,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Egypt,33,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,Other,Self-taught,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Academic,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Other,Basic laptop (Macbook),Relational data,,10MB,Other,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,20,0,0,30,50,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Most of the time,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Rarely,84000,EGP,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,Work,50,0,50,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Never,<1MB,Other,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Data Visualization,Natural Language Processing",,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Other",,Often,,,,,,,Often,,Often,,,,,,,,,,,Most of the time,76-99% of projects,More external than internal,Central Insights Team,Product Review Data,Varied Sources,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Sometimes,1200000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, but looking for work",,,,,,,,Other,Support Vector Machines (SVM),Haskell,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,,Necessary,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,Google Search,"Company internal community,Kaggle,Podcasts,Stack Overflow Q&A",,,,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,,,,3-5 years,,,,,,,,,,,,,,,,,,No,Master's degree,,I don't write code to analyze data,"Data Analyst,Operations Research Practitioner",Work,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Self-employed,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,30,30,20,15,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,Very important,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Video data,Text data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Neural Networks,Random Forests,RNNs,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,Sometimes,,,,Sometimes,,,,,,Sometimes,Sometimes,Often,,Sometimes,Sometimes,Often,,,,,,,,,Often,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Most of 
the time,,Most of the time,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,Most of the time,,Often,Most of the time,Sometimes,,Sometimes,,,Sometimes,Sometimes,,Often,,Often,Most of the time,Most of the time,Often,,Sometimes,,Most of the time,Sometimes,Often,Often,Often,Sometimes,,,,30,40,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Sometimes,Often,,Sometimes,,Often,Often,,Sometimes,,Most of the time,Sometimes,,Often,Often,,,Most of the time,,,51-75% of projects,More external than internal,Standalone Team,ImageNet;Visual Genome;MS COCO; Flickr; Mexican Census,Storage; Retrieval; Cleaning;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"120,000",MXN,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by college or university",TensorFlow,Deep learning,Java,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,30,30,10,20,10,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,Academic,100 to 499 employees,Increased slightly,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,HMMs,"IBM Watson / Waton Analytics,Java,Microsoft SQL Server Data Mining,NoSQL,SQL,Tableau",,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Collaborative Filtering,kNN and Other Clustering,Natural Language Processing,Recommender Systems,Text Analytics",Sometimes,,,,Often,,,,,,,,,Often,,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,,40,10,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of 
data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,,Often,,Often,,,,,,,Sometimes,Sometimes,,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,,collecting it into one place and then transforming it into quantifiable units,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Most of the time,135000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100GB,"Decision Trees,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,NoSQL,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,Often,,,,,,Often,,,,"Decision Trees,Naive Bayes,Prescriptive Modeling,Text 
Analytics",,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,Sometimes,,,,,40,20,1,30,9,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Most of the time,,,,Often,,,,,,,,,,Often,,,,51-75% of projects,More internal than external,Standalone Team,,quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Other,"College/University,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites,Other","Arxiv,College/University,Company internal community,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,Very useful,,,,,,,,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",5,5,15,75,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",Most of the time,100MB,"Bayesian Techniques,Evolutionary Approaches,HMMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Spark / MLlib",,Rarely,,Most of the time,,,,,Rarely,,,,,,,,Often,,,,Often,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,"Data Visualization,PCA and Dimensionality 
Reduction,Simulation",,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,35,40,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",Dropbox; Google Drive,Git,Rarely,23000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,,Very useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Insurance,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,Python,R,SAS Base,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,Often,,Often,,,,,Sometimes,,,Often,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",Often,,Sometimes,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,40,15,10,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,Sometimes,Often,,Often,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Sometimes,95000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,France,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Very useful,,,Very useful,,,,,FastML Blog,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Female,Republic of China,26,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Often,,,,,,,35,25,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues",,,,,Most of the time,,,,,,,,,,,,Often,,,,,,10-25% of projects,Approximately half internal and half external,Business Department,n/a,Original database is unorganized with multiple wrong entries. Cleaning and reconstruct database is time consuming,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"240,000",CNY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Canada,67,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Julia,Deep learning,Python,GitHub,"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Very useful,"DataTau News Aggregator,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Machine Learning Engineer",University courses,5,20,0,55,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,R,SAS Enterprise Miner,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,Often,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs",,,,,,Often,Most 
of the time,Often,Most of the time,,,Sometimes,,Sometimes,,,,Sometimes,,Most of the time,Sometimes,,Most of the time,Often,Most of the time,Often,,Most of the time,,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,,9,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by government,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,20,30,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Always,10GB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,Most of the time,,Often,,Often,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs",,,,Most of the time,,Often,,Often,,,,,,Most of the time,,Often,,,,Most of the time,Sometimes,,,Sometimes,,,,Most of the time,,,,,,60,10,15,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of 
management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,,,,,Most of the time,,,,Sometimes,,,Most of the time,,Most of the time,,Sometimes,,None,More internal than external,Standalone Team,"CasiaWeb faces Microsoft 1 billion images feret database image facial expressions vggface dataset",Accessing and filtering noise images in the web for processing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"30,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Not Useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Computer Scientist,DBA/Database Engineer,Engineer,Operations Research Practitioner,Researcher,Software Developer/Software Engineer",Self-taught,70,20,0,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Image data,Text data,Relational data",Never,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,50,0,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,Most of the time,Often,,,Often,Sometimes,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Often,Most of the time,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,,MAD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,30,20,40,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1GB,"Ensemble Methods,Gradient Boosted Machines","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics",Sometimes,,,,,Often,Often,,,,,Often,,,,,,,Often,,Sometimes,Often,Sometimes,,,,,,Often,,,,,60,20,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",Often,,Often,Sometimes,Most of the time,Sometimes,,,Sometimes,,,,,Often,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone 
Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Most of the time,163500,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,45,Employed full-time,,,Yes,,Business Analyst,,,Python,Link Analysis,SQL,GitHub,"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,6 to 10 years,,Self-taught,40,50,5,5,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,I don't know,Increased significantly,Don't know,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,,"Decision Trees,Random Forests","Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Time Series Analysis",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,20,0,0,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,Most of the time,,,None,Entirely internal,Business Department,,getting them to believe in it,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,23,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Poorly,Self-employed,C/C++,,C/C++/C#,University/Non-profit research group websites,"College/University,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Master's degree,No,Bachelor's degree,Other,Less than a year,"Machine Learning Engineer,Programmer,Other,I haven't started working yet",University courses,0,0,0,100,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Regression,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,0,0,0,0,90,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Segmentation",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,28,2,10,30,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,,100% of projects,Entirely internal,Other,,,Other,Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"110,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,SQL,Time Series Analysis,SQL,Google Search,"Blogs,Company internal community,Friends network,Official documentation,Personal Projects,Textbook,YouTube Videos",,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with 
semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Other,Self-taught,20,0,50,10,0,20,,,High school,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,100GB,,"Amazon Web services,Java,SQL",,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,20,50,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,Often,,,Most of the time,,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Other,we do not,clean data from our clients,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Always,82500,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,University courses,20,30,0,40,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Often,Often,Sometimes,Often,,,Sometimes,,,,Most of the time,,,Often,,Most of the time,,Sometimes,,,,,Sometimes,Often,,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Sometimes,"50,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Random Forests,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Personal Projects",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Java,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,Very useful,,Very useful,,,,,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Statistician",University courses,20,0,50,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Increased significantly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,SVMs,Time Series Analysis",Rarely,,Rarely,,,Most of the time,Sometimes,Often,Often,,,Often,,Sometimes,,Sometimes,,Rarely,Rarely,Often,Often,Rarely,Often,Rarely,,,Sometimes,Often,,Sometimes,,,,25,25,25,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,Often,,,,,,,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,120000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Argentina,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses",Very useful,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,50,20,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Image data,Always,100GB,"CNNs,Neural Networks,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,Often,Rarely,,,,,,,Sometimes,,Often,,,,Most of the time,Often,,,,,Often,,Often,,,,,,30,50,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning",,,,,Sometimes,,,,,,,Often,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Never,247000,ARS,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,15,80,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,India,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,,Python,GitHub,"College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to 
have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Sweden,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,GitHub,"College/University,Kaggle,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with 
semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,60,20,20,0,0,0,"Adversarial Learning,Natural Language Processing","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,1TB,"Gradient Boosted Machines,Neural Networks,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Neural Networks,SVMs",,,,,,Most of the time,Most of the time,,,,,Sometimes,,,,,,,,Often,,,,,,,,Most of the time,,,,,,50,30,5,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Git,Mercurial",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Philippines,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,,,,,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,,,Other,"Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests",Sometimes,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,,,70,10,0,10,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,"UCI Machine Learning Repository, Kaggle Datasets","Data Mining, Wrangling like missing data, outliers, distribution etc..","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Friends network,Online courses",,,,,,Very useful,,,,,Very useful,,,,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Canada,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer",University courses,50,10,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,Often,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,,,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,Most of the time,Sometimes,Sometimes,,Most of the time,,,,50,20,0,10,20,0,Enough to tune the parameters properly,"Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Most of the time,,,,,,,,,,,,,Most of the time,,Sometimes,Sometimes,,Less than 10% of projects,Entirely internal,Other,,"Labelled data, generalization of results acheived using small labelled dataset to big unlabelled data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,70000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. web-scraping),"Company internal community,Conferences,Friends network,Podcasts,Stack Overflow Q&A",,,,Very useful,Somewhat useful,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Researcher",Self-taught,30,10,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Ensemble Methods,Neural Networks,RNNs","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Often,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,Rarely,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision 
Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Simulation,Text Analytics",Sometimes,,,,,Often,Often,Rarely,,,,,,Often,,Sometimes,,,Most of the time,Most of the time,,,Rarely,,,Sometimes,Sometimes,,Most of the time,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,,,Often,Most of the time,,,Often,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,105000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,South Korea,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Link Analysis,R,"GitHub,Google Search","Arxiv,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",,Computer Scientist,Self-taught,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,55,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Other,University courses,15,0,30,50,0,5,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Other,Most of the time,100GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,RNNs,Simulation",,,Rarely,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,Often,,,,,Often,,Most of the time,,,,,,,50,10,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,Sometimes,Often,,Often,Sometimes,,,,,,,Sometimes,,,,51-75% of projects,More internal than external,Other,None,Size and scale of problem (high-energy physics uses ~terabytes of data and complex simulations). Understanding all the details of your datasets.,Other,Other,world-wide grid storage + local cluster/node,"Git,Subversion",Most of the time,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,39,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",40+,Github Portfolio,Yes,Master's degree,Electrical Engineering,,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,Canada,31,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,IBM SPSS Statistics,Time Series Analysis,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Computer Scientist,Self-taught,50,25,0,25,0,0,Time Series,Neural Networks - CNNs,,Academic,I don't know,,,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,,,"Jupyter notebooks,MATLAB/Octave,Python,R,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,35,20,5,40,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Often,,Often,,,,,Often,,,,,,Most of the time,,,Often,,Often,,10-25% of projects,More internal than 
external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,1-2 years,Some other way,Not at all important,Other,Basic laptop (Macbook),"Image data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,,Rarely,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,10,10,0,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Most of the time,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,25,5,0,10,10,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SQL,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Often,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Sometimes,,Sometimes,,Often,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Most of the time,,,,25,45,15,10,5,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with 
the available data",,,,,,,,,Often,,,,,,Often,,,,Often,Sometimes,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,350000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,,,"FastML Blog,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Researcher,Statistician",Self-taught,90,0,0,0,10,0,"Reinforcement learning,Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Insurance,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,HMMs,Other","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Simulation,Time Series Analysis",Often,,Most of the time,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,0,50,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the time,Often,Sometimes,,,,Sometimes,Most of the time,,,Most of the time,,Most of the time,,Most of the time,Most of the time,Often,Sometimes,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,NDA,Producing actionable insights ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,75000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,Mathematica,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,,,A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Somewhat important,,Traditional Workstation,"Image data,Text data,Relational data",Rarely,10GB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Random Forests,Segmentation,Simulation",,,,,,Sometimes,Often,,,,,,,,,,,,,,,,Rarely,,,Often,Sometimes,,,,,,,50,10,0,10,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Other",Often,,,,,,,,,,,,,,,,,,,,,Often,51-75% of projects,Do not know,IT Department,,,Other,I don't typically share data,,"Bitbucket,Git",Sometimes,72000,BRL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Software Developer/Software Engineer,University courses,50,10,10,25,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Military/Security,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",Image data,Don't know,10GB,"Bayesian Techniques,HMMs,Other","MATLAB/Octave,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Naive Bayes,Segmentation,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,,,,,,Most of the time,Often,,,,Sometimes,,,,,,,,Often,Sometimes,,,Often,,,,35,20,5,25,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,Often,,,Sometimes,,,,,,,,,,Most of the time,,Often,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,71000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Ireland,52,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,R,Social Network Analysis,R,Google Search,"Blogs,Company internal community,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Not Useful,,Not Useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,More than 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",60,30,5,0,2,3,,Logistic Regression,A professional degree,Government,"5,000 to 9,999 employees",Increased slightly,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Most of the time,10GB,Regression/Logistic Regression,"Jupyter notebooks,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,Often,,,,,,,,60,30,5,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy 
issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,Often,,Often,,,,Often,,Often,,Often,Sometimes,Often,,Often,,Often,Often,,,Less than 10% of projects,More internal than external,Standalone Team,govt datasets,"understanding it, data integrity","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Portugal,28,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,,Very useful,Very useful,KDnuggets Blog,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,PhD,Yes,Doctoral degree,Computer Science,3 to 5 years,"Engineer,Programmer,Researcher",University courses,15,30,20,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very 
Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Germany,48,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Bayesian Techniques,A doctoral degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,United States,42,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",3-5 years,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,Physics,,Engineer,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Anomaly Detection,Python,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,Adversarial Learning,,A bachelor's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1GB,Decision Trees,"Amazon Web services,Java,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Decision Trees",Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,60,30,10,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Most of the time,Often,,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,48000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,Somewhat useful,,,Very useful,,,,,Very useful,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer",Self-taught,60,20,0,10,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Most of the time,,,,,Rarely,Most of the time,Rarely,,,,,,,Rarely,Sometimes,,,,,,,Rarely,,,Sometimes,,,,Sometimes,,,,60,5,0,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question 
to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,,,,,Rarely,,,,,,Sometimes,Sometimes,,,76-99% of projects,More internal than external,Central Insights Team,Data scraped from competitor's websites (I work for ecommerce company),Poor documentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,145000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Personal Projects",,Somewhat useful,,,,Not Useful,Somewhat useful,,,,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,15,0,5,75,5,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Decreased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1MB,"Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Segmentation,Text Analytics",Often,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,45,5,0,0,50,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",Often,Sometimes,,,Most of the time,Often,,,Most of the time,,Often,,,Sometimes,Sometimes,Most of the time,Often,Sometimes,,,,,10-25% of projects,Entirely internal,Other,US Census,"Lack of automated data pulls, the data pulls are often manual","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Rarely,"87,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,23,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Stack Overflow Q&A",Somewhat useful,Very useful,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst",University courses,40,0,30,30,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,,Sometimes,,,,,,Rarely,,Most of the time,,,,,Sometimes,,Often,,,Most of the time,,,,Most of the time,Sometimes,Often,,,,,,,,Most of the time,Often,,,Often,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Sometimes,,Most of the time,Most of the time,Often,,,,Often,,Often,,Often,,Often,Sometimes,Often,Most of the 
time,,Often,,,Most of the time,,Often,,,,,,65,10,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,,,,,,,,,,Often,Often,Often,,,51-75% of projects,Entirely internal,Standalone Team,,cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Subversion",Rarely,"30,000",,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Random Forests,SQL,Google Search,"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,10,5,75,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,500 to 999 employees,Decreased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Random Forests,Regression/Logistic Regression","Amazon Web 
services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk,Other",,Often,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,Often,,,Sometimes,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests",Most of the time,,,,,Sometimes,Often,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,53,1,1,20,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Sometimes,,Often,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,Rarely,Rarely,Rarely,Sometimes,Most of the time,10-25% of projects,More internal than external,Other,,Validation; instrumentation,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Table in DB,Bitbucket,Rarely,126000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Friends network,Stack Overflow Q&A,Tutoring/mentoring",,,,,Somewhat useful,Very useful,,,,,,,,Very useful,,,Somewhat useful,,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician",University courses,40,0,30,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,Fewer than 10 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender 
Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,Sometimes,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Often,Most of the time,Often,,Sometimes,Most of the time,Sometimes,Often,Most of the time,Most of the time,Often,,Most of the time,Often,Sometimes,Often,Most of the time,,,,50,15,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Sometimes,Sometimes,,Most of the time,Often,,Often,Often,,Sometimes,,,,Often,Often,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,R,Google Search,"Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",,,,,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,,"Data Elixir Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",University courses,50,10,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Mix of fields,10 to 19 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Python,R",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,Often,Most of the time,,,,Often,,,Most 
of the time,,,Most of the time,Often,,,Sometimes,,,,65,10,0,15,10,0,Enough to tune the parameters properly,Data Science results not used by business decision makers,,Rarely,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,100000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,30,0,0,0,0,70,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Canada,56,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Markov Logic Networks",Primary/elementary school,Technology,500 to 999 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Never,10GB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,75,5,0,10,10,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Privacy issues",,,,,Often,Sometimes,,,,,,,,,,,Most of the time,,,,,,76-99% of projects,Entirely internal,Other,None,Obtaining permission to access the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Subversion",Rarely,102000,CAD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Online Courses and Certifications,No,Master's degree,Fine arts or performing arts,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,98,0,0,2,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Denmark,26,"Not employed, but looking for work",,,,,,,,R,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Textbook",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Github Portfolio,No,Doctoral degree,Mathematics or statistics,Less than a year,Researcher,Self-taught,90,5,0,5,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Female,Other,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,R,GitHub,"Blogs,Conferences,Kaggle,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Mexico,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,60,20,5,5,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A professional degree,Technology,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,1MB,"Bayesian Techniques,Evolutionary Approaches","Amazon Web services,C/C++",,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Naive Bayes,PCA and Dimensionality Reduction",,,,,,,Sometimes,,,Often,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,40,30,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question 
to be answering or a clear direction to go in with the available data",,Often,,,Often,,,,,Often,,,,,,Often,,,Often,Often,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,500000,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Spain,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses",,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,0,0,20,80,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,500 to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Image data,Most of the time,1TB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,IBM SPSS Modeler,Jupyter notebooks,Minitab,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,Often,,,,,,,Often,,,,,,Most of the time,,,,,,,,,Rarely,Often,,,,Most of the time,,,,Often,,,,,,Sometimes,Most of the time,,,Rarely,Often,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Most of the time,Often,,,Most of the time,Most of the time,Most of the time,Often,,,,Often,,Most of the time,,Often,,Sometimes,,Often,Often,Often,Often,Most of the time,,Most of the time,,Often,Rarely,,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,Often,,,,,,,Sometimes,,Often,Sometimes,,76-99% of projects,More internal than external,Standalone Team,"Social networks, crms, Medical imagery",Volume and variety,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,,"52,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Google Search,"Blogs,Podcasts,Textbook",,Somewhat useful,,,,,,,,,,,Very useful,,Very useful,,,,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,5,55,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Internet-based,500 to 999 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Often,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",Often,,Most of the time,,,Often,Most of the time,,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,,Often,,,,20,25,25,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,,,,,,,,,Often,,,,,,,,,100% of projects,Do not know,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,85000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,65,Retired,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,Google Search,"Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,50,25,15,10,0,0,"Natural Language Processing,Survival Analysis","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Genetic & Evolutionary Algorithms,C/C++/C#,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,0,0,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting",High school,Technology,100 to 499 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,,"QlikView,R,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,"A/B Testing,Naive Bayes,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,40,10,40,0,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Sometimes,325000,MXN,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook",Very useful,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,10,15,60,15,0,0,Computer Vision,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Regression/Logistic Regression,RNNs","C/C++,Flume,Google Cloud Compute,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,Often,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,Rarely,Often,,Most of the time,Most of the time,,,,,,,Most of the time,,Sometimes,,,,Most of the time,Sometimes,,,,,Often,,Rarely,,,,,,35,5,20,25,15,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,,Other,Rarely,"299,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Kaggle competitions,20,0,50,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,Unix shell / awk,Other",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,Rarely,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,10,20,30,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,Rarely,,Often,Sometimes,,,100% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Never,160000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Other",University courses,20,20,0,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",,A bachelor's degree,Non-profit,10 to 19 employees,Stayed the same,Less than one year,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,100MB,Regression/Logistic Regression,"Amazon Web services,Python,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Rarely,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Segmentation",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,0,0,0,20,80,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,110000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,40,5,30,20,5,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,Ensemble Methods,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",,,,,,,Most of the time,,,,,,,Sometimes,,,,,Often,,Often,,,,,Sometimes,,Often,Sometimes,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,9,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Friends network,Kaggle",,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,Software Developer/Software Engineer,Kaggle competitions,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",No education,Internet-based,100 to 499 employees,Decreased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Other,Laptop or Workstation and local IT supported servers,Text data,,10MB,"Ensemble Methods,Regression/Logistic Regression","Java,Python",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Ensemble Methods,Logistic Regression",,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,transunion;clarity;idanlytics;threatmatrix;,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",,2100000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,,,High school,Technology,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Always,10GB,,"C/C++,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python",,,,Most of the time,,,,,,,,,,,Rarely,,,,,,Sometimes,,Sometimes,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,40,0,0,60,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,Often,Sometimes,Sometimes,,,,Often,,Often,,,,,Often,,,Often,,,,100% of projects,More internal than external,Standalone Team,usgs,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,115000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Company internal community,Conferences,Newsletters,Official documentation,Personal Projects,Trade book",,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,58,0,40,2,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Manufacturing,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,Other",Rarely,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,Often,Often,Often,Sometimes,Most of the time,Most of the time,Often,Most of the time,Often,,Most of the time,Often,,Often,Most of the time,,,,60,5,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Sometimes,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,Often,Often,Most of the time,,,100% of projects,More internal than 
external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,175000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs",High school,Technology,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Sometimes,<1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","IBM Watson / Waton Analytics,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs",,,,Often,,,,Sometimes,,,,,,,,,,Often,Often,Often,,,Sometimes,,Sometimes,,,,,,,,,40,40,10,0,10,0,Enough to explain 
the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,None,More internal than external,IT Department,Kaggle;,Cleaning data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,600000,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,Somewhat useful,Very useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Other",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,30,0,20,20,20,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,22,Employed full-time,,,Yes,,Researcher,Poorly,Employed by professional services/consulting firm,SQL,Text Mining,R,GitHub,Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,Researcher,Other,10,20,35,0,0,35,Other (please specify; separate by semi-colon),Bayesian Techniques,High school,Other,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,Bayesian Techniques,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,R",,Rarely,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,Association Rules,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,0,20,10,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,,I don't typically share data,,,,35000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,3-5 years,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Doctoral degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,0,75,0,25,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Canada,58,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,GitHub,"Blogs,College/University,Kaggle,Online courses,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,"Coursera,DataCamp,edX",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Other",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Other,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Financial,10 to 19 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Java,Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,Rarely,,,,Often,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Simulation,Text Analytics",,Rarely,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,,Sometimes,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,10,30,5,5,30,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,Often,,,,,Most of the time,,,,Most of the time,Often,,Less than 10% of projects,More internal than external,Other,Kaggle; Lending Club; UCI,"Owner don't understand their own data, no data dictionary","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,20000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos,Other",,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,,,,,,Very useful,KDnuggets Blog,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Other,3 to 5 years,"Data Analyst,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,0,0,50,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,66,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other,Other",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,90,2,0,8,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,500 to 999 employees,Stayed the same,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation","Relational data,Other",,,Other,"SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,20,0,10,10,5,55,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,None,Do not know,Other,Google Maps,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,SQL,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,,University courses,0,0,10,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Military/Security,500 to 999 employees,Increased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,<1MB,Other,"Mathematica,Python,R",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Simulation,Text Analytics",,,,,,Rarely,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,Often,,Rarely,,,,,30,30,15,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Most of the time,,,100% of projects,More external than internal,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,90000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,GitHub,"Arxiv,Blogs,Conferences,Textbook",Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,Predictive Modeler,Work,10,0,90,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","C/C++,R,Stan",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Gradient Boosted Machines,Simulation",,,,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,30,40,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",,7000000,JPY,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Chile,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,Python,,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Academic,I don't know,,Don't know,A general-purpose job board,Very important,Other,Other,Image data,Never,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,0,10,0,80,,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,Other,I don't typically share data,,Other,Sometimes,25000000,CLP,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Company internal community,Conferences,Friends network,Kaggle,Newsletters,Stack Overflow Q&A",,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,,Other,30,0,60,5,5,0,"Computer Vision,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Pharmaceutical,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Most of the time,100GB,Neural Networks,"Jupyter notebooks,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks,Segmentation",,,,Most of the time,,Sometimes,Most of the time,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,40,20,15,25,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,Often,,51-75% of projects,Entirely internal,Other,LIDC; BRATS,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,110000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,"Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,6 to 10 years,Researcher,University courses,40,10,40,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,,,,,Rarely,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial 
support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,,,,,,175000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Not Useful,,,Very useful,"FastML Blog,KDnuggets Blog",5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,No,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,40,40,0,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's 
degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,South Korea,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Support Vector Machines (SVM),Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,,Not Useful,,,,,,< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,,Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Text Mining,R,I collect my own data (e.g. web-scraping),"College/University,Friends network,Personal Projects",,,Very useful,,,Somewhat useful,,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,Business Analyst,University courses,40,20,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Other,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Sometimes,Most of the time,Often,Rarely,,,,,,,Often,,Sometimes,Sometimes,Rarely,,,Often,,,,Sometimes,,Sometimes,Sometimes,,,,70,5,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,76-99% of projects,More internal than external,Standalone Team,EDGAR; PACER; data.gov,source is primarily unstructured text,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,80000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,GitHub,"Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,1GB,Bayesian Techniques,"NoSQL,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Naive Bayes",Most of the time,,,,,Often,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,30,30,0,35,5,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,76-99% of projects,Entirely internal,Business Department,,Model generator,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Rarely,30000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher",Self-taught,80,20,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","C/C++,Julia,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Sometimes,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,Rarely,Often,,Most of the time,Most of the 
time,,,Sometimes,,Sometimes,Often,Often,,Sometimes,,,Sometimes,Most of the time,,,Most of the time,,Most of the time,Often,,Most of the time,,Most of the time,,,,20,30,0,50,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,"140,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,,Very useful,Very useful,,,Very useful,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,40,0,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text 
data,Other",Most of the time,100GB,"Random Forests,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Spark / MLlib,SQL,Unix shell / awk,Other",,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,Sometimes,Most of the time,,,"Bayesian Techniques,kNN and Other Clustering,Naive Bayes,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,,,,,,,,,Rarely,,,,Most of the time,,,,,Often,,,,,Often,,Often,,,,30,40,10,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,,Most of the time,,Often,Most of the time,Often,,Often,,,,Most of the time,,,,,Most of the time,,26-50% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,aws,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,1600000,NPR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,0,5,0,95,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Often,,,Sometimes,,,,Most of the time,Rarely,Most of the time,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,GANs,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Often,,Often,,Often,Often,,Often,,,,Often,Often,,,,,Often,,Often,,,,,Often,,,,,,60,20,10,5,5,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Most of the time,,,,,,Most of the time,,,,,,,,Often,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,4800000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed part-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Random Forests,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,"Partially Derivative Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,,,"Coursera,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Other,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",45,45,0,5,5,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Portugal,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,"Data Elixir Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,"Programmer,Researcher",Work,20,20,50,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Often,,,,Sometimes,Sometimes,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of 
a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,Often,Most of the time,,Most of the time,Most of the time,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Git,Other",Sometimes,20000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Personal Projects,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,30,10,10,50,0,0,"Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Naive Bayes,Natural Language Processing,SVMs",Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Sometimes,190000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Russia,29,Employed full-time,,,Yes,,Programmer,Poorly,Self-employed,,,,,"Arxiv,College/University,Conferences,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,Work,15,30,40,15,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Most of the time,100GB,"CNNs,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,GANs,Gradient Boosted Machines,kNN 
and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,Often,,Often,,Sometimes,,,Rarely,Often,,Sometimes,,Rarely,,,,Often,,,Sometimes,,,,,Sometimes,,,,,,40,30,15,15,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,,,,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Portugal,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Researcher",University courses,35,15,25,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,Fewer than 10 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Rarely,Sometimes,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic 
Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Often,Sometimes,Sometimes,,,,,,,Often,,Rarely,Often,Often,Sometimes,,Sometimes,,,,,Sometimes,,,,,,40,35,5,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Never,"30,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,Work,20,5,55,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not 
important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Other",Work,30,20,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Enterprise Miner",,Often,,,,,,,,,,,Often,,,,Often,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,Rarely,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,,,,,Most of the time,Most of the 
time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,15,45,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,70000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Korea,31,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Textbook,Tutoring/mentoring",,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,,Very useful,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,,2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Friends network,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,Somewhat useful,,,Very useful,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Segmentation",,,,,,Often,Often,Often,,,,,,,,,,,,,,Sometimes,Often,,,Sometimes,,,,,,,,30,20,30,10,10,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database",,,,,,,,,,Sometimes,Often,,,,,,,Often,,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,1800000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts",,,,,,Somewhat useful,Very useful,,Very useful,,Very useful,Somewhat useful,Not Useful,,,,,,"FlowingData Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,Often,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,Often,Often,,,Sometimes,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Segmentation,Text Analytics,Time Series Analysis",Often,,,,Often,Often,Often,,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,Often,Often,,,,15,25,15,10,35,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,Often,,,,26-50% of projects,More external than internal,Standalone Team,Fred Data; Bureau of Labor Statistics; Weather data; College Board and education related datasets; Economics indicators; World Health datasets,Data cleaning and normalization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",DropBox,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,103000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,69,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Operations Research Practitioner,Predictive Modeler,Statistician",Self-taught,50,20,20,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A professional degree,Mix of fields,Fewer than 10 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,100MB,"Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,Mathematica,R,SAS Base",,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and 
Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,,Often,,,,,,Often,Often,Sometimes,,,,Rarely,Often,,,,,Often,Often,,Sometimes,Sometimes,,,,50,15,10,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,Sometimes,,,,Often,,51-75% of projects,More internal than external,Central Insights Team,Census; Lifestyle; Market Research,Quality assessment and cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,80000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts",,,,,,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,Financial,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,,1TB,"Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,,40,45,0,15,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Sometimes,,,,,,,,,,,,,,,Often,,,,Sometimes,Sometimes,Often,76-99% of projects,Entirely internal,IT Department,none,computing power,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,shiny server;documents,Subversion,Never,32000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Online courses",,,,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,I don't know,Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,10GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,,,Sometimes,,,Most of the time,Sometimes,Sometimes,,,,,Most of the time,,Most of the time,,,,Sometimes,,,Most of the time,Sometimes,Rarely,,,Often,Sometimes,,,,,70,10,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Often,,51-75% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,3 to 5 years,"Data Analyst,DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",15,60,25,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased slightly,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,Often,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Sometimes,Most of the 
time,Rarely,,,,,,,,Sometimes,,,,,,,Rarely,,,,,Sometimes,Rarely,Sometimes,,,,65,5,10,15,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,Most of the time,,,Often,Often,Often,,,Often,,Most of the time,,,,Often,,,100% of projects,More internal than external,IT Department,,Understanding what the data mean in a business context (lack of a clear semantic layer on top of the data). Cleaning/staging data for analysis.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Other,Rarely,75000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,24,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Online courses,Podcasts",,,Very useful,,Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,,,"Data Stories Podcast,DataTau News Aggregator,FlowingData Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,,Less than a year,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,India,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Business Analyst,University courses,20,10,10,30,30,0,Time Series,Logistic Regression,A master's degree,Financial,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Regression/Logistic Regression,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,quandl,cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,55,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,15,0,5,0,,Logistic Regression,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"FastML Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,,,Necessary,,,Nice to have,Nice to have,Necessary,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Kenya,24,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,C/C++/C#,University/Non-profit research group websites,"Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Somewhat useful,,Very useful,,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or 
statistics,1 to 2 years,Data Analyst,University courses,20,0,0,70,0,10,Time Series,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important +Male,Russia,28,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Arxiv,Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,Somewhat useful,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Physics,,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,49,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,50,5,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Technology,100 to 499 employees,Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10TB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,Spark / MLlib,SQL",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Random Forests",,,,,,,,Often,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning",,,,,,,,,Often,Sometimes,Often,Most of the time,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,healthcare payers,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Bitbucket,,70000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Data Scientist,Engineer,Researcher",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Internet-based,100 to 499 employees,Increased significantly,1-2 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,10GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,,,,,,,,,,Often,,Rarely,,Often,,,,,,,Often,,,,,Sometimes,,Most of the time,,,,65,35,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Often,,26-50% of projects,Approximately 
half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Researcher",University courses,5,15,25,55,0,0,"Natural Language Processing,Recommendation Engines","Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",Often,,,,Sometimes,,Most of the time,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Often,,,,,20,20,15,20,25,0,"Enough to code it 
again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,76-99% of projects,More external than internal,Central Insights Team,Mattermark,Text cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,110000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Deep learning,R,Google Search,"Arxiv,Other",,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Basic laptop (Macbook),,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,Other,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing",Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,33,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Python,Deep 
learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Conferences,Newsletters,Official documentation,Stack Overflow Q&A,Textbook",,,Very useful,,Very useful,,,Not Useful,,Somewhat useful,,,,Somewhat useful,Very useful,,,,"No Free Hunch Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,Researcher,University courses,25,0,40,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Non-profit,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,,,Often,,,,,Sometimes,,Rarely,,,,Sometimes,,,Sometimes,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,Often,,,Sometimes,Often,,Often,Most of the time,,,,Rarely,,Often,,,,Rarely,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,"Many tables in database, get overview over which ones to use for a specific task",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"57,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,32,"Not employed, but looking for work",,,,,,,,R,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,50,0,0,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important 
+Male,Brazil,27,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",Julia,Deep learning,Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,40,0,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,20 to 99 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Julia,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,Often,,,,,,,,Often,Sometimes,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,Rarely,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Sometimes,,Often,Often,Sometimes,Sometimes,,,,,Often,,Sometimes,,,,,Often,,Sometimes,,,Most of the time,,Often,,,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,Sometimes,Sometimes,,,,,,Most of the time,,,,Often,,,100% of projects,Do not know,Other,,The quality of the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Most of the time,60000,BRL,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Tableau,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"edX,Other",Basic laptop (Macbook),11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Other,Less than a year,I haven't started working yet,University courses,10,20,70,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst",University courses,15,0,55,30,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,Often,,,,,Rarely,,,Often,Rarely,,,Often,,,,,,,"Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Rarely,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,26-50% of projects,Do not know,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Git,Rarely,,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,15,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Researcher,Other",Self-taught,25,50,25,0,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,100MB,Regression/Logistic Regression,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,Often,,,,10,10,0,30,50,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Do not know,Other,"A.M. Best Bestlink, Dataferret","Errors in data, missing data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,180000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,75,Retired,,,Yes,,Statistician,Poorly,Employed by college or university,R,Decision Trees,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Researcher,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Stan,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,,Self-taught,20,10,50,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Academic,I prefer not to answer,,,,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,,10TB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression","C/C++,Python,R,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Often,,,,Most of the time,,Sometimes,,,,,,,Sometimes,,Rarely,,,Most of the time,,,,,,Sometimes,,,Sometimes,,,,33,33,0,0,34,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,Often,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Other",Slack,"Bitbucket,Git",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,I don't plan on learning a new ML/DS method,SQL,Google Search,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,Necessary,Unnecessary,,,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Somewhat useful,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer,Other",University courses,20,15,15,40,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,10MB,"CNNs,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",Sometimes,,Often,Sometimes,,Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,Most of the time,,,Most of the time,Often,Often,,Often,,Sometimes,,,Often,Most of the time,,,,,35,20,10,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,Sometimes,Often,,,,,Sometimes,Most of the time,,,,Often,,,Most of the time,,,Often,Most of the time,,51-75% of projects,More external than internal,Standalone Team,dermnet,"We're have doctors annotate data, which takes a long time and is hard to scale/speed up.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Belgium,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,20,10,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A master's degree,Military/Security,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Most of the time,1TB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,Often,,Sometimes,,,Most of the time,,,Often,,Sometimes,Sometimes,,,,,,Often,,Less than 10% of projects,Do not know,Standalone Team,"ImageNet, VOC, COCO",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Git,Rarely,35000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,France,25,"Not employed, but looking for work",,,,,,,,SAS Base,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,0,20,0,60,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Mexico,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,Very useful,,Very useful,,Somewhat useful,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Miner,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,KNIME (free version),NoSQL,Python,R",,,,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,,,,,,,Often,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",Often,,,,,,,,,,,,,,,Often,,Most of the time,Most of the time,Often,,,,,,,,,Most of the time,Most of the time,,,,50,20,10,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,Sometimes,,,,,,Often,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,Politics ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,19000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,SQL,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"Data Elixir Newsletter,DataTau News Aggregator,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,69,0,0,1,0,Supervised Machine Learning (Tabular Data),,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,GitHub,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Operations Research Practitioner,Programmer",University courses,40,0,15,40,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,10 to 19 employees,Stayed the same,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,R,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,,,,Sometimes,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Often,,Sometimes,Often,Often,,,,,,,Sometimes,,Often,,,,Sometimes,,,Often,,,Often,,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data 
Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Sometimes,,,,,,,,,Often,,Sometimes,,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,"outlier detection, input validation & cleaning","Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,,Python,"GitHub,Google Search",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,,,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,Do not know,IT Department,,,,,,,,99000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on 
learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),,Kaggle Competitions,Yes,Master's degree,,,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important +Female,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Python,Text Mining,R,,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,,University courses,20,0,0,60,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Text data,Sometimes,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series 
Analysis",,,,,,Most of the time,Most of the time,,,,,Often,,,,Often,,,,,Most of the time,,Most of the time,,,Often,,,Often,Sometimes,,,,20,50,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,Sometimes,,,,,,Sometimes,,,10-25% of projects,,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Israel,46,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,DataRobot,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,20,0,50,20,10,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,,10MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian 
Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Rarely,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,Sometimes,,Often,Often,,Most of the time,,,Sometimes,Often,Sometimes,,Most of the time,,,,10,30,30,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,200000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Not Useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,"FastML Blog,FlowingData Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Work,30,10,40,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,Rarely,,,,,Rarely,,,Sometimes,Often,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Often,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,Often,Most of the time,,,,45,5,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Often,,Most of the time,,,,Sometimes,Often,,,Sometimes,,,Often,Most of the time,,Most of the time,,Often,,51-75% of projects,More internal than external,Business Department,CMS Data,Privacy concerns (HIPAA),"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,120000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Arxiv,Friends network,Kaggle,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,,,,,Very useful,Somewhat useful,,,,,Somewhat useful,"DataTau News Aggregator,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,5,15,60,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Often,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Often,Often,,,Most of the time,Most of the time,Often,Often,,Rarely,Most of the time,Often,Sometimes,,Most of the time,,Rarely,Often,Often,Often,,Often,Rarely,Often,Rarely,,Sometimes,Often,Often,,,,50,5,20,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,Sometimes,,,,,Often,Most of the time,Sometimes,,51-75% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other",Company Developed Platform,,Other,Rarely,125000,USD,Other,7,,,,,,,,,,,,,,,,,, +Male,Denmark,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",Very useful,,,,,,,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Researcher,University courses,0,0,0,90,0,10,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Always,1GB,,"Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,Often,,,,"Collaborative Filtering,Data Visualization",,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,0,50,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Sometimes,,Sometimes,Most of the time,Most of the time,,,,Rarely,,Most of the time,Rarely,,,Rarely,,,100% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,10000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Monte Carlo Methods,Python,GitHub,"Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Ensemble Methods,Gradient Boosting",A doctoral degree,Other,Fewer than 10 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Always,1GB,"Bayesian Techniques,Ensemble Methods,Random Forests","Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,,,Often,,,Often,,,,,,,,,Often,,Often,,,Most of the time,,,,Sometimes,,,,10,30,20,20,20,0,Enough to refine and innovate on the algorithm,"Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Often,,,100% of projects,Entirely internal,IT Department,,healthcare security requirements,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,,Python,Google Search,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,50,20,10,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Always,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Perl,Python,R,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Rarely,,,,,,,,,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs,Segmentation,SVMs,Text Analytics",,,Rarely,Most of the time,,Sometimes,Most of the time,Rarely,Often,,,Often,Rarely,Rarely,Often,Rarely,,Rarely,Most of the time,Most of the time,Sometimes,,Sometimes,,Most of the time,Rarely,,Rarely,Most of the time,,,,,30,20,20,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,none;,typos;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,10000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,R,Cluster Analysis,Stata,Government website,"College/University,Friends network,Personal Projects",,,Very useful,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Fine arts or performing arts,3 to 5 years,"Business Analyst,Researcher",Self-taught,30,40,20,10,0,0,Time Series,Logistic Regression,A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data",Rarely,<1MB,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SAS Base,Other,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,Rarely,,,,,Sometimes,,,,,,,,,,,Most of the time,Most of the time,,"Data Visualization,Logistic Regression,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,Sometimes,,Often,Most of the 
time,,,,45,30,0,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,Often,Most of the time,Most of the time,,100% of projects,Entirely external,Other,Anything; National Archives; Texas Railroad Commission; USDA Agricultural Records; US Census ,Finding it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Other,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,"DataTau News Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,10,30,50,10,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",16-20,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"YouTube Videos,Other",,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,Other,University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Often,,,,Rarely,Often,,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,Often,Sometimes,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,Time Series Analysis",Rarely,,,,,Most of the time,Sometimes,Most of the time,Most of the time,,,Most of the time,,Rarely,Rarely,Most of the time,,,,,Rarely,,Most of the 
time,,Rarely,,Sometimes,,,Most of the time,,,,20,30,30,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,Often,,,Sometimes,,,Often,,,Sometimes,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,fred,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"250,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Not Useful,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,PhD,Yes,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,20+,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Not important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important +Male,Other,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,40,0,5,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,Often,,Sometimes,,Often,,,,,Often,,Rarely,,Often,,,Rarely,Sometimes,Often,,,,15,10,5,10,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",,18000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,37,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by government,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,Very useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Other,2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important +Male,People 's Republic of China,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Programmer",Self-taught,20,70,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Ensemble Methods,Random Forests,SVMs,Other","Amazon Web services,Microsoft R Server (Formerly Revolution Analytics),NoSQL,QlikView,R,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Rarely,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Collaborative Filtering,Data Visualization,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Often,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,Sometimes,,Often,,Sometimes,Most of the time,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,Other,,"Build a dataset ""big enough"" ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,"45,000",,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,37,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by college or university,Python,Text Mining,R,Government website,"Friends network,Kaggle,Textbook",,,,,,Somewhat useful,Very useful,,,,,,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Engineer,Self-taught,80,10,10,0,0,0,"Machine Translation,Natural Language Processing,Time Series","Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,"Arxiv,Blogs,College/University,Conferences,Kaggle,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,,,,Very useful,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Data 
Scientist,Engineer,Researcher",University courses,20,0,30,50,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Often,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,70,20,0,0,10,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,67,"Not employed, but looking for work",,,,,,,,SAS Base,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Other",Very useful,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",5-10 years,Nice to have,Nice to have,,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,Computer Scientist","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Spain,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Time Series Analysis,Python,GitHub,"Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack 
Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,90,0,10,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,33,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,Google Search,"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Engineering (non-computer focused),Less than a year,"Operations Research Practitioner,Researcher",University courses,20,50,0,30,0,0,,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Somewhat important,Not 
important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Other,28,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Programmer,University courses,30,0,10,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Rarely,,,,Sometimes,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,Sometimes,,Often,Most of the time,,,,,,,Sometimes,,Rarely,,,,Most of the time,Rarely,,,,Often,,,,,,,,,50,25,5,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or 
a clear direction to go in with the available data",,,,,Often,,,,,,Sometimes,,,,,Most of the time,,,,Often,,,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,22000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Cluster Analysis,R,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Programmer,Software Developer/Software Engineer",University courses,20,10,60,5,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data",Sometimes,10GB,"CNNs,Gradient Boosted Machines,Neural Networks","C/C++,Jupyter notebooks,Python,R,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Natural Language Processing,Random Forests,Recommender Systems",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,Sometimes,,,,Often,Often,,,,,,,,,,80,3,10,4,3,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Sometimes,,Sometimes,Most of the time,,Most of the time,,10-25% of projects,,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,,Very useful,Very useful,,3-5 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Yes,Doctoral degree,Other,3 to 5 years,Researcher,Work,10,40,30,15,5,0,Speech Recognition,Hidden Markov Models HMMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important +Female,United States,23,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Programmer,University courses,10,10,10,40,30,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text Analytics",,,,,,Often,Often,,,,,,,,,,,,Most of the time,Sometimes,,,Often,,Sometimes,,,,Most of the time,,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,Sometimes,,,Often,Sometimes,,Most of the time,,,,,,,,,Most of the time,Often,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,Data Scientist,Self-taught,80,10,5,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,Rarely,,Rarely,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Often,Most of the time,Sometimes,Most of the time,,,Most of the time,,,,,,Sometimes,,Sometimes,Sometimes,,Most of the time,,,Sometimes,Often,,Sometimes,Often,,,,20,15,5,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",Sometimes,Sometimes,,,Often,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,100% of projects,Entirely internal,Business Department,Census; GIS,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,135000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,30,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,Russia,29,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,Google Search,"Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Very 
useful,,,Somewhat useful,,Very useful,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Traditional Workstation,,,No,Master's degree,Engineering (non-computer focused),,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Female,United Kingdom,31,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Social Network Analysis,Java,University/Non-profit research group websites,"Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,KDnuggets Blog",3-5 years,,Necessary,Necessary,Necessary,,,,Nice to have,Nice to have,,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,Yes,Master's degree,Computer Science,,"Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,32,Employed 
full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,C/C++,Deep learning,C/C++/C#,GitHub,"College/University,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,,,Very useful,,,,,"DataTau News Aggregator,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Computer Scientist,University courses,25,0,0,75,0,0,Computer Vision,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Academic,I prefer not to answer,Increased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,1GB,"Bayesian Techniques,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Statistics,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),TensorFlow",,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,Often,,Often,,Sometimes,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,,,,Often,Most of the time,Often,,Often,Often,Most of the time,,Most of the time,Often,,Sometimes,,,Most of the time,Most of the time,Most of the time,,Often,,,,20,50,10,20,0,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science 
projects,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,Often,,Often,Often,,,,,,,Often,,51-75% of projects,More external than internal,Central Insights Team,NA,NA,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Bitbucket,Sometimes,32000,RON,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,25,20,0,50,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Rarely,10GB,"Bayesian Techniques,CNNs,Random Forests","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,TensorFlow",,Sometimes,,Sometimes,,,,Rarely,Rarely,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Time Series Analysis",,,Often,Sometimes,,Most of the time,Most of the time,Rarely,,,,Rarely,,,,,,,,,Often,,Sometimes,,Rarely,Rarely,,,,Often,,,,10,40,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,,Sometimes,Sometimes,,,,Sometimes,,,,,,,26-50% of projects,Entirely internal,Standalone Team,Copernicus;public forestry data from government,Preprocessing;standardization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,22000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Predictive Modeler,Researcher,Other",University courses,60,10,10,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,3-5 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Sometimes,,Most of the time,Most of the time,Often,Often,,,Often,,,,Most of the time,,,,,Often,,Sometimes,,,,,Often,,,,,,30,20,5,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Often,,Often,,Often,,,,,,,Often,Often,,51-75% of 
projects,More internal than external,Other,,Small samples and very high dimensionality of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,France,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,15,0,40,40,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Stan",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data 
Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Sometimes,,Sometimes,,,,,Sometimes,,Most of the time,,,Sometimes,,Often,,Often,,,,65,15,5,10,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Often,Most of the time,Most of the time,,,,,Most of the time,,,Most of the time,Often,,Sometimes,Often,,Most of the time,Most of the time,,Less than 10% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,50000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",Work,20,10,50,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,"Official documentation,Online 
courses",,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,"Data Scientist,Researcher,Other",Self-taught,70,10,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,,Often,,,,Often,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,Rarely,,,Often,,,,,,,Sometimes,,,,20,5,5,15,30,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Other",Most of the time,,Sometimes,,,Sometimes,,,Often,,,,Often,,Sometimes,,Sometimes,,,,,Most of the time,76-99% of projects,Entirely internal,Other,,"Long 
query times, unstable Hadoop infrastructure, lack of labels","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,131000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Rarely,1GB,Other,"Google Cloud Compute,Java,Python,TensorFlow",,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,"kNN and Other Clustering,Natural Language Processing,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,Most of the 
time,,,,10,20,70,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,,,,,,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,Eventful.com; planalytics,Customer privacy concerns,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,50000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",< 1 year,Necessary,,Necessary,Nice to have,Necessary,Necessary,,,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Yes,Master's degree,Computer Science,Less than a year,Programmer,University courses,20,20,10,30,20,0,"Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,New Zealand,39,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,,,,,Not Useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,Basic laptop (Macbook),Relational data,Sometimes,1TB,"CNNs,Neural Networks,RNNs","Amazon Machine Learning,Amazon Web services,TensorFlow",Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,50,5,5,10,30,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,Sometimes,,,Most of the time,,,,Sometimes,,,Rarely,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,na,get proper knowledge,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,80000,NZD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Mexico,48,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,R,Google Search,"Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Engineer,Researcher",University courses,20,10,10,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased slightly,6-10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Markov Logic Networks,Random Forests","Jupyter notebooks,NoSQL,Orange,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,Rarely,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics",Often,,Often,,Most of the time,Often,Most of the time,Often,Often,,,,,Often,,,,Often,Often,,,,Often,Most of the time,,,,,Often,,,,,40,30,5,5,10,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,,,,Most of the time,,,Sometimes,Often,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,Geographic data,Find the right patterns,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,"800,000",MXN,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Textbook,Trade book,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,20,10,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,,Sometimes,Most of the time,,,,Often,,,,Often,,,,Often,,,Often,,,Most of the 
time,Often,,Often,,,,,10,20,35,5,30,0,Enough to tune the parameters properly,Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,Very useful,Very useful,,,,,,Very useful,Very useful,Very useful,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,50,7,20,20,3,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Python,R,RapidMiner (free version),SQL,TensorFlow",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,Sometimes,Often,,,,,,,,,,,,Most of the 
time,,Sometimes,,Rarely,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Sometimes,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,55,5,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,,Sometimes,,,Often,,,Often,Sometimes,,Often,,Sometimes,Often,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Personal Projects,Podcasts",,Very useful,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,,,"DataTau News Aggregator,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,45,5,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",,10TB,,"Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,Impala,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Perl,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,Often,Rarely,,,Often,,,Sometimes,Often,Sometimes,Often,,,,,,,,Often,Most of the time,,,Most of the time,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,Often,Most of the time,,Most of the time,Often,,,Sometimes,Sometimes,Often,Most of the 
time,Sometimes,,Sometimes,,,,Often,,Sometimes,,Often,Often,Sometimes,Often,Often,,,,40,0,0,0,0,60,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Sometimes,Sometimes,Most of the time,Sometimes,,Most of the time,Sometimes,,,,Rarely,Most of the time,Most of the time,,Sometimes,,,Often,Often,,10-25% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,155000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,Not Useful,Somewhat useful,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,,,,,,,,,,,,,,edX,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Supervised Machine Learning (Tabular Data),Evolutionary Approaches,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,30,15,20,15,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I don't know/not sure,Other,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Image data,Don't know,<1MB,Decision Trees,"Google Cloud Compute,R",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Simulation,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,10,10,30,50,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",,,,,,,,,Sometimes,,Often,,,,,,Often,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,70000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Bayesian Methods,Python,"GitHub,Government website,University/Non-profit research group websites","Friends network,Official documentation,Personal Projects,Trade book",,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,"DataTau News Aggregator,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Analyst,Software Developer/Software Engineer",Work,60,0,35,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk,Other",,Most of the time,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,Often,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Often,Most of the time,Sometimes,,,,,,Rarely,Often,,,,,Often,,Most of the time,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data 
useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Other",,,Sometimes,,Rarely,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Rarely,26-50% of projects,More internal than external,Standalone Team,Acxiom; Liveramp,It is not predictive,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,120000,USD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Decision Trees,R,University/Non-profit research group websites,"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,20,0,75,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,1TB,"Markov Logic Networks,Regression/Logistic Regression,SVMs","Java,Python,R",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Markov Logic Networks,Natural Language Processing,Segmentation,SVMs,Text 
Analytics",Often,,,,,,Often,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,25,50,0,0,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,,Often,,,,,,Often,,,,,Most of the time,,,10-25% of projects,More internal than external,Central Insights Team,,It is too disparate,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Subversion",Rarely,85000,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Colombia,27,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Neural Nets,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,,,,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Predictive Modeler",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data,Other",Most of the time,10MB,"Decision Trees,Ensemble Methods,Random Forests,Other","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Rarely,Rarely,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",,Rarely,Sometimes,,,Often,Most of the time,Most of the time,Often,,,,,Most of the time,,,,,Rarely,Sometimes,,,Often,,,,,,,Most of the time,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Rarely,,,,,,,,Rarely,,,,,Rarely,Rarely,,,76-99% of projects,More internal than external,Other,Public weather data,Data space transformation,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Other",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,28000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Hungary,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Social Network Analysis,Python,Google Search,"Conferences,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer",Self-taught,60,0,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,Perl,Python,R,SAS Base,SQL",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,Text Analytics",Most of the time,Often,,,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,Sometimes,,,,,Sometimes,,,Sometimes,,Often,Sometimes,,Sometimes,,,,,40,15,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Often,Often,,,Often,,,Sometimes,,Often,,,Sometimes,,,,Most of the time,,,,Often,,51-75% of projects,More internal than external,Central Insights Team,,data quality issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,9720000,HUF,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,,,A master's degree,Government,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Never,,,"C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,50,0,0,40,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,,Often,,Often,Often,,,,,Often,Often,Often,Often,,,Often,Often,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,Python,Bayesian Methods,R,GitHub,Textbook,,,,,,,,,,,,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Other",Self-taught,60,10,30,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R",,,,,,,,,,,,,,,,,Often,,,,,Often,,Sometimes,Rarely,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Recommender Systems",Sometimes,,,,Sometimes,Often,Often,Often,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,300000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,26,"Not employed, but looking for work",,,,,,,,R,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,1-2 years,,,,,,,,,,,,,,Coursera,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,70,0,10,0,20,Other (please specify; separate by semi-colon),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,More than 10 years,"Data Analyst,Engineer,Programmer,Researcher",University courses,0,25,25,50,0,0,Time Series,"Evolutionary Approaches,Markov Logic Networks",Primary/elementary school,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Random Forests,Regression/Logistic Regression","C/C++,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,,30,25,0,45,0,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,Often,,,,,,,Most of the time,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,25000,BRL,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Singapore,26,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,40+,Kaggle Competitions,Yes,Bachelor's degree,Physics,1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,59,Employed part-time,,,No,Yes,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Stack Overflow Q&A",,,,,,Very useful,,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,,Bachelor's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,United States,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,TensorFlow,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,0,30,10,0,0,Computer Vision,,A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Other,Basic laptop (Macbook),Text data,Never,<1MB,,"Java,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,Rarely,,Often,,,,"A/B Testing,Text Analytics,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,Rarely,,,,,,,,Often,,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,200000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,,KDnuggets Blog,< 1 year,,Necessary,,,Necessary,,Nice to have,Necessary,Unnecessary,,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Data Analyst,Self-taught,60,10,30,0,0,0,Natural Language Processing,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Other,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,I never declared a major,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,30,20,0,0,Computer Vision,,High school,Financial,"5,000 to 9,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Never,,Decision Trees,"C/C++,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Java,Jupyter notebooks,Orange,Python,SAS Base,Unix shell / awk",,,,Rarely,Rarely,,,,Most of the time,,Sometimes,Sometimes,,,Rarely,,Most of the time,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Often,,,,,,,,,,Most of the time,,,,"Decision Trees,Logistic Regression",,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,40,40,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,72000,PEN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Excel Data Mining,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,Researcher,Self-taught,50,40,0,0,10,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",,Academic,500 to 999 employees,Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,1MB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,,Often,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,25,50,0,10,15,0,Enough to tune the parameters properly,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,10-25% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,160000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Data Scientist,Self-taught,30,10,10,0,50,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Academic,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Often,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics",Often,,,,,Often,Most of the time,Most of the time,Often,,,,,,,Often,,,Often,,,,Often,Often,,,,,Often,,,,,50,10,10,20,10,0,Enough to run the code / standard library,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Colombia,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Other,Other,15,20,30,30,5,0,"Computer Vision,Time Series,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Rarely,10MB,"Neural Networks,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,Often,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Neural Networks,Segmentation,Simulation,SVMs",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,Often,,,,,,Sometimes,Often,Sometimes,,,,,,40,25,10,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,100% of projects,More internal than external,Standalone Team,Most of data are propietary,Structuring and cleaning data to be able to work with it,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,15000000,COP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",35,35,0,10,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Decreased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Sometimes,10GB,RNNs,"Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Neural Networks,RNNs",Often,,,,,Most of the time,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,10,5,80,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific 
analysis and decision-making,Organization is small and cannot afford a data science team",Often,,Often,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,110000,CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,56,Retired,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,R,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Other,Self-taught,40,30,20,10,0,0,Recommendation Engines,,A bachelor's degree,Other,500 to 999 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Sometimes,,,"NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,0,30,50,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Often,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,goverment provided car related,post business rules data lake is not well indexed ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,91000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Philippines,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Anomaly Detection,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,5,0,15,0,,,A professional degree,Mix of fields,10 to 19 employees,Stayed the same,Less than one year,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,1GB,,"Amazon Web services,Python",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,10,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,720000,PHP,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,,"Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,25,25,20,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Decreased slightly,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Random Forests,Regression/Logistic Regression","Cloudera,Google Cloud Compute,IBM Watson / Waton Analytics,KNIME (free version),Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,,Rarely,,,,,Rarely,,,,,,Rarely,,,Sometimes,,Sometimes,,,,,,,Sometimes,,Often,,,,,,,,Often,Most of the time,,,,,,Often,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random 
Forests,Recommender Systems,SVMs",,,,,Sometimes,Most of the time,Most of the time,Often,Most of the time,,,,,Often,,Often,,Rarely,,,Sometimes,,Most of the time,Sometimes,,,,Sometimes,,,,,,25,25,25,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Often,,Sometimes,,,Most of the time,,,,,Rarely,Often,,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,60000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Not Useful,,,,,,,Very useful,Very useful,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,20,40,0,40,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - 
CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,Most of the time,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Minitab,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,Rarely,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation",,,,Sometimes,,Most of the time,Most of the time,,,Rarely,,,,Often,,Often,,Rarely,,Most of the time,Most of the time,,,,,Often,Most of the time,,,,,,,29,10,20,40,1,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,Often,Often,,Rarely,,,Most of the time,,Sometimes,,100% of projects,Entirely internal,Standalone Team,MsCOCO; ImageNet; StarCraft Replay Dataset; Pedestrian datasets in general,Limited hardware to process dataset,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,"Bitbucket,Git,Other",Sometimes,26400,BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Programmer,University courses,10,40,25,25,0,0,"Natural Language Processing,Speech Recognition",,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Other",University courses,10,40,25,20,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Retail,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Most of the time,,,,,,Often,Often,,,,Often,,,,,,,,,Sometimes,,Often,Sometimes,,,,,,,,,,70,5,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,,,,,Often,,Often,Most of the time,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,Competitor pricings,Cleaning and performance,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,65000,EUR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Text Mining,Python,Google Search,"Arxiv,College/University,Company internal community,Conferences,Kaggle,Online courses",Very useful,,Somewhat useful,Somewhat useful,Very useful,,Not Useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,,University courses,10,20,40,30,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation",Image data,Sometimes,1GB,"CNNs,GANs,Neural Networks,RNNs,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other,Other",,,,Often,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,"CNNs,Data Visualization,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs",,,,Often,,,Sometimes,,,,Rarely,,Rarely,Often,,Sometimes,,,,Often,Rarely,,,,Sometimes,Sometimes,Often,Rarely,,,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty 
data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,Often,,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Often,,Often,Sometimes,Often,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Tableau,Text Mining,R,"Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other,Other",,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,,University courses,60,10,0,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Julia,Jupyter notebooks,KNIME (free version),Python,R,RapidMiner (free version),Tableau,Unix shell / awk,Other,Other,Other",,,,,,,,,,,,Often,,,,Rarely,Rarely,,Rarely,,,,,,,,,,,,Rarely,,Often,,Rarely,,,,,,,,,,Rarely,,,Rarely,Rarely,Rarely,Rarely,"Cross-Validation,Data Visualization,Decision 
Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Rarely,Most of the time,Rarely,,,,,,Sometimes,,Rarely,,,,,Sometimes,,,,,,,,,,,,,50,20,0,20,10,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,,Often,Sometimes,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,250000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Other,C/C++/C#,Other,"Other,Other,Other",,,,,,,,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer",Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Never,<1MB,Bayesian Techniques,"C/C++,Java,Mathematica,Python,SQL",,,,Sometimes,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,"Git,Subversion",,216000,RUB,Has decreased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),Tutoring/mentoring,,,,,,,,,,,,,,,,,Not Useful,,,< 1 year,Nice to have,Nice to have,,,Nice to have,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Other,20,0,0,0,0,80,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,,Matlab,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Neural Nets,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,Very 
useful,Very useful,,,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,40,0,0,60,0,0,Survival Analysis,Logistic Regression,,Academic,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Most of the time,10MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Rarely,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,,Rarely,54000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Predictive Modeler,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Other,University courses,20,5,30,5,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Insurance,"1,000 to 4,999 employees",Stayed the same,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Always,10GB,Regression/Logistic Regression,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression",,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,40,40,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Often,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,Fannie Mae and Freddie Mac loan level mortgage performance dataset,"understanding the data, how to translate the data into information that we care about",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,87000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,Not Useful,Very useful,Somewhat useful,,,,"KDnuggets Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,10,90,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,Rarely,,,Often,Most of the time,Sometimes,,,,,,Often,,Often,,,,,Often,,Sometimes,Sometimes,,,,,,,,,,75,10,0,5,10,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,,,Sometimes,Often,,,Rarely,,Sometimes,Most of the time,,,Rarely,,Sometimes,,,76-99% of projects,Entirely internal,Central Insights Team,,,,I don't typically share data,,Git,Rarely,180000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Personal Projects",,,,,,,Very useful,,Very useful,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,70,0,20,0,10,0,"Computer Vision,Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Technology,"1,000 to 4,999 employees",Decreased slightly,6-10 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,TensorFlow,Other",Rarely,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Rarely,Rarely,,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Often,,Most of 
the time,,,Most of the time,Most of the time,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Other",,,Sometimes,Often,Most of the time,Most of the time,,Often,Most of the time,,,Often,,Most of the time,,,,Most of the time,,,,Most of the time,100% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Other,Sometimes,700000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,Not Useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",25,20,30,20,5,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Often,,,,,Rarely,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,,Most of the time,,Often,,,,Sometimes,,,Rarely,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Rarely,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,76-99% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,65000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Other,Java,,"College/University,YouTube Videos",,,Very useful,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,0,30,30,40,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A doctoral degree,Government,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10GB,,C/C++,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,35,50,5,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"130,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Mexico,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Other",University courses,60,5,10,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Always,1GB,"Ensemble Methods,Markov Logic Networks,Random Forests","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,Sometimes,Often,,,,,,,,Sometimes,Most of the time,,,Rarely,Rarely,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,,Sometimes,Most of the time,,Sometimes,,,,,,,,Sometimes,,Most of the 
time,Rarely,Sometimes,,Often,Rarely,,Sometimes,,,Most of the time,,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Sometimes,,Sometimes,,Often,Often,,,,,Most of the time,Often,,,,,Rarely,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,Git,Rarely,40000,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"DataTau News Aggregator,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,35,20,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Never,10GB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,Sometimes,,Sometimes,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,,,,,,Sometimes,Often,,,,,30,40,0,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,Often,,,,Often,,,Often,,,,,,Often,,,,10-25% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,170000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Switzerland,42,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Other,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Always,,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,MATLAB/Octave,NoSQL,R,Tableau,Other",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,Most of the time,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,10,30,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,data privacy and data cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,225000,CHF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Support Vector Machines (SVM),R,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Company internal community,Online courses,Stack Overflow Q&A,Textbook",,Very useful,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Other,University courses,35,5,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data,Other",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau,Other",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,Sometimes,Sometimes,,Often,,,Often,,,,Sometimes,,,"Association Rules,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation",,Often,,,,Often,,Often,,,,,,Often,,Often,,,,,,Sometimes,Often,Sometimes,,,Sometimes,,,,,,,55,15,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,66000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Text Mining,R,I collect my own data (e.g. web-scraping),"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer",Work,90,NA,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,100GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Association Rules,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction",,Sometimes,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,Rarely,,,,,,,,,,,,,20,10,0,50,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,,,,,,,,,Often,,,,,,,,,26-50% of projects,More internal than external,IT Department,Oracle Examples,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Stan,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Psychology,More than 10 years,"Researcher,Other",University courses,30,5,30,35,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SQL,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,Rarely,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Simulation",Sometimes,,,,,Sometimes,Often,,,,,Rarely,Most of the time,,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,200000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses",,Somewhat useful,Somewhat useful,,,,Somewhat useful,Not Useful,Somewhat useful,,Somewhat useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,20,0,15,65,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Text data,Never,100GB,"CNNs,Gradient Boosted Machines,Neural Networks,Other","Java,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Often,,,,"CNNs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,,,,,50,10,10,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy 
issues,Scaling data science solution up to full database",,,,,,,,,Sometimes,,,Sometimes,,,,Often,Often,Often,,,,,51-75% of projects,Approximately half internal and half external,Business Department,None,Extracting the relevant features and information for observations discrimination ,Document-oriented (e.g. MongoDB/Elasticsearch),Other,Company Server,Git,Rarely,"34,000",,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer",University courses,15,20,0,50,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,India,20,"Not employed, and not 
looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects",Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Romania,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Traditional Workstation,11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,Canada,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,DBA/Database Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,45,20,10,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,Not very important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,Java,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,SAS Base,Spark / MLlib,SQL,Tableau,Unix shell / awk,Other,Other,Other",,,,Rarely,Often,,Sometimes,,Often,Often,,,,,Sometimes,,Most of the time,,,,Rarely,,,Sometimes,,,Sometimes,,,,Most of the time,Sometimes,Often,,,,,Sometimes,,,Most of the time,Most of the time,,,Often,,,Sometimes,Sometimes,Most of the time,Often,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,Sometimes,Often,Most of the time,Most of the 
time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,,Sometimes,,Often,,Often,Often,,,,Often,Most of the time,,,,,25,25,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,Often,,Most of the time,Often,Most of the time,,26-50% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,154000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Other,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Unix shell / awk,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,Don't know,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Rarely,1GB,"Neural Networks,Random Forests","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Other",,Sometimes,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,5,10,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,,,Rarely,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,33000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,60,0,"Computer Vision,Recommendation Engines,Reinforcement learning","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,Random Forests,Recommender Systems,Text Analytics",Often,,,,Most of the time,Most of the time,Often,,Sometimes,,,,,Often,,,,,,Often,,,Often,Most of the time,,,,,Often,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,,,10-25% of projects,More internal than external,Standalone 
Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Git",Sometimes,40000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,DBA/Database Engineer,,,R,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Very useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Vietnam,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,"Business Analyst,Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Association Rules,Data Visualization",,Often,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,0,10,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,Often,,,Sometimes,,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,"No access directly to live database, so data is 1 day lag and also need to query data to the computer and then process","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,120000000,VND,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Canada,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,Talking Machines Podcast",1-2 years,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Udacity,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Researcher,I haven't started working yet",Self-taught,85,5,5,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Other,47,Employed part-time,,,No,Yes,Other,,Employed by government,MATLAB/Octave,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Sort of (Explain more),Professional degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,I prefer not to answer,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,United States,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Regression,SAS,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,5,5,0,88,2,0,Survival Analysis,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Non-profit,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Never,1MB,Bayesian Techniques,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,60,20,0,20,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)","Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Denmark,43,"Not employed, but looking for work",,,,,,,,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,Google Search,"Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,< 1 year,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Github Portfolio,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher",University courses,40,0,9,50,1,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,United States,50,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,"No Free Hunch Blog,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,75,0,0,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Female,Canada,23,Employed part-time,,,No,Yes,Other,Fine,Employed by government,Tableau,Text Mining,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Personal Projects",,Somewhat useful,Very useful,,,,Very useful,Very useful,,,,Somewhat useful,,,,,,,"The Analytics Dispatch Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Business Analyst,Other",University courses,15,15,5,60,5,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Genetic & Evolutionary Algorithms,R,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,Yes,Master's degree,A health science,I don't write code to analyze data,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,Time Series,Other (please specify; separate by semi-colon),A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,NA,Employed full-time,,,Yes,,Data Scientist,,,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Data Miner,Work,30,30,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Telecommunications,100 to 499 employees,Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Never,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs,Other","C/C++,Hadoop/Hive/Pig,Java,Microsoft SQL Server Data Mining,NoSQL,R,RapidMiner (commercial version),SQL,TensorFlow,Unix shell / awk,Other",,,,Rarely,,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,Most of the time,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,RNNs,Time Series Analysis",,Sometimes,Most of the time,Sometimes,,Most of the time,Most of the time,Most of the time,,,Often,,,,Most of the time,Often,,Most of the time,,,,,Often,,Sometimes,,,,,Sometimes,,,,50,20,5,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process",Often,Often,,,,Often,,Often,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Rarely,"55,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,76,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,DataTau News Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,15,0,0,60,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Russia,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Other,Less than a year,Other,Self-taught,65,25,5,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,RapidMiner (commercial version)",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Random Forests",,,,,,Often,,Most of the time,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,60,20,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",,,,,,Often,,,Most of the time,,,Sometimes,Most of the time,,Often,,,,,,,,100% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Never,78000,RUB,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Amazon Machine Learning,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A professional degree,Other,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Logistic Regression,Naive Bayes,Prescriptive Modeling,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,Most of the time,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,100% of projects,Entirely external,Business Department,None.,Normalizing/cleaning.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Always,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Online courses",,,,,,Somewhat useful,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Programmer",Work,40,0,50,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,Rarely,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,,,,Most of the time,Rarely,Rarely,,,,,Rarely,,Rarely,,,,,,,Sometimes,,,Sometimes,,,,Rarely,,,,50,10,0,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,Most of the time,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,51-75% of projects,More internal than external,IT Department,,Data understanding; Unclean data,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1830000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Other,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,500 to 999 employees,Stayed the same,1-2 years,Some other way,Important,,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,Often,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Most of the time,,,,25,25,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,Most of the time,,,Most of the time,,,,,,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,Consumer population survey;neilsen;Vermont information processing;,Formats are different and not a lot of time available for learning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint,Other",Slack,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,C/C++,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,Not Useful,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased significantly,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Other",Most of the time,1GB,"CNNs,Ensemble Methods,GANs,Neural Networks,Random Forests,RNNs","C/C++,Java,Jupyter notebooks,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",Most of the time,,,Sometimes,,Often,Most of the time,Rarely,Sometimes,,Often,,,,,,,,Sometimes,Often,Often,,Sometimes,,Sometimes,Rarely,,Rarely,,,,,,80,5,15,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,Often,Most of the time,Most of the time,,,,Sometimes,,,Sometimes,,Sometimes,Sometimes,Often,,,,Most of the time,,100% of projects,More internal than external,Standalone Team,,"Dirty data, parameter tuning.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,30000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,MATLAB/Octave,Time Series Analysis,Matlab,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,1 to 2 years,Engineer,Self-taught,60,0,20,10,5,5,Time Series,Bayesian Techniques,,Academic,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Rarely,,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,10,5,1,1,3,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Graph (e.g. GraphBase/Neo4j),Email,,,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,South Africa,29,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,30,20,0,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Always,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,,,,Ensemble Methods,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,60,20,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,Sometimes,,Often,,,,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,30000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,Amazon Machine Learning,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Scientist,Engineer,Operations Research Practitioner,Programmer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,25,5,20,0,0,"Natural Language Processing,Reinforcement learning,Speech Recognition","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data",Sometimes,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,RNNs","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Ensemble Methods,GANs,HMMs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,Most of the time,,,,,Often,,Sometimes,,Sometimes,,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,,40,25,5,15,15,0,Enough to explain the algorithm to someone non-technical,I prefer not to 
say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Company internal community,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,10,50,5,30,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Always,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","R,SQL,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,Sometimes,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,10,50,10,20,10,0,"Enough to code 
it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,Sometimes,800000,PKR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,50,30,15,0,5,0,Reinforcement learning,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,NoSQL,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive 
Modeling,Segmentation,Simulation,Text Analytics",,,Often,,,,Most of the time,Often,,,,,,,,Often,,Sometimes,,,,Often,,,,Often,Sometimes,,Often,,,,,40,20,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,,,,Sometimes,,,Often,,,,Often,,Sometimes,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,"110,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United Kingdom,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,Not Useful,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Predictive Modeler,Researcher,Statistician",Work,20,20,50,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Always,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Statistics,Impala,Microsoft SQL Server Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,Sometimes,,Often,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Most of the time,,,Rarely,,Often,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Often,Sometimes,,,,Most of the time,,,Sometimes,Often,,,,,,,Often,,,,,,,Sometimes,,,,30,20,25,10,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,,,Often,Often,,,,Often,,,,,,,,,Often,Most of the time,,,,100% of projects,Approximately half internal and half external,Business Department,,Reliability of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,"40,000",,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Cloudera,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,50,20,20,10,0,0,"Recommendation Engines,Time Series",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Neural Networks,Random Forests","Amazon Web services,Cloudera,Hadoop/Hive/Pig,NoSQL,Oracle Data Mining/ Oracle R Enterprise",,Sometimes,,,Often,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input",Often,Often,,,,,,Often,Often,,Often,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Canada,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,,,,Not Useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,10,10,50,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",High school,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,<1MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,IBM Cognos,Python,R,SQL,Tableau",,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,Most of the time,Most of the time,,,Often,Sometimes,Rarely,,Sometimes,,,Often,,,Often,Most of the time,,,,45,35,0,5,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,Rarely,Most of the time,,,,,,,,,,,,,,Often,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",,64500,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,52,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Bayesian Methods,R,"Government website,University/Non-profit research group websites","Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,"FlowingData Blog,R Bloggers Blog Aggregator",10-15 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,30,0,50,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,Russia,16,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Very useful,,,,,,Very useful,,Very useful,,,,Very useful,Jack's Import AI Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,Yes,I did not complete any formal education past high school,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important +Male,Spain,39,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Company internal community,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,Very useful,,,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"DBA/Database Engineer,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer,Other",Other,75,0,15,0,10,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Telecommunications,"10,000 or more employees",Stayed the same,Less than one year,A general-purpose job board,Very important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Random Forests","Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Sometimes,,,,Most of the time,Most of the time,Often,,,Sometimes,,Often,,Often,,Often,Rarely,,,,Often,,,,,Rarely,Sometimes,,,,,40,20,25,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues",,,,Most of the 
time,,,,Most of the time,Most of the time,,,,,,,,Sometimes,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,Clean data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint,Other",Kibana; Zeppelin; Graphite; Adminer,"Git,Other",Sometimes,36000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Unix shell / awk,Time Series Analysis,R,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,"Data Analyst,Researcher",University courses,30,10,20,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",,100GB,"Decision Trees,SVMs","C/C++,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Neural Networks,Random Forests",,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,40,40,10,10,0,0,Enough 
to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,100% of projects,More external than internal,Standalone Team,"Geogradis, Landsat, Environment Canada, Water Survey Canada",Spatial resolution of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,25000,CAD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"GitHub,Google Search","Blogs,College/University,Tutoring/mentoring",,Very useful,Very useful,,,,,,,,,,,,,,Somewhat useful,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,I did not complete any formal education past high school,,,"Business Analyst,Data Miner,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Ukraine,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Hadoop/Hive/Pig,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,70,0,0,30,0,0,,,A doctoral degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,Other",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,"DataTau News Aggregator,FastML Blog,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Canada,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Psychology,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,31,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,GitHub,"Arxiv,Conferences,Newsletters,Textbook",Very useful,,,,Somewhat useful,,,Very useful,,,,,,,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Engineer,Programmer",Self-taught,50,30,0,20,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Not very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Rarely,1GB,"CNNs,RNNs","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Natural Language Processing,Neural Networks,RNNs",,,,Often,,Most of the 
time,Often,,Often,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,10,40,0,20,30,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Often,Most of the time,,,Most of the time,,,Sometimes,,,,,,,Often,,,,10-25% of projects,More external than internal,Central Insights Team,Imagenet; MS COCO; SQUaD; WikiDB,In ability to integrate multiple data sources. They seem to have different distributions and generally are not transferable.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,"100,000",USD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",Work,40,10,50,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Pharmaceutical,10 to 19 employees,Increased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,Most of the time,,,Sometimes,Often,,,,,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,Rarely,,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics",,,Sometimes,,Often,,Most of the time,Sometimes,,,,,,Often,,Often,Sometimes,Sometimes,Often,,Sometimes,Often,Sometimes,,,,,Sometimes,Often,,,,,30,25,5,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Often,,Often,,,,,,,,Often,Sometimes,,,76-99% of projects,Approximately half internal and half external,Standalone Team,social media; example data sets like IMDB; census; leaked email; weather; public company 
information,"Cleaning is always required, but highly encoded data is the worst. Some data sets provide core information in one file, the field names in another, and the key to decoding encoded values in a PDF that requires a human to read.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,"150,000",USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,28,Employed full-time,,,No,Yes,Other,Poorly,"Employed by professional services/consulting firm,Employed by government",Python,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Unnecessary,Nice to have,Necessary,,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Management information systems,3 to 5 years,"Data Analyst,DBA/Database Engineer,Operations Research Practitioner",University courses,0,10,50,40,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,United States,100,"Independent 
contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Microsoft R Server (Formerly Revolution Analytics),Social Network Analysis,Matlab,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst",Self-taught,10,70,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Manufacturing,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",,,Neural Networks,"Amazon Web services,C/C++,MATLAB/Octave",,Often,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,Often,,,,40,40,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,U.S. Census data,Combining multiple sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,DataRobot,Cluster Analysis,R,Google Search,"College/University,Friends network,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,Spark / MLlib,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,Rarely,Sometimes,,,Often,,Sometimes,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text 
Analytics",,Sometimes,Sometimes,Sometimes,Often,Most of the time,Most of the time,Sometimes,Often,,,,,Often,,Sometimes,,Sometimes,Often,Sometimes,Often,Often,Often,Often,,Often,Often,Sometimes,Often,,,,,70,20,5,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,Most of the time,,,,Most of the time,,,,,,Often,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,"350,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,Trade book,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,"Data Machina Newsletter,FlowingData Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,"Data Analyst,Researcher",Self-taught,30,20,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Often,Most of the time,Often,,,,Often,,Often,Often,Most of the time,,,,,Often,Often,Often,,,,,Often,Often,Most of the time,,,,40,15,5,10,30,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of 
a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,Sometimes,,,Often,Often,Most of the time,,,,,Sometimes,Often,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,82500,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,,,,Nice to have,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Other,0,30,0,0,0,70,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,,,,Very Important,,,,,,,,,, +Male,Nigeria,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,60,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Internet-based,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,Often,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Text Analytics",,Often,,,,,Most of the time,Often,Often,,,,,Often,,Often,,Often,,,,,Often,Sometimes,,,,,Often,,,,,60,10,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,Often,,,,,,Most of the time,,,,,Often,,76-99% of projects,More external than internal,IT Department,Data.world; quandl; proprietary ,Having to clean the data,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"35,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Switzerland,77,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,,,Proprietary Algorithms,Other,I collect my own data (e.g. web-scraping),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Statistician",Other,60,0,20,20,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),A doctoral degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,Sometimes,<1MB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,0,0,0,90,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,,,,,,,,,,,,,Often,,,None,Do not know,,,,Other,I don't typically share data,,,Don't know,,,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed full-time,,,No,Yes,Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,20,70,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Online courses,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau",,Most of the time,,,Often,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",Rarely,,Rarely,,,Often,Most of the 
time,Often,Sometimes,,,Rarely,,Rarely,Sometimes,Often,,Rarely,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Often,,,,40,25,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,Sometimes,Sometimes,,,Often,Sometimes,Often,,Often,,100% of projects,,,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,135000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,44,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Necessary,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Male,Ukraine,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Other,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,75,25,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),A professional 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,,,,,,,,,,Very Important,,Very Important,,Very Important +Male,Belgium,37,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,,"Official documentation,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Engineer,Researcher",Work,33,0,33,34,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Ensemble Methods,Neural Networks,Random Forests,SVMs","Jupyter notebooks,NoSQL,Python,R,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Sometimes,,Often,,,,,,Sometimes,,Often,,Sometimes,,Sometimes,Sometimes,,Often,,,,,Sometimes,,Most of the time,,,,35,15,15,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Often,,,Sometimes,,,Most of the time,Sometimes,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,80000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that performs advanced analytics,MATLAB/Octave,Neural Nets,Python,,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A social science,1 to 2 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,24,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,College/University,Friends network,Personal Projects",Somewhat useful,,Somewhat useful,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,3 to 5 years,"Programmer,Researcher",Work,45,5,45,5,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,"5,000 to 9,999 employees",Increased significantly,3-5 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Mathematica,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"CNNs,Decision Trees,Logistic Regression,Neural Networks,Random Forests",,,,Often,,,,Most of the time,,,,,,,,Often,,,,Most of the time,,,Often,,,,,,,,,,,20,20,0,0,60,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"45,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"Data Stories Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer",University courses,0,30,20,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,,,,,,,Very Important,Very Important,,,,,,,, +Male,Germany,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,Python,,"Blogs,Friends network,Official documentation,Stack Overflow Q&A,Textbook",,Very useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,20,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not 
very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Random Forests",Rarely,,,,,Often,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,30,5,5,30,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,,,,Often,Often,,,,,,Sometimes,,,Sometimes,Often,Often,Most of the time,,51-75% of projects,Do not know,Other,"Maxmind, 42matters, 51degree",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,70000,EUR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Cloudera,Neural Nets,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,20,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,Other",,Rarely,,,Rarely,,,Rarely,Rarely,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Often,Most of the time,Often,Often,,,,,Often,Sometimes,Often,,Often,,,Often,,Often,,,,,,Often,Sometimes,,,,60,20,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Other",Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,Most of the time,76-99% of projects,More internal than external,Business Department,,People putting up barriers ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,88000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Bayesian Methods,SQL,Google Search,"College/University,Company internal community,Other",,,Somewhat useful,Very useful,,,,,,,,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",University courses,20,0,50,27,3,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,CRM/Marketing,500 to 999 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1MB,Regression/Logistic Regression,"Amazon Web services,R,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,15,30,5,10,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,Sometimes,,,,,Rarely,Most of the time,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not 
in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,110000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Denmark,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Python,Survival Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Very useful,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Researcher",University courses,0,0,50,50,0,0,Time Series,Logistic Regression,A professional degree,Academic,Fewer than 10 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,Regression/Logistic Regression,"NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,,,,,Most of the time,,,Often,,,,5,50,10,30,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Git,Sometimes,350000,DKK,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,40,Employed part-time,,,No,Yes,Computer Scientist,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by government",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"Coursera,Other","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,0,10,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job 
board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Management information systems,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,,Somewhat important,Not important,Not important,Somewhat important +Male,United States,41,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,R,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,Less than a year,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Retail,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,,Decision Trees,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Prescriptive Modeling,Random Forests",,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,30,30,10,30,0,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Subversion,Rarely,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Regression,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,Self-taught,80,10,5,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Mix of fields,"5,000 to 9,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Never,1GB,Regression/Logistic Regression,"NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Scaling data science solution up to full database",,,,,,,,,,,Often,,,,,,,Often,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,450000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,"5,000 to 9,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data",Never,100MB,Neural Networks,"C/C++,Java,MATLAB/Octave,Perl,Python",,,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Neural Networks",,,,,,,Often,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,0,50,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,58,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Not Useful,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog",3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,48,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Friends network,Kaggle,Online courses,Personal Projects",,,,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer,Other",Self-taught,50,10,20,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,MATLAB/Octave,NoSQL,Python,SQL",Rarely,Most of the time,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Random Forests,SVMs",Sometimes,,,,,,Often,Often,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Most of the time,,Often,,,,Often,,Sometimes,,,,,Often,Often,,,Often,Most of the time,,26-50% of projects,Entirely internal,Other,,"Inconsistent formats, getting access","Column-oriented 
relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,195000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Anomaly Detection,Python,,"Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Not Useful,Very useful,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Work,30,30,30,9,1,0,"Supervised Machine Learning (Tabular Data),Time Series",Other (please specify; separate by semi-colon),A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Increased slightly,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",,,Regression/Logistic Regression,"IBM SPSS Statistics,Jupyter notebooks,Orange,Python,R,SQL,Other,Other",,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,Rarely,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,Most of the time,Rarely,,"Data Visualization,kNN and Other Clustering,Segmentation,Simulation",,,,,,,Most of the time,,,,,,,Rarely,,,,,,,,,,,,Most of the time,Rarely,,,,,,,10,5,5,30,30,20,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,,,Sometimes,,,,Most of the time,,,,,,Often,Rarely,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,Lack of clear research question,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,Other,Never,60000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,1 to 2 years,"Data Analyst,Researcher",Self-taught,40,20,10,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Often,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Rarely,,Often,,Sometimes,Most of the time,Most of the time,Most of the time,,,Often,,Sometimes,,Most of the time,,Sometimes,,Often,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,30,25,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,Most of the time,Often,,,,Often,,Sometimes,,Sometimes,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,420000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Social Network Analysis,Scala,"GitHub,Google Search,University/Non-profit research group websites",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Machine Learning Engineer,Programmer,Other",Work,50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Increased significantly,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Always,100TB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Python,R,Spark / MLlib,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,"kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Simulation,SVMs,Text Analytics",,,,,,,,,,,,,,Most of the time,,Most of the time,,,,Most of the 
time,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,0,30,30,0,40,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,26-50% of projects,More internal than external,Standalone Team,"location, geo, address",finding right source ,Other,"Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,120000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Greece,38,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,20,20,0,50,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,10MB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,,Rarely,Most of the time,,Often,,,,Most of the 
time,,Often,,Often,,,,Often,Often,,Often,Rarely,,,,Sometimes,,,,,,30,30,0,10,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,13000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,NoSQL,Rule Induction,Haskell,I collect my own data (e.g. web-scraping),"Arxiv,College/University,Conferences,Official documentation,Personal Projects",Very useful,,Very useful,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,"Engineer,Machine Learning Engineer",University courses,25,0,25,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Other,Don't know,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Naive Bayes,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,NA,100,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,None,Entirely internal,Other,none,gathering clean and representative data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,38,30,5,25,2,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Text data",Always,100GB,"CNNs,Neural Networks,Random Forests","Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics",,,,Often,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,Most of the time,,,Often,,,Sometimes,,Rarely,Sometimes,,,,,40,15,20,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,Often,,,Sometimes,Most of the time,,,,Sometimes,,,Less than 10% of projects,More internal than external,Other,GTSRB;Wikipedia;NLTK Corpuses/stopwords,It's hard to communicate that preparing the data is time-consuming,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Never,56500,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites,Other","Blogs,Official documentation,Online courses,Personal Projects,Other,Other",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Work,100,0,0,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,,"IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Other",,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,Most of the time,,Often,Often,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",Sometimes,,Sometimes,,,,Most of the time,Rarely,,,,,,,,,,Sometimes,Often,Often,,,,,,,,,Sometimes,,,,,5,30,10,30,10,15,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Other",Sometimes,,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,Most of the time,76-99% of projects,Entirely internal,Other,Project Data Sphere,This really isn't applicable as we 
are consultants and the challenges vary by industry and client.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,"150,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Spain,54,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Google Search,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer,Statistician",University courses,0,5,40,30,10,15,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10TB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter 
notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,Most of the time,Sometimes,,,,Sometimes,,,,,Rarely,,,Rarely,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,Often,Sometimes,,,Rarely,Sometimes,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Sometimes,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,Sometimes,Most of the time,,Often,Rarely,Often,,,,Sometimes,Most of the time,,Most of the time,,Sometimes,Often,Often,Often,,Often,,,,5,20,5,10,20,40,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Often,,Sometimes,,,,,,,,,Most of the time,,Often,,Often,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Other",Rarely,70000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Mathematica,Proprietary Algorithms,Python,Google Search,"Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Trade book",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Talking Machines Podcast,< 1 year,,,Nice to have,,Nice to have,,,,,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Engineer,Researcher,Statistician,Other",Self-taught,80,20,0,0,0,0,"Adversarial Learning,Survival Analysis",Hidden Markov Models HMMs,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Not important,,,,,,,,,,,,,,, +Male,Turkey,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Talking Machines Podcast",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Statistician,Poorly,Employed by government,Stan,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer,Statistician",University courses,90,5,0,1,4,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Java,R,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,Often,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Sometimes,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,70,5,1,10,14,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Often,,,,Most of the time,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,Census; job stats,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Never,87000,BRL,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Other,No,Master's degree,Management information systems,1 to 2 years,,University courses,50,20,0,30,0,0,Reinforcement learning,"Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,"College/University,Friends network,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Very useful,,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's 
degree,No,Bachelor's degree,Mathematics or statistics,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,10,20,0,40,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,France,22,"Not employed, but looking for work",,,,,,,,Other,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,10,30,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,United States,56,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by non-profit or NGO,Python,,SQL,"GitHub,I collect my own data (e.g. 
web-scraping)","Non-Kaggle online communities,Online courses",,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Coursera,Other,2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,A health science,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,,Logistic Regression,A professional degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Spain,NA,Employed full-time,,,No,Yes,,,Employed by professional services/consulting firm,R,,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,,15+ years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Engineer,Predictive Modeler",Work,50,0,50,0,0,0,"Machine Translation,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,R,Time Series Analysis,R,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Other,Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,51,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,R,GitHub,"College/University,Kaggle",,,Very 
useful,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,Master's degree,A humanities discipline,I don't write code to analyze data,Other,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Iran,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Master's degree,A social science,1 to 2 years,Business Analyst,University courses,60,20,10,10,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United 
Kingdom,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Text Mining,Python,Other,"Blogs,Friends network,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,20,0,10,0,10,60,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","NoSQL,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,Often,,,,Most of the time,,,,,,,,,,,Often,Sometimes,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",,Sometimes,Often,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,100% of projects,Entirely internal,Business Department,Land Registry data; Ordnance Survey data; various other UK government Open Data,Legal concerns involving Ordnance Survey and Open Data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,"45,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Russia,20,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,1,15,70,4,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1TB,Other,"C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,20,40,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,76-99% of projects,More internal than external,IT Department,,,Row-oriented 
relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Mercurial,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,75,0,0,0,5,,,I prefer not to answer,Academic,,,,,Somewhat important,Other,Basic laptop (Macbook),Relational data,Sometimes,1MB,Other,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression",Sometimes,,,,,Often,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"I prefer not to say,Other",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Time Series Analysis,Python,GitHub,"Blogs,College/University,Kaggle,Personal Projects",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,45,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Very useful,Not Useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",5-10 years,Nice to have,Unnecessary,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Other,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Other",Other,20,5,40,5,25,5,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important +Male,United Kingdom,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,40,15,5,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,,Often,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,150000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Very useful,,Very useful,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,25,50,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1MB,Regression/Logistic Regression,"R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,,"A/B Testing,Data Visualization,Logistic Regression,Segmentation,Text Analytics",Most of the time,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,Often,,,,,10,5,5,30,50,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Often,Often,,,,Often,,,Often,Often,,Often,,,,Often,,,,Often,Often,,100% of projects,Entirely internal,Central Insights Team,"US Census NPD",Time and talent constraints,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,,Always,15000,USD,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,47,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,Textbook",,,,,,,,,,,,,,,,,,,"DataTau News Aggregator,Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,50,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Never,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / 
awk",,Often,,,,,,,Often,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,Sometimes,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,Rarely,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,Sometimes,,,,,,Often,Sometimes,,,,40,20,0,20,0,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Google Search,"Company internal community,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,Other,University courses,5,0,25,70,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Natural Language Processing",,,,,,Most of the time,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,50,5,35,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external 
sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,,Often,,,Often,Rarely,,,Often,Most of the time,,,,Most of the time,,,Rarely,,51-75% of projects,More internal than external,Other,reddit;twitter,data annotation -- always a bottleneck for us,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"60,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,70,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Other",,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer,Other",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,Rarely,Sometimes,,,,Often,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Often,,Often,,,,,Often,,,Sometimes,Sometimes,,,,,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,Often,Often,Often,Sometimes,Sometimes,,,,,Sometimes,,Often,,Often,,Sometimes,Often,,Often,Often,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data 
science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,Often,,,Sometimes,Often,Often,,Sometimes,Sometimes,,,Often,Sometimes,Sometimes,Often,Most of the time,,76-99% of projects,More internal than external,Business Department,,Getting access to it with freedom to use.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Very useful,Very useful,,Very useful,,,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,10,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic 
Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,50,30,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,447000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,54,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Very useful,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,Other,University courses,0,0,0,100,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,Rarely,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,Sometimes,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,Stability 
of Claudera platform,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Always,160000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,56,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,30,0,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Internet-based,500 to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,R,RapidMiner (commercial version),SQL,TensorFlow,Unix shell / awk,Other",,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Most of the time,,,Often,Most of the time,,Sometimes,Sometimes,,,,,,,,Often,,,,Often,,Often,Often,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Often,Rarely,,Often,Often,Sometimes,,,,Often,,Sometimes,,Often,,Often,Most of the time,Most of the time,,,Most of the time,,,,,,Most of the time,Most of the time,,,,15,15,20,10,15,25,Enough to tune the parameters properly,"Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Often,,,,,Sometimes,,,Often,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Rarely,,,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Ukraine,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,,DataRobot,Text Mining,Python,Google Search,"Personal Projects,Other",,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Data Miner,Operations Research Practitioner,Other",Self-taught,90,5,5,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Internet-based,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,,10MB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,15,70,10,0,5,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,None,Do not know,Standalone Team,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Always,12000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"edX,Other",Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Data Miner,Programmer,Researcher",Self-taught,50,40,0,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Python,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,,,,,,,,Rarely,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Rarely,Rarely,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Other",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,,Most of the time,,,,Most of the time,,,Often,,,,Most of the time,50,20,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,,,,Sometimes,Most of the 
time,Often,,Most of the time,,,Often,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,cleaning; storage,Other,"Email,Share Drive/SharePoint",,Git,Most of the time,40000,USD,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Egypt,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,Switzerland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,"College/University,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts",,,Very useful,,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,40,30,0,30,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Always,100GB,"Evolutionary Approaches,Gradient Boosted Machines","Google Cloud Compute,Hadoop/Hive/Pig,Java,NoSQL,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"CNNs,Collaborative Filtering,Data Visualization,Decision Trees,Naive Bayes,Time Series Analysis",,,,Often,Sometimes,,Often,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Often,,,,,26-50% of projects,Entirely internal,Standalone Team,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,72,Employed part-time,,,Yes,,Statistician,Poorly,Employed by college or university,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,Statistica (Quest/Dell-formerly Statsoft),TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,,,Sometimes,,,Rarely,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,Sometimes,,,,Sometimes,Often,Often,Rarely,,,Often,,Sometimes,,Often,,,,Often,Often,Most of the time,Often,,,,,Often,,Often,,,,30,30,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,Sometimes,,,,,,,,,,,,,,,Often,Sometimes,,,51-75% of projects,More internal than external,Central Insights Team,Kaggle,Data management,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"20,000",,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Kaggle,Online courses,Personal Projects",Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,"FastML Blog,FlowingData Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,0,30,20,20,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,CRM/Marketing,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / 
MLlib,SQL,TensorFlow",,Often,,,,,,Often,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,,Sometimes,,,,,Often,,,,,Most of the time,,Sometimes,,Sometimes,Often,,Often,,Sometimes,Most of the time,Often,,,,20,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,Often,,Sometimes,,,Most of the time,,Sometimes,Sometimes,,Often,Often,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,52800,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Pakistan,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"GitHub,University/Non-profit research group websites","Friends network,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,40,0,0,60,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Weka,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Other,20,50,0,0,0,30,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Mexico,44,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Other,"Government website,I collect my own data (e.g. web-scraping)","Non-Kaggle online communities,Personal Projects,YouTube Videos",,,,,,,,,Somewhat useful,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,60,10,10,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",Often,Most of the time,,,,Most of the time,,Most of the time,Often,,,,,,Often,Most of the time,,,,Most of the time,Most of the time,Sometimes,Most of the time,Often,,Most of the time,Most of the time,Often,Most of the time,,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Often,,Most of the time,26-50% of projects,Approximately half internal and half external,Central Insights Team,Twitter; Youtube; INEGI,Interpreting multi-language texts.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,6000,USD,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Other,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Textbook,Other",Somewhat useful,,,,,,,,,,,,,,Very useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,80,0,10,10,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Telecommunications,10 to 19 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Other",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Often,,,"A/B Testing,Data Visualization,Ensemble Methods,RNNs,Text Analytics",Often,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,30,30,30,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,None,Getting clients to 
give enough of it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,40000,,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Very useful,,,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",Work,50,20,20,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / 
MLlib,SQL,TensorFlow",Sometimes,Sometimes,,,,,,,Sometimes,,,,,Sometimes,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Often,,Most of the time,Most of the time,,,,,Often,Often,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Often,Often,Often,,,,,Often,Often,Often,,,,,Often,,Often,Often,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,R,Survival Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Data Analyst,Predictive Modeler,Researcher",University courses,50,20,0,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,Often,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation",,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,60,10,0,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external 
sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,,,,Sometimes,Sometimes,,10-25% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Email",,,Rarely,100000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Predictive Modeler,Statistician,I haven't started working yet",Self-taught,25,50,25,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",Often,Often,,,,Often,Often,Often,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,Sometimes,,,,30,30,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,36,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,Very useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"FlowingData Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,20,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Text data,Relational data",Always,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Sometimes,Sometimes,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,,Often,Most of the time,Most of the time,,,,,,,Often,,,,,Often,,Most of the time,,,Often,,,,,,,,35,20,35,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful 
datasets from external sources,Need to coordinate with IT,Privacy issues",Most of the time,,,Often,,,,Often,Most of the time,Most of the time,,,,,Often,,Most of the time,,,,,,51-75% of projects,More internal than external,Other,Weather data;,synchronizing what operations is using to what the algorithm is using,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Rarely,65000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,0,20,50,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"CNNs,Neural Networks,RNNs","IBM Watson / Waton Analytics,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Ensemble 
Methods,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,Segmentation",,,,Often,,,,,Often,,,,,,,Often,,,Most of the time,Most of the time,,,,,Often,Sometimes,,,,,,,,10,15,50,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,Sometimes,Often,Often,,,,,Often,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Rarely,130000,USD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,SQL,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,30,10,50,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,80,10,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,Less than 10% of projects,More external than internal,Central Insights Team,traffic open data,data cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,0,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,30,0,0,60,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,,,,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,Spark / MLlib,SQL,Unix shell / awk",,Rarely,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Time Series Analysis",Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,5,0,0,0,0,95,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Iran,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,C/C++,,Java,GitHub,"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,20,60,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning",,A master's degree,Financial,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Rarely,1GB,,"C/C++,Java,NoSQL,R,SQL",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Natural Language Processing,Simulation",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,60000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ireland,42,"Not employed, but looking for work",,,,,,,,Java,Deep learning,R,Google Search,Online courses,,,,,,,,,,,Very 
useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Management information systems,1 to 2 years,Other,University courses,25,25,10,10,20,10,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Japan,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,C/C++,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Programmer,Software Developer/Software Engineer",Work,70,0,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series",Bayesian Techniques,I don't know/not sure,Internet-based,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,Stan",,Sometimes,,Most of the time,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Neural Networks,Time Series Analysis",,,Often,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,10,70,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Often,Often,,,,,,Often,,,,,,,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Don't know,0,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Switzerland,29,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,Python,Factor Analysis,Python,Government website,"Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,Somewhat useful,,Not Useful,,Not Useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Work,15,20,50,5,10,0,,,A professional degree,Academic,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,,,,"Mathematica,Python,R,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,Often,Sometimes,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,0,15,25,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Rarely,89000,CHF,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Brazil,40,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Non-Kaggle online communities,YouTube Videos,Other,Other,Other",,,,,,,Very useful,,Somewhat useful,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Management information systems,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,65,5,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,Data Analyst,Self-taught,40,0,30,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Often,Rarely,,,,Most of the time,Most of the time,Often,Often,,,,,,Often,Most of the time,,Often,,,Often,,Often,,,Most of the time,,Most of the time,,,,,,15,60,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert 
input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,Often,,Sometimes,Most of the time,,Most of the time,,,Often,,Most of the time,,Sometimes,,Most of the time,Most of the time,,100% of projects,Entirely internal,Central Insights Team,aggdata,integrity of clients' data sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,127000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,C/C++,Neural Nets,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Python,Spark / MLlib",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most 
of the time,,,,,,,,,,Most of the time,,,,,,,,,,,"CNNs,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks",,,,Often,,,,Often,,,,,,,,Most of the time,,Often,Often,Often,,,,,,,,,,,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,,,,,,,,,,25000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,,"Blogs,College/University,Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,Very useful,,< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Evolutionary Approaches,Hidden Markov Models HMMs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,I don't plan on learning a new ML/DS method,R,Other,"College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,edX,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,I don't write code to analyze data,"Business Analyst,Data Analyst,Data 
Miner,Statistician","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,University/Non-profit research group websites,"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,3 to 5 years,"Data Scientist,Researcher,Other",University courses,20,30,50,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Random Forests,Regression/Logistic Regression,SVMs","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Segmentation",,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,65,10,0,10,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into 
organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Rarely,150000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Other,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Textbook",,Very useful,,,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Python,QlikView,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random 
Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other",Sometimes,,,,,Sometimes,Often,Sometimes,Often,,,Sometimes,,Rarely,,Often,,,,Sometimes,,,Often,Rarely,,Rarely,Rarely,Rarely,Rarely,Sometimes,,,Often,20,15,0,5,10,50,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Rarely,Often,,,,Sometimes,,Often,,,,,,Rarely,,Rarely,Often,Often,,10-25% of projects,Entirely internal,Other,OSM; Weather data; WorldClim; Twitter,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,80000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Deep learning,Python,GitHub,"Arxiv,College/University,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher",University courses,20,20,20,39,1,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,500 to 999 employees,Increased significantly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",,,,,,,Often,Often,Often,,,Often,,Most of the time,,,,,,Often,Most of the time,,Most of the time,,Sometimes,,,Sometimes,,,,,,99,1,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Privacy issues",Sometimes,Often,,,,,,,,,,,,,,,Sometimes,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,112000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,Very useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Not Useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Engineer,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",40,10,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Other,20 to 99 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Rarely,Sometimes,,Rarely,,,,"Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,Most of the time,,,,55,10,5,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a 
clear question to be answering or a clear direction to go in with the available data",,,Often,,Often,,,,Sometimes,Sometimes,Sometimes,,,Often,,Most of the time,,,Sometimes,Often,,,51-75% of projects,Entirely external,Other,"UK CAA data, FAA data. ","Diverse formats, missing data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,150000,GBP,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,FastML Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,45,0,20,20,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Personal Projects,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition",Decision Trees - Random Forests,A bachelor's degree,Internet-based,500 to 999 employees,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes",Sometimes,Sometimes,,,,,Often,Often,,,,,,Often,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,30,30,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,550000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Anomaly Detection,Python,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Researcher",University courses,20,0,30,50,0,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Rarely,1TB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks","C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Segmentation,SVMs,Time Series Analysis",,,Sometimes,,,,,,,,,Sometimes,,Often,,,,,,Often,,,,,,Sometimes,,Most of the time,,Often,,,,60,20,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,Often,Sometimes,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,,,,Git,Rarely,40000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Physics,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,0,80,0,0,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed part-time,,,No,Yes,Other,Poorly,Employed by company that makes advanced analytic software,Python,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,Less than a year,Other,University courses,20,0,10,70,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,India,23,"Not employed, but looking for work",,,,,,,,QlikView,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Spain,25,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,KDnuggets Blog,3-5 years,Nice to have,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,0,60,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,44,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Impala,Deep learning,R,,"Blogs,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Researcher,Other",University courses,90,0,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Insurance,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,10MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Often,,,Sometimes,,,,"Logistic Regression,Simulation,Time Series 
Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,Sometimes,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,Sometimes,Sometimes,,,,,,Sometimes,,,,Often,,,,51-75% of projects,Do not know,Other,"USDA NASS Crop data, USDA FCIC insurance data, Many industry propriety models",,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,350000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,42,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website",Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"DataCamp,edX,Other",Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,,Researcher,University courses,NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Text Mining,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,,,Very useful,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer",Work,0,0,100,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,IBM Watson / Waton Analytics,Java,NoSQL,Spark / MLlib,SQL,Unix shell / awk,Other,Other",,,,Rarely,,,,,,,Rarely,,Rarely,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,Sometimes,Most of the time,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,,Often,,,,,,,Most of the time,,,Most of the time,,Often,,Sometimes,,,,,Most of the time,Most of the time,,,,,50,15,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple 
ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Sometimes,Sometimes,,,,Often,,,,,Most of the time,Often,,Sometimes,,Most of the time,Often,Sometimes,,76-99% of projects,More internal than external,Standalone Team,Company information,Waiting for data to be made available,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,155000,CHF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,New Zealand,45,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by non-profit or NGO,Employed by government",TensorFlow,Deep learning,Python,,"Blogs,Online courses,Personal Projects",,Very useful,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,10,50,20,0,0,,,A bachelor's degree,Government,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Other",,,,"Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow",,Sometimes,,,Rarely,,,Rarely,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,,,,,"CNNs,kNN and Other Clustering,Random Forests,Segmentation",,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,Rarely,,,,,,,,10,5,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,100% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,130000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Italy,23,"Not employed, but looking for work",,,,,,,,R,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,YouTube Videos",,,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Finland,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,Python,Government website,"Blogs,Company internal community,Personal Projects",,Very useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,Data Scientist,University courses,0,0,30,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very 
important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Often,,Sometimes,Most of the time,Most of the time,Sometimes,,,,Often,,Sometimes,Sometimes,Most of the time,,,Most of the time,Rarely,Most of the time,,Sometimes,Most of the time,,Most of the time,,,Most of the time,Often,,,,60,5,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,Most of the time,,Sometimes,,,,Most of the time,Sometimes,,100% of projects,Do not 
know,,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,Yes,Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Social Network Analysis,R,"Google Search,Government website","Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,FlowingData Blog,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,A social science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,A professional degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,31,Employed full-time,,,Yes,,Researcher,Fine,,Stan,Neural Nets,Python,Google Search,"Arxiv,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,Very useful,,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,,,,"FastML Blog,No Free Hunch 
Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer",University courses,30,0,20,40,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,Tableau,TensorFlow,Other",,,,Rarely,,,,,Rarely,,,,,,,Often,Often,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,Sometimes,Often,,,Often,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Most of the time,Sometimes,,Most of the time,Most of the time,,Often,,,,Often,Often,,Most of the time,,Sometimes,Most of the time,Often,Often,,Often,,Often,,Most of the time,Often,Most of the time,Sometimes,,,,10,30,30,20,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,51-75% of projects,Approximately half internal and half 
external,Standalone Team,MNIST; uci datasets; text corpora; research datasets,scaling algorithms to big data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,University courses,10,0,30,20,40,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,10GB,"Bayesian Techniques,HMMs,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,Rarely,,,Sometimes,Often,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,Often,,,Rarely,Most of the time,,,,,,,40,20,0,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,Most of the time,,,,,,,,,,,,Sometimes,,,Often,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,75000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Other","Arxiv,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,Other,25,0,0,25,0,50,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",Often,,,Most of the time,,Most of the time,,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,Most of the time,Most of the time,Sometimes,Sometimes,,Often,,,,,Most of the time,,,,30,20,40,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,125000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,18,Employed part-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,Anomaly Detection,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,3 to 5 years,,Self-taught,50,0,0,20,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"HMMs,Neural Networks,Random Forests,SVMs","Java,Python",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",Most of the time,,,Rarely,,Sometimes,,,,,,,Rarely,Often,,,,,,Rarely,Often,,,,,,,,Often,,,,,30,10,0,20,40,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,,I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United Kingdom,55,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Text Mining,R,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,,,,Very useful,,Very useful,,,,,,,Very useful,Very useful,,Somewhat useful,,"Data Stories Podcast,No Free Hunch Blog,R Bloggers 
Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,Statistician,University courses,30,10,30,0,0,30,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Financial,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,Rarely,Sometimes,,,,,Sometimes,Often,,,Often,,,,10,30,30,10,20,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,Sometimes,,,,Often,Sometimes,,,,51-75% of projects,Entirely internal,Standalone Team,Financial; Economic (government/official statistics) ,Complicated dependencies,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,150000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,,"Newsletters,Stack Overflow Q&A,YouTube Videos",,,,,,,,Very useful,,,,,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",University courses,20,10,20,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Other","KNIME (free version),R,SAS Base",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Often,,,,,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,,Rarely,,,,50,20,10,19,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,78000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Russia,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Elixir Newsletter,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,90,0,5,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,100 to 499 employees,Decreased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Never,,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,,,,50,30,0,10,10,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Often,,,,,,,Often,Often,,,,,,10-25% of projects,More internal than external,,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,400000,RUB,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Iran,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Not Useful,,,,Very useful,,Very useful,,Very useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,30,0,30,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,,,,,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",,Rarely,Sometimes,,,,Most of the time,,,,,,,Sometimes,,,,Sometimes,,Often,Often,,,Sometimes,,Rarely,Most of the time,Most of the time,Most of the time,,,,,5,20,30,30,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,Rarely,Often,,Sometimes,,Most of the time,,,,,,,Often,,,51-75% of projects,More external than internal,IT Department,,We need the data to be reliable and not biased; most people deny to answer the questionnaire properly.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,DOP,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Conferences,Online courses,Personal Projects,Textbook",,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,20,15,20,30,10,5,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,RNNs,SVMs","MATLAB/Octave,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Logistic Regression,Random Forests,Recommender Systems,Segmentation",Sometimes,Sometimes,,,,Often,,Often,,,,,,,,Often,,,,,,,Often,Often,,Often,,,,,,,,40,20,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Scaling data science solution up to full database",,,,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,80000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Other,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",Self-taught,25,20,20,15,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Italy,26,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,0,0,0,100,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks",A master's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,1GB,Regression/Logistic Regression,"Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Often,,,,,,,,,,Often,,,,,Often,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation",,,,,,,Often,,,,,,,,,Often,,,,,Often,,,,,,Often,,,,,,,30,20,30,15,5,0,Enough to refine and innovate on the 
algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Other,23,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,DataRobot,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Online courses,Personal Projects",,,,,,Not Useful,Very useful,,,,Very useful,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,,,,Udacity,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Kaggle Competitions,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Programmer",Self-taught,80,10,0,5,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Markov Logic Networks",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Very 
Important,Somewhat important,Very Important,Very Important,Very Important +Male,Germany,25,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book",Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,Not Useful,Very useful,Very useful,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,20,5,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,10 to 19 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Always,10GB,"HMMs,Regression/Logistic Regression,SVMs,Other","C/C++,Java,Julia,MATLAB/Octave,NoSQL,Perl,Python,R,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Often,Rarely,,,,,Rarely,,,,,,Often,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,SVMs,Time Series 
Analysis",Often,,,,,Often,Often,,,,,,,Often,,,,,,,Most of the time,,,,,,,Often,,Often,,,,50,20,5,10,10,5,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Rarely,,,Most of the time,Sometimes,,,,,,,,,Often,Sometimes,,Most of the time,,Often,Often,,10-25% of projects,Entirely internal,Standalone Team,,Human Error,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Other",Always,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,Yes,,Operations Research Practitioner,Perfectly,Employed by government,KNIME (free version),Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner",Self-taught,30,20,20,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Java,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Orange,Python,R,SAS Base,Spark / MLlib,SQL",Rarely,Rarely,,,,,,,,,,,,,Rarely,Rarely,Rarely,,Often,,Sometimes,,,,,,,,Often,,Often,,Often,,,,,Often,,,Rarely,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs",,,Rarely,,,Often,,Often,Sometimes,,,,,Often,,Often,,,,Sometimes,Often,Often,Often,,,,Often,Sometimes,,,,,,25,25,10,25,15,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,Often,Often,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Rarely,120000,CAD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not 
important,Somewhat important,Not important,Not important,Not important +Male,United Kingdom,45,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Stack Overflow Q&A,Textbook",Not Useful,,Very useful,,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,10,0,50,40,0,0,"Natural Language Processing,Recommendation Engines,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,Less than one year,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,,10MB,"HMMs,SVMs,Other","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Other,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,,Often,Rarely,,"Association Rules,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Natural Language Processing,Recommender Systems,Segmentation,SVMs,Text Analytics",,Sometimes,,,,Most of the time,Sometimes,,,,,,Often,Sometimes,,,,,Most of the time,,,,,Sometimes,,Often,,Sometimes,Most of the time,,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team 
using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Sometimes,,Sometimes,Sometimes,,,,Most of the time,,Often,,Often,,Most of the time,,10-25% of projects,More internal than external,Other,Bnc; brown corpus,Cleaning it. Especially spelling variation (mainly nlp),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Most of the time,50000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Colombia,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Friends network,Stack Overflow Q&A",Somewhat useful,,,,Very useful,Very useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Machine Learning Engineer,Researcher",University courses,20,10,20,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Always,100MB,"Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,Random Forests,RNNs",,,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,Often,,Sometimes,,,,,,,,,10,5,2,8,5,70,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,Often,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,phishtank; google safe browsing,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,155000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",Java,Other,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Conferences,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,20,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","Java,KNIME (free version),R,SQL,Other",,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,SVMs,Text Analytics",,,Sometimes,,,Often,,Sometimes,,,,,,Sometimes,,Often,,Often,,,,,,,,,,Sometimes,Most of the time,,,,,40,10,10,30,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Other",,,,,Often,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More internal than external,Other,Dbpedia; opendatasoft ,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Dropbox; Google Drive,Git,Sometimes,55000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,42,"Not employed, but looking for work",,,,,,,,Java,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Other",,,,Somewhat useful,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Other",Work,20,0,50,0,30,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,United Kingdom,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,Python,Neural Nets,Python,Other,"Online 
courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",Work,0,80,19,0,1,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation",Sometimes,,,,,Often,,Sometimes,,,,,,,Often,Most of the time,,,,,,Most of the time,Rarely,,,Sometimes,,,,,,,,40,15,10,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,Sometimes,Sometimes,,,Most of the time,,,Most of the time,,Sometimes,,,,Often,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Sometimes,125000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Sweden,42,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FastML Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,,Self-taught,40,20,30,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,10MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,Often,,,,Rarely,Sometimes,,,,,,,,,Often,Rarely,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Simulation",,,,,,Sometimes,Most of the time,Sometimes,,,,Often,,Rarely,,Rarely,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,30,15,0,40,15,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science 
team",,,,,Sometimes,,,,Often,,,,,Often,,Often,,,,,,,76-99% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,500000,SEK,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,59,Retired,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,60,0,20,20,0,0,"Reinforcement learning,Time Series,Unsupervised Learning","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Other,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Not Useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",Work,25,5,20,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow",,Often,,Rarely,Rarely,,,,Sometimes,,,,,,Rarely,,Often,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,Rarely,,,Often,Most of the time,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Often,,,,,,Often,,Often,,,Often,Sometimes,,,Sometimes,,,,,,Often,Sometimes,,,,50,15,0,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,Sometimes,Most of the time,,,,,,Rarely,Often,,,,Sometimes,Often,,100% of projects,More internal than external,Other,"I analyze election data for the LA County Registrar-Recorder/County Clerk, so that is sometimes public, sometimes proprietary. I use census data, real-estate data and stock market data.","the codes for voting precincts and their mapping into consolidated voting precincts. They have funky codes that don't connect to census data easily. Also the mapping into consolidated precincts changes every 10 years (for every new census) so it's hard to tell what is what. When I compute results on precinct level, its hard to go from those precinct codes to a map.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Never,"52,000",USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Argentina,35,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +Male,Pakistan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,75,10,0,15,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Java,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,Rarely,,,,,,Rarely,,Often,,,,Often,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Segmentation",,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Sometimes,,,,,,,,75,10,2,5,8,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,Sometimes,,,,,Often,Often,,,Rarely,,,Sometimes,,,Sometimes,Sometimes,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,90000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",,A doctoral degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,,"IBM Cognos,Microsoft Excel Data Mining,Python,QlikView",,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,50,10,0,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,Most of the time,,,Often,Often,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,3000000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Ireland,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,,,,,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Sometimes,100MB,"Decision Trees,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib",Sometimes,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Random Forests,SVMs",,,Often,,,,Often,Sometimes,,,,,,,,,,Sometimes,,,,,Often,,,,,Sometimes,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Conferences,Non-Kaggle online communities",,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",Work,0,0,50,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS JMP,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Often,,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,,Sometimes,,Often,,,,30,30,5,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools",,Often,Sometimes,,Often,,,,,,,,Sometimes,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"110,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Tableau,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Researcher,Other",Self-taught,40,30,30,0,0,0,,Logistic Regression,A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,,Sometimes,10GB,"CNNs,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,"Lift Analysis,Logistic Regression",,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,,,,40,8,2,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Rarely,,,Often,,,Rarely,,,,,,,Most of the time,,,,,Often,Most of the time,,100% of projects,Do not know,Standalone Team,"FAA Flight Records, Advertising Segment Data",Size and noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,55000,CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Not Useful,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Business Analyst,Work,55,5,30,10,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A professional degree,Financial,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL",,Rarely,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Often,Often,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Segmentation",,,,,,Most of the time,Most of the time,Most of the time,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,75,15,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Most of the time,,,Most of the 
time,Often,,Often,Sometimes,,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,Standalone Team,credit bureau histories,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,2800000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",15,5,75,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Image data,Rarely,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,Often,,,,Often,,,,60,10,2,10,18,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,Often,,,,,,Most of the time,,26-50% of projects,Approximately half internal and half external,Other,Imagenet; Kaggle competitions; MSCOCO,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,35000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,44,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Other",Other,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,I don't write code to analyze data,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,43,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,Necessary,,,,"edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,A social science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,R,Text Mining,R,University/Non-profit research group websites,"Textbook,YouTube Videos,Other",,,,,,,,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,Less than a year,"Business Analyst,Researcher",University courses,0,10,0,90,0,0,Survival Analysis,Decision Trees - Random Forests,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important +Male,United States,61,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,IBM SPSS 
Modeler,Social Network Analysis,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,6 to 10 years,Operations Research Practitioner,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,Academic,"5,000 to 9,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,Decision Trees,"IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SAS Enterprise Miner,SQL,Tableau,Other",,,,,,,,,,,,Often,Often,,,,,,,,,,Often,Sometimes,Most of the time,,,,,,Most of the time,,Sometimes,,,,Sometimes,,Rarely,,,Most of the time,,,Sometimes,,,,Most of the time,,,"A/B Testing,Data Visualization,Decision Trees,Text Analytics",Often,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,55,10,5,20,10,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,,Most of the time,,,,,Often,Sometimes,,,,,,,76-99% of projects,Entirely internal,IT Department,"mostly goverment and voter data, Health records, etc",Security,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",,"80,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Government website,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Other,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Brazil,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,I don't plan on learning a new tool/technology,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,65,15,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Relational data",Most of the time,10GB,Neural Networks,"Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,,,,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,20,40,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,Often,,,Often,,,Often,,,Most of the time,,,,,,Most of the time,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,Git,Sometimes,18000,BRL,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,Data Analyst,Kaggle competitions,40,10,25,0,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,R,SAS Base",,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,Most of the time,,Most of the time,,,,Most of the time,,,,Sometimes,,,,Sometimes,Often,,Most of the time,,,Sometimes,,,,Often,,,,40,40,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science 
talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",Often,,Often,,Most of the time,,,,Most of the time,,,,,Sometimes,,,,,,,,,Less than 10% of projects,More internal than external,Other,"Census, cibil, bloomberg etc",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,2000000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Other,Laptop or Workstation and private datacenters,Relational data,Always,1MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,"Data Visualization,Logistic Regression,Natural Language Processing",,,,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,,,,Most of the time,,,Most of 
the time,Most of the time,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that doesn't perform advanced analytics",Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,60,0,25,15,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,<1MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,Most of the 
time,,,,,,35,40,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Unavailability of/difficult access to data,Other",,,,Often,,,,,,,,,,,,,,,,,Often,Sometimes,100% of projects,Do not know,Standalone Team,,"by their own nature, the data sets are small (usually <200)","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Always,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,62,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",TensorFlow,Text Mining,Python,I collect my own data (e.g. web-scraping),"Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,Very useful,,,,Very useful,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,More than 10 years,Other,Self-taught,100,0,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,10GB,Bayesian Techniques,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,20,30,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult 
access to data",Rarely,,,,Most of the time,,,,,,,,,,,Most of the time,Sometimes,,,,Often,,51-75% of projects,More external than internal,IT Department,None,"Cleaning data, data wrangling, developing insightful data visualizations that are meaningful to others, explaining what the data means and why action needs to be taken based on analysis ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,"200,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Deep learning,Python,"GitHub,Google Search","College/University,Non-Kaggle online communities,Online courses,Textbook",,,Very useful,,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Researcher,University courses,30,20,20,30,0,0,Time Series,Logistic Regression,A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Not very important,Other,Basic laptop (Macbook),"Image data,Text data",,100MB,Other,"Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,Often,,,,,Sometimes,,,,,,,,,,,,,,"Data Visualization,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,Often,,,,20,40,20,20,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Often,,,,,,,,,,,Sometimes,Often,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Other,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Spain,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Researcher,Statistician",Self-taught,50,10,10,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Amazon Web services,Java,Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,Sometimes,,,,,,Rarely,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Often,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,Often,Most of the time,,Most of the time,,Often,Sometimes,,Most of the time,,Most of the time,,,,55,25,10,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,,Most of the time,,Often,Most of the time,,,Often,,Often,,,,Often,Sometimes,Most of the time,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,MRI Public Data;CST Public Data;Public Hospital Data,Structure and lack of communication with the datawarehouse,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,35000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,35,0,0,35,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Not important +Male,Germany,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Researcher,Statistician",Self-taught,35,20,35,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs",Often,,,,,Often,Most of the time,Sometimes,,,,,,,,Often,,,,,Rarely,Most of the time,Often,,,Most of the time,,Sometimes,,,,,,50,10,0,10,0,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis 
and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,Often,Sometimes,Most of the time,,Often,Sometimes,Rarely,,,Often,,Most of the time,Most of the time,Sometimes,Often,,Most of the time,Often,,51-75% of projects,Entirely internal,Standalone Team,weather; maps; open data (governant and public organization data),cleaning and processing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Most of the time,,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,62,"Not employed, but looking for work",,,,,,,,Mathematica,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Operations Research Practitioner,Programmer",Self-taught,70,0,10,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,United States,25,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,C/C++/C#,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle",,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Partially Derivative Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Traditional Workstation,Workstation + Cloud service,Other",2 - 10 hours,Master's degree,Yes,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Operations Research Practitioner,Programmer,I haven't started working yet",University courses,20,20,5,40,10,5,"Adversarial Learning,Speech Recognition","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Other,Neural Nets,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,50,25,25,0,0,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,GPU accelerated Workstation,Text data,Sometimes,100GB,Neural Networks,"C/C++,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,Most of the time,,Often,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Natural Language Processing,Neural Networks,Text Analytics",,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,40,0,50,10,0,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Often,Less than 10% of projects,Entirely internal,Other,"PubMed, MIMIC II & III, i2b2","Size, transferring data over slow networks to the compute resources.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,MS SQL Server 2016,Bitbucket,Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",Spark / MLlib,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Data Miner,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,100GB,"Neural Networks,SVMs","NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Data Visualization,Natural Language Processing",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,20,20,0,20,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,Often,,,,,,,,,Sometimes,,,,,,Sometimes,,,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,,,,Very useful,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,0,5,75,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Random Forests,RNNs,SVMs","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Random Forests,RNNs,SVMs",,,,,,Most of the time,Most of the time,,Rarely,,,,,Often,,,,,Most of the time,,,,Most of the time,,Sometimes,,,Most of the time,,,,,,20,5,5,55,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,Most of the time,,,,Sometimes,,,76-99% of projects,More internal than external,IT Department,,"Getting permission from IT to save our incoming data. Even though storage is relatively inexpensive, we do not save our events for historical or data analysis purposes.","Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Always,115000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Not Useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,Very useful,Very useful,Not Useful,,,Very useful,"Data Elixir Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,Stayed the same,Don't know,A general-purpose job board,Very important,Other,Workstation + Cloud service,Relational data,,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,,Rarely,Rarely,Sometimes,,,,50,5,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear 
question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,,Sometimes,Often,,100% of projects,Approximately half internal and half external,Other,ICPSR; IPUMS; historical data,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,180000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Random Forests,Python,Google Search,"Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,Very useful,,Very useful,Very useful,,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,,,A doctoral degree,Technology,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,,10MB,,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,Rarely,,,,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,60,0,0,40,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,51-75% of projects,More internal than external,Central Insights Team,Mintigo; Aberdeen,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,,Rarely,95000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting","Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Rarely,,,,,,Often,,,Often,,,,,Sometimes,,,,,,25,5,10,10,50,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Often,,,,Often,Rarely,,,Often,Rarely,Sometimes,Rarely,Sometimes,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,49,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,Oracle Data Mining/ Oracle R Enterprise,Social Network Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,30,10,30,5,5,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,10 to 19 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Regression/Logistic Regression,SVMs","KNIME (free version),Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,Often,,,,Often,,Often,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,Often,,Often,Most of the time,,,,40,20,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Often,Most of the time,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Most of the time,12900,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects",,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,More than 10 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Other",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,Often,Sometimes,Often,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,Often,,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,,Most of the time,,,,,70,2,4,20,4,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Most of the time,,,,,,,,,,Often,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Sometimes,160000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Researcher,Statistician",Work,5,0,20,75,0,0,"Natural Language Processing,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,10MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Perl,Python,R,Other,Other",,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,Often,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Sometimes,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Often,,,,Sometimes,,,,,,Most of the time,Often,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in 
with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,Rarely,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,Often,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,68000,,Other,5,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Engineering (non-computer focused),,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,Not Useful,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Very Important +Female,United States,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Conferences,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,Other,0,0,20,0,0,80,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,Sometimes,Most of the time,Most of the time,,Often,,,,,Sometimes,,Often,,,Rarely,,Often,,Often,Most of the time,,,,,,,,,,30,10,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty 
data,Other",Sometimes,,,,Often,,,,,,,,,,,,,,,,,Often,100% of projects,More internal than external,Standalone Team,,lack of standardization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,dropbox;slack,Git,,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,67,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,,"Arxiv,Blogs,Friends network,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10TB,"Random Forests,Other","Cloudera,Hadoop/Hive/Pig,Impala,Java,NoSQL,R,Spark / MLlib,SQL,Unix shell / 
awk,Other",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Sometimes,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Most of the time,Sometimes,,,,,,Often,Most of the time,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Random Forests",,,,,,Often,Most of the time,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,10,20,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Often,Sometimes,,,,,,,,Sometimes,,,Often,,,76-99% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,46000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Anomaly Detection,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,Very useful,,Very useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,60,10,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Never,100MB,Regression/Logistic Regression,"Impala,Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,Rarely,Rarely,,,,,,,,,,,Rarely,,,Sometimes,,,,40,5,5,30,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Sometimes,,,,,,Often,,,Sometimes,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,0,10,40,50,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,65,5,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools",Sometimes,Often,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,Less than 10% of projects,More internal than 
external,Other,,,,,,,,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",15,15,0,20,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,65,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Other,Yes,Master's degree,Mathematics or statistics,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,United States,29,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"edX,Udacity",Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst",University courses,10,0,60,30,0,0,Unsupervised Learning,Logistic Regression,I prefer not to answer,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,Deep learning,R,University/Non-profit research group websites,"Blogs,Conferences,Friends network,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Very useful,Very useful,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,50,0,40,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former 
colleague told me",Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Image data,Other",,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,Rarely,,Often,Most of the time,Often,Often,,,Often,,Often,,Often,,,,Sometimes,Most of the time,,Often,,,,,,,Sometimes,,,,25,25,0,30,20,0,Enough to refine and innovate on the algorithm,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Sometimes,,100% of projects,Approximately half internal and half external,Other,"The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), JWGray Breast Cancer Cell Line Panel",The number of features vastly dwarfs the number of samples.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,105000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,IBM SPSS Modeler,,R,Government website,"Blogs,Conferences,Personal Projects,Tutoring/mentoring",,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,,,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,,"Logistic Regression,Other (please specify; separate by semi-colon)",High school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,Regression/Logistic Regression,IBM SPSS Statistics,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,PCA and Dimensionality Reduction",Sometimes,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,40,10,0,10,40,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,medicare.gov,Integrating the different data sources into one.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Always,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,20,0,30,50,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,1PB,"Neural Networks,Regression/Logistic Regression,RNNs","Java,Spark / MLlib",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Recommender Systems,Simulation",Most of the time,Often,Sometimes,,Often,Most of the time,Most of the time,,,,,,,,Often,Most of the time,,Sometimes,,Often,,,,Often,,,Often,,,,,,,20,20,40,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the 
organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",Sometimes,,,Often,Most of the time,Sometimes,,,Sometimes,,,Sometimes,,,,,,Often,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,Yes,,Data Scientist,,,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher",University courses,0,0,40,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Don't know,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Sometimes,Often,,Sometimes,,Rarely,,Sometimes,Often,,Most of the time,,,,,,,Sometimes,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,Often,,,,,,,,,,,Sometimes,,,76-99% of projects,Entirely internal,Other,,Dirty data from bugs in counters; historical changes in schemas not documented ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,90000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,16,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM Watson / Waton Analytics,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Not Useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Brazil,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,Partially Derivative Podcast,1-2 years,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,Business Analyst,Self-taught,50,0,0,50,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,United States,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,Jupyter notebooks,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Software Developer/Software Engineer,I haven't started working yet",University courses,20,0,0,70,10,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,Most of the time,,Often,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Most of the time,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,100% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Romania,65,Employed full-time,,,No,Yes,Machine Learning Engineer,Poorly,Employed by college or university,C/C++,Random Forests,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer",Self-taught,50,0,30,0,20,0,Computer Vision,Decision Trees - Random Forests,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Bayesian Methods,Python,"Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Somewhat 
useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Software Developer/Software Engineer,University courses,40,0,10,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Academic,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data,Other",,100GB,"Bayesian Techniques,CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Rarely,,,,,,,,,Sometimes,,,,Often,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,Sometimes,Often,,Most of the time,Most of the time,,,,,,,Sometimes,,Often,,Rarely,Sometimes,Often,Often,,Often,,Often,,,,Sometimes,,,,,30,30,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data 
science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database",Most of the time,,Often,,Most of the time,,,,Most of the time,,,Sometimes,Sometimes,,,,,Sometimes,,,,,100% of projects,More internal than external,Other,MIMIC,"Its sparse, irregular, and noisy nature.","Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,,,,,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs",A doctoral degree,Technology,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100GB,"Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Perl,Python,R,Spark / MLlib,Unix shell / awk",,,,Sometimes,,,,,Rarely,,,,,,Rarely,,Rarely,,,,,,,,,,,,,Most of the time,Rarely,,Rarely,,,,,,,,Rarely,,,,,,,Most of the time,,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,50,0,30,5,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,Most of the time,,,,Sometimes,,,,Most of the time,,Most of the time,,Often,,Often,,,,51-75% of projects,More internal than external,Standalone Team,None,Dirty/incomplete data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,"110,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A social science,Less than a year,Business Analyst,Self-taught,100,0,0,0,0,0,Survival Analysis,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important +Female,Chile,32,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by government,Hadoop/Hive/Pig,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher",University courses,0,20,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Regression/Logistic Regression,SVMs","Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,Sometimes,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,Rarely,,,Most of the time,Most of the time,,,,,,,40,20,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Most of the time,,,,,,Most of the time,,,,Sometimes,,,,51-75% of projects,Entirely internal,Business Department,DEIS;Public Data from Goverment,Understand the business behavior and support planing processes,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,2100000,CLP,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",45,15,0,0,0,40,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,Regression/Logistic Regression,"Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,10,20,65,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert 
input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,95000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Israel,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,"Employed by non-profit or NGO,Self-employed",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,55,15,0,15,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",Non-profit,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"Java,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,45,10,10,5,30,0,Enough to run the code / standard library,"I prefer not to say,Scaling data science solution up to full database",,,,,,,Often,,,,,,,,,,,Rarely,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,50,0,20,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"Cloudera,Hadoop/Hive/Pig,Impala,Java,Python,Spark / MLlib,Tableau",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,Association Rules,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Never,70000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,QlikView,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer",University courses,0,0,40,40,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,1TB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SQL",,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",Most of the time,Often,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Often,,,,Most of the time,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",Often,,,,,,,,Often,,Often,,Often,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Work,0,20,40,0,40,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Video data,Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Often,Sometimes,,,,Sometimes,,Often,,Sometimes,,,,,Sometimes,,Often,,,,,,,,,,,20,20,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,Sometimes,,Sometimes,Often,,,,,,,,,,Sometimes,Sometimes,,,76-99% of projects,Entirely internal,IT Department,,Must work directly with cloud provider APIs.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,102000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Not Useful,,,Somewhat useful,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,30,20,0,30,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Python,"Government website,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects",,Somewhat useful,,,,,,,,,,Very useful,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,25,0,5,0,,,,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,,,,,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Business Department,,,,,,,,,HUF,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,No,Yes,Other,Perfectly,Employed by college or university,SAS Base,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,No Free Hunch Blog,1-2 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Traditional Workstation,Other",0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner",Self-taught,90,0,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Canada,38,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,61,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by company that makes advanced analytic software,Stan,Deep learning,R,Google 
Search,"Arxiv,Blogs,College/University,Conferences,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Researcher,Software Developer/Software Engineer,Statistician",University courses,10,5,5,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Never,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Java,MATLAB/Octave,Python,R,SQL,TIBCO Spotfire,Unix shell / awk",,Rarely,,Often,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,Often,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,20,30,0,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,Often,Often,,,Sometimes,,,,,,,,,,,,,,100% of 
projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed part-time,,,Yes,,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,Python,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",30,30,25,0,15,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Decreased slightly,3-5 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,Sometimes,,,Sometimes,,Most of the time,,,,Rarely,Rarely,Rarely,Rarely,,Sometimes,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",Often,,,,,Most of the time,,Most of the time,,,,Often,,,,Most of the time,,,,,Most of the time,,Often,,,,,Sometimes,,,,,,30,40,10,20,0,0,Enough to tune the parameters properly,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Sometimes,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,don't know,Git,Rarely,,,Other,8,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,Matlab,"Google Search,Government website,University/Non-profit research group websites","College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,15,5,0,80,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,United Kingdom,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Scientist,Researcher",University courses,50,20,30,0,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A bachelor's degree,Pharmaceutical,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Java,KNIME (free version),NoSQL,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Often,,,,Often,,Sometimes,,Sometimes,,,,,Sometimes,,Rarely,,,,,,,,,,,40,20,5,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,GOSTAR; ChEMBL; Pubchem; SureChEMBL,Data integration and cleaning.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Confluence wiki; shared filesystem,Git,Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Sweden,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Data Scientist,Self-taught,50,50,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A bachelor's degree,Insurance,100 to 499 employees,Increased significantly,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Relational data",Never,1TB,"CNNs,Decision Trees,Neural Networks,Random Forests","Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests",,,,Most of the time,,Often,Most of the time,Often,,,,,,Sometimes,,,,,,Most of the time,,,Often,,,,,,,,,,,60,15,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,Often,,,Most of the time,Sometimes,,,Often,,,,,,,,Often,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Other,Never,400000,SEK,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,1,Employed full-time,,,Yes,,Scientist/Researcher,Fine,,Other,Deep learning,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,Engineer,Self-taught,30,10,10,30,5,15,"Computer Vision,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",High school,Manufacturing,"10,000 or more employees",Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Image data,Text data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Orange,Perl,Python,R,SAS JMP,SQL,Tableau",,Rarely,,,,,,,Rarely,,,,,,Sometimes,,Often,,,,Often,,Often,Often,,,,,Rarely,Rarely,Sometimes,,Most of the time,,,,,,,Often,,Sometimes,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and 
Dimensionality Reduction,Random Forests,Segmentation,SVMs",Sometimes,,,,,Most of the time,Most of the time,Often,Often,Rarely,,Often,,Often,,Often,,,,Sometimes,Often,,Most of the time,,,Sometimes,,Rarely,,,,,,50,20,5,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,I prefer not to say,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data,Other",Most of the time,,Sometimes,,Often,Sometimes,Most of the time,,Sometimes,Sometimes,,,Often,,,,,,,,Sometimes,Often,10-25% of projects,Entirely internal,Other,"noaa, census, others ... ","some is cleaning, but some is the iteration on ""this gives a signal, but what does it mean"". It is absolutely critical to be able to connect mathematical significance with ""physics"" driving the phenomena. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Other,We have no internal architecture to do this,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Don't know,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,44,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",,Github Portfolio,Sort of (Explain more),Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,40,0,0,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Female,Canada,26,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by company that makes advanced analytic software,Employed by college or university",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,15,20,50,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow",,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,Random Forests,SVMs",,,,,,Often,Often,Sometimes,Sometimes,,,,,,,,,,Most of the time,Often,,,Sometimes,,,,,Sometimes,,,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science 
team,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,Often,Sometimes,,,,Most of the time,,26-50% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,60000,CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,R,,R,"Google Search,Other","College/University,YouTube Videos",,,Very useful,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Data Analyst,University courses,30,0,20,50,0,0,Time Series,Logistic Regression,A master's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,Other,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis,Other,Other",,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,Most of the time,Most of the time,,50,40,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,,,Sometimes,,,,,,,Often,,100% of projects,More internal than external,Business Department,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,"73,000",USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,R,Other,"Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,1 to 2 years,,University courses,5,0,15,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Decreased slightly,Don't know,Some other way,Very important,Other,Basic laptop (Macbook),Other,Rarely,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction",,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,60,20,0,10,10,0,Enough to tune the parameters properly,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Other,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Somewhat useful,,,,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,5,20,55,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,Often,,,,Sometimes,,,,,Sometimes,Most of the time,,Most of the time,,,,,,Most of the time,,,,Rarely,,,,Sometimes,,Sometimes,,,,,Rarely,Rarely,Rarely,Most of the time,Most of the time,,,Rarely,,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Most of the time,,Often,,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of 
the time,,Often,,,Most of the time,,Often,,,,,,,Most of the time,,,,10,55,0,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,Most of the time,,,100% of projects,Entirely internal,Other,None,Using spark on big data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Never,95000,USD,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Markov Logic Networks,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,South Africa,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Jupyter notebooks,Time Series Analysis,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,Somewhat useful,Somewhat useful,,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,"O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,40,40,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Other (please specify; separate by semi-colon)",,Academic,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,Often,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Simulation,Text Analytics",Often,,Most of the time,,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,25,15,25,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,,,Sometimes,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,Handling large volumes,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,100000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",R,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,80,10,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Non-profit,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1GB,"Decision Trees,Neural Networks,Random Forests","MATLAB/Octave,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests",,,,,,,Most of the time,Often,,,,,,,,Often,,,,Often,,,Often,,,,,,,,,,,10,5,0,80,5,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Don't know,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Personal Projects",,Very useful,Very useful,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,PhD,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Online courses,Textbook",,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Nice to have,,,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Machine Learning Engineer",Self-taught,30,40,0,30,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Sweden,39,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,Not Useful,,,,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,,Other,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,United Kingdom,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Newsletters,Textbook",Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,50,10,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Often,Sometimes,,Often,,Often,,Most of the time,,Often,,Often,Often,,Most of the time,Sometimes,,Sometimes,Sometimes,Often,,Most of the time,,,,30,20,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used 
by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process",Often,Sometimes,,Often,Often,,,Often,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"100,000",GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,63,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Social Network Analysis,R,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Conferences,Friends network,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Retail,500 to 999 employees,Decreased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10MB,"Decision Trees,Regression/Logistic Regression","R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,Most of the time,Sometimes,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Time Series Analysis",Often,Often,,,,Sometimes,,Sometimes,,,,,,,Most of the time,Most of the time,,,,Often,Often,,,Sometimes,,Most of the time,,,,Sometimes,,,,30,20,10,2,10,28,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Sometimes,Often,,,,,,Most of the time,,,,,Most of the time,,Less than 10% of projects,Entirely internal,Business Department,census,"transformation of raw data from diverse sources into usable, coherent analytics datasets","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Other,Rarely,120000,CAD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Personal Projects,Textbook",,,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,Other,University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",,,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Other",,,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,50,20,0,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,10-25% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,Sometimes,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Other,40+,Kaggle Competitions,No,Master's degree,Physics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,25,0,10,15,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Python,Time Series Analysis,Java,,"Blogs,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,0,30,70,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,100GB,"Ensemble Methods,SVMs","Java,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Segmentation,SVMs,Text Analytics",,,,,,Often,Most of the time,,,,,,,Often,,,,,Most of the time,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,15,50,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,,,Often,,,,Often,,,,,,,76-99% of projects,More internal than external,Other,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Rarely,78000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Official documentation,Stack Overflow Q&A",,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Researcher,Other",University courses,10,0,10,80,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,,100GB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,Often,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,80,10,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,51-75% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"37,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Support Vector Machines (SVM),Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,60,0,40,0,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,1GB,Regression/Logistic Regression,"Amazon Web services,Java,NoSQL,R,Spark / MLlib,SQL,Other,Other",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,,Often,Often,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Other",,,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,Often,,,20,5,5,10,5,55,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,Often,Sometimes,,,Often,,,,Often,,Sometimes,,,,Sometimes,,Often,,51-75% of projects,Approximately half internal and half external,Other,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Most of the time,175000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,KNIME (free version),,,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,20,0,20,40,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Other,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100GB,Random Forests,"IBM SPSS Modeler,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS JMP,Spark / MLlib,SQL",,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,Rarely,Sometimes,,Rarely,Most of the time,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Simulation",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,5,20,35,40,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Sometimes,46000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,New Zealand,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Text Mining,R,,"College/University,Trade book",,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Operations Research Practitioner,Predictive Modeler",Self-taught,95,5,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Prescriptive Modeling,Time Series Analysis",,Rarely,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,50,50,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Email",,Other,,2000000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Denmark,56,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Business Analyst,DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A professional degree,Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100GB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SAS JMP,SQL,Unix shell / awk",Often,,,,,,,Sometimes,,,,,,,,,Often,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,Often,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,Often,,,,Often,,Rarely,,,,,,,Sometimes,,Often,,,,,,,,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,Inability to integrate findings into organization's decision-making process,,,,,,,,Often,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,N/A,too little data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,650000,DKK,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,SQL,,"College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",Work,20,10,20,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1TB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Most of the time,,Sometimes,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,Most of the time,,,,Sometimes,,,,85,5,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,Often,,Often,Often,,,,Most of the time,Often,,,Sometimes,,,Most of the time,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,168000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed part-time,,,No,Yes,Statistician,Fine,Self-employed,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Monte Carlo Methods,Python,Google Search,"Arxiv,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,Very useful,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,,,Very 
useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",University courses,20,20,20,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,SAS Base,SQL,Unix shell / awk",,,,Sometimes,Most of the time,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,,,,,,Rarely,,,,Sometimes,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,,,Sometimes,Often,Often,Often,,,Often,,,,Often,,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,20,20,10,5,45,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,I don't typically share data",,Bitbucket,,"300,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,New Zealand,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Stan,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,50,10,15,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,R",,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,Sometimes,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,Sometimes,Sometimes,,Most of the 
time,,Most of the time,,,Most of the time,,Often,Most of the time,,,,,20,10,5,10,25,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,,,,Often,,,,,,,,,,Often,,Often,,,10-25% of projects,Entirely internal,Standalone Team,None,Organising the data into a suitable format for analysis ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Amazon Web services,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"College/University,Newsletters,Stack Overflow Q&A",,,Very useful,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,50,0,40,10,0,0,"Natural Language Processing,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Telecommunications,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Text data,Other",Sometimes,10GB,"Bayesian Techniques,Neural Networks,RNNs","C/C++,Julia,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,Often,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,Often,Most of the time,,,,40,40,10,0,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,Often,,,,Often,,,,Often,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,Finding it,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,,"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,Time Series,,A master's degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Sometimes,100MB,,"C/C++,Hadoop/Hive/Pig,Python,R",,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Often,,,,10,50,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,51-75% of 
projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,France,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Support Vector Machines (SVM),C/C++/C#,I collect my own data (e.g. web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Female,Other,18,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Self-employed,,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher",Self-taught,90,0,0,10,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,,Text data,Sometimes,,,Orange,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,8,40,20,30,2,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Other,Non,Little timetable,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Email,,Subversion,Most of the time,1,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Neural Nets,R,"Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,"Other,I haven't started working yet",University courses,25,0,0,75,0,0,,Logistic Regression,A master's degree,Telecommunications,100 to 499 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,Regression/Logistic Regression,"Amazon Web services,Microsoft Excel Data Mining,R",,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation",,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,,Rarely,80000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Canada,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,Self-taught,60,0,0,40,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Female,France,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Statistician,Other",University courses,5,5,30,55,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Often,Sometimes,Most of the time,,,Sometimes,,Often,,Most of the time,,,,,Often,,Most of the time,,,Most of the time,,,,Sometimes,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,,Most of the time,Sometimes,Often,,Often,Most of the time,,Rarely,,Often,Sometimes,Sometimes,,Often,,,,Often,,Less than 10% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git,Other",Never,43500,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Australia,34,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by government,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,35,15,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Don't know,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Microsoft Excel Data Mining,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,60,10,10,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,,,,,,,,Often,,Often,,,Sometimes,,,26-50% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Female,Colombia,26,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,R,University/Non-profit research group websites,"Friends network,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,,,,,,Very useful,,Very useful,Very useful,,,Very useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst",Work,10,50,40,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Other,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Text Analytics",,Often,,,,Sometimes,Most of the time,Often,,,,,,,,Often,,,Often,,,Rarely,Sometimes,,,,,,Often,,,,,30,15,10,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to 
coordinate with IT,Scaling data science solution up to full database",Often,,,,,Most of the time,,Sometimes,Sometimes,,,,,,Often,,,Most of the time,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,54000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,"Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,,,A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Random Forests",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Lift Analysis,Random Forests,Segmentation,Text Analytics",Often,,,,Often,Often,Often,,,,,,,,Often,,,,,,,,Often,,,Often,,,Often,,,,,30,30,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",,,,Often,Often,,,,Often,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,36,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by 
professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",5,5,90,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,10GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,SAS Base,Spark / MLlib,SQL,TensorFlow",,Rarely,,Rarely,,,,Sometimes,Often,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,Sometimes,,,,Most of the time,,,,,,,Rarely,,,Often,Often,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,,,Sometimes,,Often,,,Often,Often,Often,,Most of the time,,,Often,,,Often,Often,,,,10,30,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,76-99% of projects,Entirely internal,Other,Weather,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,70000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",Somewhat useful,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Laptop or Workstation and private datacenters,Image data,Never,1GB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,Python,R,TensorFlow",,,,Most of the time,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,SVMs,Time Series Analysis",,,Often,,,Often,Most of the time,,,,,,,,,Often,,Often,,,,,,,,,,Often,,Often,,,,40,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,,,Sometimes,,,,,Often,,,None,Do not know,Other,ncbi cancer datasets; real estate sales datasets;,"getting insights with new sources of data (unlabeled data, implications of dropping na data)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,43,5,20,2,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Canada,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,,"Bayesian Techniques,Logistic Regression",A doctoral degree,Other,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",,10MB,,"Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,90000,CAD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Other,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,Not Useful,Somewhat useful,Very useful,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,10,30,30,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,Other,Other,Other",,Sometimes,,Most of the time,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,Often,Often,"A/B Testing,Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks",Most of the time,,Often,,,,,,Often,,,Sometimes,,,Rarely,Sometimes,,,Sometimes,Most of the time,,,,,,,,,,,,,,20,50,5,5,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,Often,,,,Sometimes,Often,,Most of the time,,,,,,,,Often,,,26-50% of projects,More internal than external,Central Insights Team,Google Street View;ImageNet,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Git,Mercurial",Rarely,85000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Poland,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Stan,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer,Statistician",Self-taught,35,20,10,15,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,10 to 19 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Stan",,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,Rarely,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,Sometimes,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,Rarely,Sometimes,,,Often,Most of the time,Sometimes,,,,,,,,Often,,Often,,,Most of the time,,,,,Most of the time,Most of the time,,,Sometimes,,,,15,50,10,10,15,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,,,Often,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Other",Sometimes,70000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle",,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,Self-taught,30,25,5,10,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,I don't know,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Often,,Often,Often,,,Often,,,,,,,,,Often,,Often,,,Often,,Often,,Often,,,,40,30,10,10,10,0,Enough to run the code / 
standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,Often,,,,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,13818,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,Not Useful,Very useful,Somewhat useful,,,,"Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,Researcher,Self-taught,60,0,20,20,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",High school,Academic,500 to 999 employees,Stayed the same,Don't know,A career fair or on-campus recruiting event,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"Bayesian Techniques,Ensemble Methods,HMMs,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SQL,Stan,Unix shell / 
awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Sometimes,Sometimes,,,,,Most of the time,,,,"Bayesian Techniques,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Time Series Analysis",,,Often,,,,,,Sometimes,,,,Rarely,Rarely,,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,,,,60,20,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,Most of the time,,,,,,Sometimes,,Often,Most of the time,,Most of the time,,Most of the time,Most of the time,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Company internal community,Conferences,Kaggle,Online courses",,,,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Other",University courses,40,0,0,60,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,1TB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,Often,,,,,,,,,Often,,,,,,,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",Often,Often,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Most of the time,Sometimes,Sometimes,,,,Often,,,,,,,,Often,,,Sometimes,Often,,51-75% of projects,More internal than external,Central Insights Team,school data sets,"cleanliness, ID consistency, completeness, 
privacy","Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Rarely,"160,000",,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,More than 10 years,"DBA/Database Engineer,Programmer,Other",Work,20,0,80,0,0,0,,,A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Always,,,"Hadoop/Hive/Pig,Python,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Time Series Analysis",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,60,0,5,25,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,I prefer not to say,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Often,,Often,,,,,,,,,,Sometimes,,Often,,,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Git,Mercurial",Sometimes,,,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +A different identity,United States,0,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Genetic & Evolutionary Algorithms,C/C++/C#,GitHub,"Arxiv,Official documentation,Online courses,Personal Projects,Textbook",Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,15+ years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Other,Sort of (Explain more),Master's degree,Other,,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Evolutionary Approaches,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,19,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,20,20,0,60,0,0,,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,Brazil,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,90,8,0,2,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",Rarely,,,,,,Often,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Sometimes,Often,,Most of the time,Most of the time,,,,Often,Often,,,Often,,,Most of the time,Sometimes,,10-25% of projects,Entirely internal,Business Department,None,Privacy issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,200000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Other,,Online Courses and Certifications,No,Bachelor's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,,Very Important,Very Important,Very Important +Male,Australia,56,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,,,,,,,,Very useful,,,,,"DataTau News Aggregator,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Other",Self-taught,90,10,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Always,100GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Web services,C/C++,DataRobot,Jupyter notebooks,NoSQL,Python,R,Stan,TensorFlow,Unix shell / awk",,Sometimes,,Most of the time,,Rarely,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,Most of the time,Often,Sometimes,,Sometimes,,,,,,Most of the 
time,Sometimes,,,,Often,,,,,Sometimes,,,,45,25,10,10,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,"GDB-17, PubChem",Fitting it in GPU memory,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Sometimes,75000,USD,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,,Genetic & Evolutionary Algorithms,SAS,,"Friends network,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Psychology,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician,Other",University courses,20,0,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Other,,,,,Somewhat important,,Laptop or Workstation and private datacenters,Text data,,100GB,,"SAS Base,SAS Enterprise Miner,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,Rarely,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Neural Networks,Prescriptive Modeling,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,10,10,10,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Rarely,,,,,,51-75% of projects,More internal than external,Standalone Team,,cleaning it up,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,New Zealand,57,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,FastML Blog",3-5 years,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,,More than 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,60,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Self-employed,,,,,"Blogs,Kaggle",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,I don't write code to analyze data,"Researcher,Other",,50,50,0,0,0,0,,,A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Female,India,24,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine 
Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,,"Data Machina Newsletter,FastML Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,35,30,0,35,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,"Google Search,Government website","Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,,,,Very useful,,,Very useful,,"FastML Blog,Jack's Import AI Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to 
have,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important +Male,United States,49,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Conferences,Friends network,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,10,20,10,30,0,30,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,50,Employed part-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Survival Analysis,R,Government website,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,,Other,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Other,I haven't started working yet",University courses,15,0,0,80,5,0,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not 
important +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,NoSQL,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher",University courses,10,30,30,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,Often,,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs",,Sometimes,Sometimes,,,Most of the time,Often,Often,,,,,,Often,,Most of the time,,,,Often,,,Sometimes,,,,,Sometimes,,,,,,25,20,20,10,25,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Privacy issues",,,,,Most of the time,,,,,,,,Often,,,,Often,,,,,,26-50% of projects,More external than internal,IT Department,cannot disclose,data processing and cleaning,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Most of the time,,INR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Uplift Modeling,R,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Miner,Data Scientist",University courses,25,0,25,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,R,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Random Forests,Segmentation",Most of the time,,,,,Most of the time,,Often,Often,,,,,,Often,Sometimes,,,,,,,Often,,,Often,,,,,,,,20,50,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring",Sometimes,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational 
(e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,170000,AUD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by college or university,NoSQL,Bayesian Methods,C/C++/C#,Google Search,"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,10,70,0,0,"Natural Language Processing,Recommendation Engines",Support Vector Machines (SVMs),A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Java,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,Self-taught,60,10,10,10,5,5,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1TB,,"C/C++,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Often,,,,"Data Visualization,Decision Trees,Neural Networks,Prescriptive Modeling,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,Often,,,,60,10,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Sometimes,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,Rarely,Often,Most of the time,Sometimes,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,Census data,Archaic structures ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,80000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Canada,NA,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,R,,,,Online courses,,,,,,,,,,,,,,,,,,,,1-2 years,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Self-taught,100,0,0,0,0,0,,,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,45,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Management information systems,Less than a year,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Reinforcement learning,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Japan,45,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,,,,,3-5 years,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,,,Unnecessary,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,"Amazon 
Web services,Python,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,70,10,10,10,0,0,Enough to tune the parameters properly,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,10,15,60,15,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,Egypt,25,Employed full-time,,,Yes,,Researcher,,Employed by college or university,Python,Deep learning,Matlab,Google Search,"College/University,Kaggle,Online courses,Textbook,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,Very useful,,"Data Machina Newsletter,FlowingData Blog,Talking 
Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Machine Learning Engineer,Researcher",University courses,30,50,5,15,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Sometimes,<1MB,"Bayesian Techniques,Neural Networks,SVMs","Java,MATLAB/Octave,Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,SVMs",,,,,,Sometimes,,,,,,,,Often,,,,Sometimes,,,Often,,,,,,,Often,,,,,,50,30,10,5,5,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,Mammography dataset,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,3000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,South Korea,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Statistica (Quest/Dell-formerly Statsoft),Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Researcher,University courses,0,0,20,80,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A doctoral degree,Academic,10 to 19 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees,Random Forests","IBM SPSS Modeler,IBM SPSS Statistics,R,SQL",,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,Sometimes,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,Sometimes,,,Sometimes,,,,,,,Often,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,20000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Not Useful,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Engineer,Programmer,Researcher",Self-taught,34,33,33,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Financial,10 to 19 employees,Decreased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Other,Most of the time,100GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,HMMs,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow",,Sometimes,,Often,,,,,,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary 
Approaches,HMMs,Logistic Regression,Naive Bayes,Neural Networks",,,Sometimes,,,Most of the time,Most of the time,,Most of the time,Sometimes,,,Rarely,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,,30,25,10,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Rarely,Sometimes,,Most of the time,,,,Most of the time,Most of the time,Often,Often,Often,,Sometimes,Most of the time,,,Often,Sometimes,Often,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Other",,Git,Sometimes,180000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Brazil,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,DataRobot,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,College/University,Conferences,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Recommendation Engines,Unsupervised Learning",Logistic Regression,A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,Regression/Logistic Regression,"Java,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Collaborative Filtering,Data Visualization,Recommender Systems",,,,,Often,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,20,20,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,51-75% of projects,More internal than external,IT Department,Twitter; GitHub; DBLP; PubMed ,The manipulation of big data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Share Drive/SharePoint,,Git,Sometimes,48000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,53,Employed part-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,I don't plan on learning a new ML/DS method,R,"GitHub,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,,,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization",Rarely,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,0,30,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,Often,,,,,,,Sometimes,,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,"City permitting data, US Government data",needed data not available,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"80,000",USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,United States,59,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",University courses,10,10,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,Decision Trees,"Amazon Web services,Jupyter notebooks,Python,R",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Time Series Analysis",Often,Sometimes,,,,Often,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,80,10,5,5,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Often,Often,,,,Often,,,,,,,,,,,Often,,100% of projects,More internal than external,Central Insights Team,ACS,api being cut off,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,slack,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer,Software Developer/Software Engineer",University courses,50,0,0,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A master's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,Neural Networks,"NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,Often,,,Most of the time,,,,,20,10,60,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,Often,Often,,,,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,LinkedIn; Salesforce; Gmail,"Cleaning,, Preparing data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Most of the time,<1MB,Regression/Logistic Regression,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,0,10,70,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools,Unavailability of/difficult access to data",Often,Often,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,43000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Singapore,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Conferences,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,6 to 10 years,"Data Scientist,Operations Research Practitioner",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient 
Boosting,Logistic Regression,Neural Networks - RNNs",High school,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,Most of the time,Most of the time,,,Sometimes,,,,Sometimes,Sometimes,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,Often,,Sometimes,,,,Most of the time,,Sometimes,,Rarely,,,,Most of the time,Often,Sometimes,Often,Sometimes,,,,Sometimes,,Most of the time,,,,40,20,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Most of the time,Sometimes,,Most of the time,,,,,,Sometimes,,,,,,Often,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,148000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Impala,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",15,10,15,50,10,0,"Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,10 to 19 employees,Increased slightly,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Most of the time,10GB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Google Cloud Compute,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,,Often,,Often,,,,,,,,Sometimes,,Often,,Sometimes,Sometimes,Often,Sometimes,,,,Often,,,,,Often,,,,50,20,10,15,5,0,Enough to code it from 
scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,Often,,,Often,,,,,Often,,,,,,,,,,,Often,,100% of projects,,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Always,85000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",TensorFlow,Text Mining,Scala,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,Not Useful,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Computer Scientist,Programmer,Researcher,Other",Self-taught,70,0,20,10,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression",Primary/elementary school,Telecommunications,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,,"Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Perl,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,Sometimes,,,,Often,,,,,,,,Rarely,,,,,,,,,,Often,,,Sometimes,Often,,,,,,,,,,Often,,,,,Rarely,,Often,,,,"Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation",,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,Often,,Most of the time,Often,,,,,,,3,15,3,64,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,,,,,Sometimes,,,Often,,Often,,,Most of the time,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,34000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Other,University courses,30,0,20,20,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression",High school,Other,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,25,25,15,15,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,Often,Most of 
the time,,,Most of the time,,,,,,,,,,Often,,Often,,26-50% of projects,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,66000,GBP,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +A different identity,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,Not Useful,Very useful,,Somewhat useful,,,,Somewhat useful,,1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Australia,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,51,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Retail,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,10GB,"Bayesian Techniques,Decision Trees,Random Forests","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems",Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,Often,,,,Often,Sometimes,,,,,,,,,,2,2,1,4,1,90,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,Most of the time,,,Most of the time,,Often,,,,,Most of the time,Often,,,,,,76-99% of projects,More internal than external,Central Insights Team,n/a,n/a,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,0,CAD,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Canada,50,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Not Useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,More than 10 years,"Data Miner,Researcher",Self-taught,30,50,0,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Other,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Relational data,Never,100MB,"CNNs,Gradient Boosted Machines,Neural Networks","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"CNNs,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,Sometimes,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,Often,Sometimes,,,,,,,,,Often,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for 
a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,Sometimes,,,Most of the time,Most of the time,,,,,,,,,Most of the time,Most of the time,,,100% of projects,More internal than external,Central Insights Team,none currently,data integrity due to poor data entry,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint,Other",secure ftp,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,85000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,,11 - 39 hours,Github Portfolio,No,Master's degree,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,Python,Google Search,"Arxiv,Blogs,College/University,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,,,,,,,,,,,Somewhat useful,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,,Self-taught,40,30,10,20,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Other",Sometimes,1GB,"CNNs,GANs,Neural Networks,SVMs","Amazon Web 
services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,Most of the time,,Sometimes,Most of the time,,,,Sometimes,,,Sometimes,,,,,,Often,Often,,,,,,Most of the time,,,Often,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Sometimes,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Harddrive,Git,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,"Google Search,Government website","Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Statistician",University courses,80,0,15,5,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Mix of fields,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Impala,Python,SAS Base,SAS Enterprise Miner,SQL,TIBCO Spotfire",,,,,Sometimes,,,,Most of the 
time,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,Rarely,,,Most of the time,,,,,Rarely,,,,,"Data Visualization,Logistic Regression,Segmentation",,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,60,15,8,5,12,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,Sometimes,,,Most of the time,,,,Sometimes,,,,Rarely,Most of the time,Often,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,113221,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,United States,54,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Image data,Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Python",Rarely,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,Often,,,Often,Often,,,,,,Rarely,,,Often,,Sometimes,,,Sometimes,,,,,,Often,,,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Organization is small and cannot afford a data science team,Privacy issues",,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Chile,33,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Spark / MLlib,Association Rules,R,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Somewhat useful,,,,,,,Very useful,,Very useful,Very useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,40,0,10,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10MB,"Regression/Logistic Regression,Other","Julia,R,SAS Base,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Often,Often,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Segmentation,Other",,,Sometimes,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,40,30,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,100% 
of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Mercurial,Subversion",Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Colombia,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Conferences,Tutoring/mentoring",,,Very useful,,Very useful,,,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Telecommunications,"10,000 or more employees",Decreased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,IBM Watson / Waton Analytics,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,TensorFlow",Sometimes,Sometimes,,,Often,,,,,,,,Rarely,,,,Most of the time,,,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,"Collaborative Filtering,Evolutionary Approaches,Natural Language Processing,Random Forests,RNNs,Time Series Analysis",,,,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,10,30,10,20,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,Most of the time,,Most of the time,,,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,51-75% of projects,More internal than external,Business Department,,Model building personnel hiring,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,Bitbucket,Rarely,100000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,48,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,Predictive Modeler,Self-taught,70,20,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Other,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Jupyter notebooks,KNIME (free version),NoSQL,Orange,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,TensorFlow",,Often,,Sometimes,,,,,Sometimes,,,Sometimes,,,Often,,Often,,Sometimes,,,,,,,,Sometimes,,Sometimes,,Often,,Often,,Sometimes,,,Often,Most of the 
time,Sometimes,Sometimes,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Often,Often,Often,Sometimes,,,Often,,Often,,Often,,Sometimes,,Sometimes,Often,,Often,,,Sometimes,,Often,Often,Sometimes,,,,40,25,0,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Often,,Often,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,Statistics Bureau; Government Open Data; Customer Datasets,"Cleaning and integration in a repeatable, deployable manner","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Other,"Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Somewhat useful,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,MATLAB/Octave,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Rarely,,,,,,Most of the time,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression",,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,70,5,0,0,25,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,Other,I don't typically share data,,Bitbucket,Sometimes,96000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,,,,,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",University courses,10,0,40,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,100GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Data Visualization,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,Often,,,,50,10,0,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Often,,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,120000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,Neural Nets,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Friends network,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,,,,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,30,0,50,20,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Sometimes,1GB,"Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,C/C++,Jupyter notebooks,Python,SQL,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",Sometimes,,,,Sometimes,Often,Most of the time,,,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,20,10,30,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a 
clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Often,Most of the time,,,Often,,,,Sometimes,Most of the time,,Sometimes,,,,Often,Sometimes,,76-99% of projects,More internal than external,IT Department,"relevant industry data, third party proprietary data scraping",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,I don't typically share data",,Git,Sometimes,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Psychology,1 to 2 years,I haven't started working yet,Self-taught,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Logistic Regression,High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Japan,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,Other,Other,Python,University/Non-profit 
research group websites,"Arxiv,Official documentation,Personal Projects",Somewhat useful,,,,,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Other",Rarely,1TB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,Often,,,,,,,,,,Often,Most of the time,,Often,,,Often,Often,Often,,Most of the time,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,9000000,JPY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,31,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Google Search,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,,,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Bayesian Methods,Java,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Data Stories Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,University courses,40,30,0,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Company internal community,Official documentation,Stack Overflow Q&A",,,,Very useful,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,0,0,70,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most 
of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,Often,,,Often,Most of the time,Sometimes,,,,,,Often,,Sometimes,,Sometimes,,,Often,,Sometimes,,,,Often,Sometimes,,,,,,30,20,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,Most of the time,,,Sometimes,,,Sometimes,,,Most of the time,,Most of the time,,Most of the time,,100% of projects,More internal than external,Standalone Team,"publicly available genomic/transcriptomic datasets (TCGA, Ensembl, dbGaP published datasets, Broad CCLe, etc.)",not enough of it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"60,000",USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",10,10,20,0,0,60,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Mix of fields,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Don't know,10TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib",Sometimes,Often,,,,,,,Sometimes,,,,Rarely,,,,Most of the time,,,,,Rarely,,,,,Rarely,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,Sometimes,,Sometimes,,Most of the time,Often,,,,,Most of the time,Sometimes,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,Sometimes,,,,,,,Sometimes,,Often,Most of the time,,100% of projects,Entirely internal,IT Department,,Data is inconsistently collected and intermittent.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git,Subversion",Most of the time,85000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,45,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Jupyter notebooks,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,,6 to 10 years,Researcher,University courses,90,5,0,5,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Academic,500 to 999 employees,,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,,1GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,MATLAB/Octave,R",,Rarely,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation",,Often,Often,,,Most of the time,Most of the time,Often,Sometimes,,,,,,Sometimes,Often,,Sometimes,,,Often,,Often,,,,Often,,,,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,More internal than external,Other,transactional data bases.,sometimes the reliability of the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,28000000,CLP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Textbook",,,Very useful,,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",28,20,20,28,4,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,10GB,"Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib",,Most of the time,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Cross-Validation,Logistic Regression,Random Forests",Often,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,50,10,10,10,10,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,Sometimes,,Less than 10% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,85000,CAD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,,,,"Blogs,College/University,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,Not Useful,,,,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,1 to 2 years,"Researcher,Other",University courses,50,10,0,40,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,10 to 19 employees,Increased significantly,Less than one year,A general-purpose job board,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,<1MB,"Bayesian Techniques,Regression/Logistic Regression,Other","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Most of the time,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,0,0,20,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,none; would perhaps use external data from clients in the future,Human activity is not systematic which makes it difficult to create simple models of the data (and it should be simple).,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,14000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Canada,38,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,R,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,40,0,0,40,0,,,"Some college/university study, no bachelor's degree",Government,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,,,"R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,0,0,15,60,0,Enough to run the code / standard library,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Sometimes,,,,Often,,,100% of projects,More internal than external,Other,Statistics Canada; Industry Canada; Community Data,I'm new - still learning the techniques and the math,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,58000,CAD,Has decreased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,More than 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","R,SAS Base,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,Sometimes,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",,,Rarely,,,Sometimes,Often,Often,Sometimes,,,,,,,Most of the time,,,,,Often,,Often,,,,,,,,,,,30,30,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Other,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Official documentation,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",70,20,5,0,5,0,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,Tableau",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,SVMs,Text Analytics",,,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,Most of the time,,,,,,,,,Often,Often,,,,,45,20,5,10,20,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Other",,,,,,Sometimes,,,,Sometimes,,,,,,Often,,,,,,Most of the time,51-75% of projects,Approximately half internal and half external,Standalone Team,PubMed; NIH; Patents,Organization,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,125000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,75,0,25,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important +Male,United States,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Tableau,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",Kaggle competitions,40,20,0,10,30,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",A professional degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Video data,Text data",Rarely,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,25,10,5,30,30,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Often,,,,,Often,Often,,,,,,,,,Often,Often,,,,76-99% of projects,More internal than external,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,USD,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,Neural Nets,R,University/Non-profit research group websites,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Predictive Modeler,Researcher",Self-taught,25,10,50,10,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",Primary/elementary school,Academic,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,Basic laptop (Macbook),"Text data,Relational data",,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Python,R,SQL,TensorFlow",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Rarely,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Simulation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,Often,,Most of the time,,,,Sometimes,,,Often,,,,Most of the time,,,Sometimes,,,,35,25,0,15,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Most of the time,,,,,,,,,,,,,,Often,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,DHS; SRTR; USRDS,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,88000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,55,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",40+,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer",Self-taught,25,25,0,0,0,50,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Not important +Male,United States,32,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by a company that performs advanced analytics,,Anomaly Detection,Python,,"Company internal community,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,1 to 2 years,Data Scientist,Self-taught,20,0,80,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Often,,,,,,,Rarely,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Often,,,,40,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,,76-99% of projects,Entirely internal,Business 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,145000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Rarely,,"Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,Sometimes,,,,,,,Sometimes,,,,80,0,0,10,10,0,Enough to run the code / standard library,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Sometimes,Rarely,,,,Often,,,Sometimes,Rarely,,,,,,Most of the time,,,,,,,26-50% of projects,Entirely internal,Standalone Team,"Omics data- free from NCBI Genbank, PRIDE, Metabolights, GNPS, MetabolomicsWorkbench",NA,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,55000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,R,Regression,R,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,University courses,60,0,30,10,0,0,Time Series,Logistic Regression,,Academic,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Sometimes,,"Regression/Logistic Regression,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis,Other",,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,Most of the time,,,Often,Most of the time,,,25,25,25,10,15,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,Less than 10% of projects,,,,,,,,,Never,1800000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,55,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Traditional Workstation,11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,10,10,70,10,0,Outlier detection (e.g. Fraud detection),,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Nigeria,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,,Python,,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,,2 - 10 hours,,No,Bachelor's degree,Electrical Engineering,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Colombia,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,Very useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Very 
Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,37,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",35,25,15,0,0,25,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,10GB,"Neural Networks,Regression/Logistic Regression","IBM Cognos,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,,,,,,Sometimes,,,,,,,Often,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Lift Analysis,Logistic Regression,Neural Networks",Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,Rarely,,,,,,,,,,,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty 
data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,Often,,,Sometimes,Often,,,,,Sometimes,Most of the time,,,,Most of the time,Often,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,200000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,63,Retired,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects",Very useful,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,FlowingData Blog,1-2 years,Nice to have,Nice to have,Nice to have,,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,29,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",SQL,Other,"Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"DBA/Database Engineer,Engineer",Self-taught,80,5,15,0,0,0,,,A bachelor's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,100MB,"Decision Trees,Regression/Logistic Regression","SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,"Data Visualization,Decision Trees,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Often,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,Often,Most of the time,,,,30,10,10,30,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,Often,Often,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Other,,Data fields in the past did not have clear specific definition so legacy data is inconsistent,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"84,000",,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,"Arxiv,Blogs,Conferences,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,,,30,40,0,30,0,0,,,,Internet-based,,,,,,,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Time Series Analysis",,,,Often,Often,Often,Often,Often,Often,Often,,Often,Often,Often,,Often,Often,,,Often,Often,Often,Often,Often,Often,Often,,,,Often,,,,30,30,30,10,0,0,,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Analyst",Work,50,0,20,0,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Gradient Boosted Machines","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,Sometimes,,,Often,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,,,,Most of the time,,Sometimes,,Sometimes,,,Often,,,Most of the time,Often,,,,60,20,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,Most of the time,,,10-25% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,160000,AUD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,Colombia,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Employed by government,Python,Deep learning,Python,"GitHub,Google Search","Blogs,College/University,Online courses,YouTube Videos,Other",,Very useful,Very useful,,,,,,,,Very useful,,,,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,3 to 5 years,Statistician,University courses,10,20,20,30,20,0,"Natural Language Processing,Reinforcement learning,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision 
Trees",,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,10,30,10,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",Often,Often,,Most of the time,Often,Often,,,,,Often,,,,,,Most of the time,,,,Often,,51-75% of projects,,Central Insights Team,,,Key-value store (e.g. Redis/Riak),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,33000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Conferences,Newsletters,Online courses,Personal Projects",,Very useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,,,,,,,"DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,35,40,0,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Simulation,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Often,Most of the time,Sometimes,,Most of the time,,,,Most of the time,,Often,Most of the time,,,,Often,,,,Often,,Most of the time,Sometimes,,,,15,50,15,10,10,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,Sometimes,Most of the time,,,,Often,,,,Sometimes,,,,,,,Often,Often,,100% of projects,Entirely internal,Standalone Team,maxmind,data changes over time,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,125000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,,,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",3-5 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Engineer,University courses,10,40,30,15,0,5,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Decreased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Always,10GB,CNNs,"Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,Often,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Often,,,Most of the time,,,,,,,,,,,,,Often,Often,,,,,,,Sometimes,,,,,,50,10,20,5,15,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,Often,Most of the time,,,Sometimes,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,MNist,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,5000,AED,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer",Self-taught,50,35,0,0,15,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Rarely,<1MB,"Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",Rarely,Most of the time,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Neural Networks,Segmentation",Rarely,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Other,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Researcher,Self-taught,80,10,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Japan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Not Useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Survival Analysis,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Other",,,,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,20,5,20,54,1,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Oracle Data Mining/ Oracle R Enterprise,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,Rarely,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",Often,,,,Rarely,Often,Most of the time,Often,,,,Often,,Rarely,,Often,,,,Sometimes,Sometimes,,Often,,,,,,Rarely,Rarely,,,,50,16,2,12,15,5,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Most of the time,Rarely,,,Most of the time,,,,Sometimes,,,,,,,,,,,Often,Often,,76-99% of projects,More internal than external,Other,,Lack of Governance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Rarely,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,Regression/Logistic Regression,"MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",Sometimes,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,Often,,,,60,15,20,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible 
expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Often,Often,,,,,Often,,,,,,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,140000,CAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Israel,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,10,45,0,45,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Time 
Series Analysis",,,,Sometimes,,Most of the time,Often,,Most of the time,,,Most of the time,,,,Often,,,,,,,,,,,,,,Often,,,,30,5,30,12,23,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues",,,,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,training data is different from production data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,"348,000",ILS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Text Mining,SQL,"Government website,I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,Very useful,,Very useful,,Very useful,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series 
Analysis",,,,,,,Often,,,,,,,Rarely,,Rarely,,,,,Rarely,,,,,Rarely,Sometimes,,Sometimes,Rarely,,,,30,10,30,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,76-99% of projects,Do not know,Other,U.S. Census,Combining data from different sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,20000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,5,5,15,70,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1TB,"Decision Trees,Regression/Logistic 
Regression","Cloudera,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (free version),SQL",,,,,Rarely,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,Rarely,,,,Often,,Rarely,,Rarely,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Text Analytics",,,,,,,,Rarely,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,Rarely,,,,,75,0,0,10,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Most of the time,,,,Often,,,,,,,,,,Sometimes,Often,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",GoogleDrive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,94500,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Regression,SAS,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Time Series,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,Not Useful,Somewhat useful,,,,Somewhat useful,Partially Derivative Podcast,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,35,5,15,15,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,People 's Republic of China,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,"Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Necessary,Nice to have,,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, 
udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,,,,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Researcher,Work,0,10,80,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Academic,"5,000 to 9,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic 
Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,Often,Often,,,Often,,,,,Often,Often,,,,,80,10,0,5,5,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Anomaly Detection,Python,Google Search,"Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,20,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or 
workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,RNNs,Time Series Analysis",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Sometimes,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations of tools",,,,,,,,,,,Often,,Often,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,21,"Not employed, but looking for work",,,,,,,,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Other,"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Australia,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,39,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Newsletters",,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,Most of the time,,,,,,Often,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the 
time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,70,10,5,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",Often,,,,Often,,,,Sometimes,,,,Often,,,,,,,,Sometimes,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,35000,,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Personal Projects",Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,0,59,0,1,0,"Adversarial Learning,Computer Vision,Machine Translation,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Technology,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new 
areas,Traditional Workstation,Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,HMMs,Random Forests,Regression/Logistic Regression","C/C++,Java,NoSQL,Python,R,RapidMiner (free version),SQL,Other",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Most of the time,,,,,,,,,Often,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Sometimes,,Often,,,Sometimes,,,Most of the time,,,,,Often,,Often,,,Sometimes,Most of the time,,,Most of the time,,,,60,10,5,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Organization is small and cannot afford a data science team,Privacy issues",Often,,,Most of the time,,,,,,,,,,,,Often,Sometimes,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Never,63000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation",,,,,,,Very useful,,,Very useful,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Financial,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Never,1GB,"CNNs,Random Forests,SVMs","NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Logistic Regression,Neural Networks,Random Forests,SVMs",,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,60,30,10,0,0,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,Sometimes,,,,Often,Most of the time,,,,Sometimes,,,,,,Often,,None,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Other",Commercial Data Platform,,Other,Never,"20,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Republic of China,32,Employed full-time,,,No,Yes,Data Analyst,,,MATLAB/Octave,Bayesian Methods,Matlab,Google Search,"College/University,Personal Projects,Textbook",,,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Business Analyst,University courses,0,0,0,100,0,0,Survival Analysis,Other (please specify; separate by semi-colon),,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Male,Singapore,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,SQL,Regression,SQL,GitHub,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Decision Trees - Random Forests,,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,,Text data,Rarely,100MB,Decision Trees,"Minitab,RapidMiner (commercial version)",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,5,10,5,30,40,10,Enough to run the code / standard library,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,More 
internal than external,Other,,Not easily understandable data to process into meaningful information,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Other,Rarely,"60,000",SGD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Personal Projects,Stack Overflow Q&A",Very useful,,Somewhat useful,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Self-taught,80,0,15,5,0,0,"Adversarial Learning,Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,GANs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Often,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,GANs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Time 
Series Analysis",,,Sometimes,Often,,Sometimes,,,,,Often,,,,,Often,,,,Most of the time,Often,,,,,Often,Sometimes,Often,,Often,,,,10,70,5,0,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Limitations in the state of the art in machine learning,Privacy issues",Sometimes,,,,,Sometimes,,,,,,Most of the time,,,,,Often,,,,,,100% of projects,More internal than external,Standalone Team,,Privacy,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,500000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Fine arts or performing arts,3 to 5 years,"DBA/Database Engineer,Other",Self-taught,80,20,0,0,0,0,,,,Other,500 to 999 employees,Decreased slightly,1-2 years,A tech-specific job board,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),Spark / MLlib,SQL",,Rarely,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,Often,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,Rarely,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics,Time Series Analysis",,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,70,10,3,10,7,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Most of the time,,,Sometimes,Sometimes,,,,,,,,,,Often,,,,100% of projects,More external than internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Google Drive,Git,Sometimes,85000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Australia,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle",,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,RNNs,SVMs","Amazon Web services,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,RNNs,SVMs",,,,Most of the time,,Often,,Often,,,,,,Rarely,,,,,,,,,,,Often,,,Sometimes,,,,,,20,15,10,20,35,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,,,,Often,,,,,,Often,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Newsletters,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,30,30,20,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests",Most of the time,,Most of the time,,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,50,5,0,30,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine 
learning,Limitations of tools",Most of the time,,,,,,,,Sometimes,,,Most of the time,Most of the time,,,,,,,,,,100% of projects,Entirely internal,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Email,,Git,Never,77000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,Stan,Bayesian Methods,SAS,Government website,Conferences,,,,,Very useful,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,Statistician,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Pharmaceutical,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Always,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,C/C++,Microsoft R Server (Formerly Revolution Analytics),Minitab,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL",Rarely,,,Rarely,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,Rarely,,Most of the time,,,,,Most of the time,Sometimes,Most of the time,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Time Series Analysis",Most of the 
time,Rarely,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,,Often,,Often,Sometimes,,,Most of the time,,,Often,,,,50,30,5,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,,,,Most of the time,,,,,Often,,,Most of the time,,Most of the time,Often,Often,,,,,,100% of projects,More internal than external,Central Insights Team,nhanes,DIRTY DATA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,300000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Stan,Deep learning,Python,Other,"Arxiv,College/University,Online courses,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,,,,,,Very useful,,,,Somewhat useful,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",5-10 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,"Coursera,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",40+,PhD,Yes,Doctoral degree,Computer Science,,"Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Australia,57,Employed full-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Bayesian Methods,R,,"Blogs,College/University,Conferences,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Researcher,Work,0,10,80,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,10 to 19 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Other,Most of the time,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision 
Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Often,,,Sometimes,,,Often,,,,,20,30,10,10,10,20,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,76-99% of projects,More internal than external,Other,None,,,"Email,Share Drive/SharePoint",,,Sometimes,150000,AUD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Time Series Analysis,,"Government website,University/Non-profit research group websites","College/University,Textbook,YouTube Videos",,,Very useful,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Female,United States,53,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,,3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Other,1 to 2 years,Software Developer/Software Engineer,Other,25,40,NA,20,15,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,R,Time Series Analysis,SAS,I collect my own data (e.g. 
web-scraping),"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Programmer,Other",Self-taught,10,60,20,10,0,0,Time Series,Bayesian Techniques,A master's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,<1MB,Bayesian Techniques,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,50,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",Often,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,NAICS database,Lack of data in the correct granule,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Subversion,Rarely,100000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,MATLAB/Octave,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,Researcher,University courses,20,30,20,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Most of the time,1GB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Neural Networks,Segmentation",,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,10000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Korea,45,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,YouTube Videos,Other",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,50,5,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,RNNs","Java,KNIME (free version),Orange,Python,R,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,RNNs",,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,50,30,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,IT Department,...,...,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,50000,KRW,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,0,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,University/Non-profit research group websites,Conferences,,,,,Very useful,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,Software Developer/Software Engineer,Self-taught,80,0,0,0,0,20,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Rarely,10TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,KNIME (free version),MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Random Forests,RNNs,SVMs",,,Often,,,,Sometimes,Often,,,,Sometimes,,,,Sometimes,Sometimes,Often,,,,,Often,,Sometimes,,,Often,,,,,,0,0,0,0,100,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,I prefer not to say,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,Often,,Sometimes,,,,,Often,,,Most of the time,Sometimes,Often,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Tableau,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,University courses,20,0,0,80,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,15,10,25,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Often,Sometimes,,,Most of the time,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,IT Department,,Dirty data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,,Always,70000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,,"Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,"Data Analyst,Researcher",University courses,NA,15,40,30,15,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,20 to 99 employees,Increased slightly,Don't know,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,Random Forests,"Amazon Web services,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,Sometimes,Often,Rarely,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,30,10,15,15,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Often,,Often,,,,,,,,Often,,,,,,Often,,Sometimes,,26-50% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,65000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Company internal community,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,Somewhat useful,,,,,,,,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,0,10,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,,,Often,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Most of the time,,,Most 
of the time,,,,Often,,,Sometimes,,,Often,Sometimes,,,,20,30,0,20,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,,,,,Often,,Sometimes,Most of the time,,,,,,Sometimes,Most of the time,,Sometimes,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,110000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,University courses,30,30,10,20,10,0,,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,I collect my own data (e.g. web-scraping),Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,"Engineer,Programmer",University courses,10,10,0,30,0,50,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Often,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,Sometimes,Most of 
the time,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Often,Sometimes,,,,30,20,10,30,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Often,,,,,,,Often,Often,,,,,,,,,,,,,,76-99% of projects,More external than internal,Other,no third party dataset and many public datasets on online,get an idea of a person in charge to understand what do they do with this dataset.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,80000000,KRW,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Australia,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,R,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,25,25,25,25,0,0,,,A doctoral degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,,"Mathematica,MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,30,30,0,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,100% of projects,Entirely 
internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,A social science,I don't write code to analyze data,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important +Male,Malaysia,26,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,Jupyter notebooks,Regression,R,Google Search,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Engineering 
(non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important +Male,Argentina,49,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by government,Microsoft Excel Data Mining,,SQL,Other,"Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Researcher",Work,20,10,60,10,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,20,20,10,0,0,,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources",Often,,,,,,,,Often,Often,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,800,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Australia,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Academic,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and private datacenters,Image data,,100GB,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,PCA and Dimensionality Reduction,Simulation",,,,,,,,,,,,,,Often,,,,,,,Often,,,,,,Often,,,,,,,50,20,NA,10,20,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,Sometimes,,,,,,Often,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Mercurial,Subversion",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Company internal community,Conferences,Personal Projects,Textbook,Trade book",,Not Useful,Very useful,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,Business Analyst,Self-taught,33,1,33,33,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow",,Sometimes,,,,,,,Sometimes,,,,,,,,Often,,,,,Sometimes,,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text 
Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,,I don't typically share data,,Git,,100000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ireland,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,Predictive Modeler,Work,25,0,70,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Insurance,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1TB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Impala,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,,,,,Most of the time,,Sometimes,Often,Most of the time,,,,,Sometimes,Often,Often,,,Most of the time,,,,,,,,45,20,25,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Often,,,,,,,,,,,,,,Often,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,150000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer,Software Developer/Software Engineer",Other,35,15,0,40,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Philippines,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,The Analytics Dispatch Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Jupyter notebooks,Microsoft Excel Data Mining,Minitab,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,Sometimes,,,,,Most of the time,,Often,,Sometimes,,,,,,,Often,,,Sometimes,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",,Sometimes,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,Sometimes,,,Most of the time,,Most of the time,,Most of the 
time,,,Often,Most of the time,Most of the time,,,Often,Most of the time,Most of the time,Sometimes,,,,,40,22.5,0,22.5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,Sometimes,,,Most of the time,Most of the time,,,,,Often,,,Most of the time,,,,,,,Often,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,1950000,PHP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Canada,40,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,50,20,0,20,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Canada,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,Bayesian Methods,Python,,Official documentation,,,,,,,,,,Very useful,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,A health science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,"Some college/university 
study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,59,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,58,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,KNIME (free version),Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Official documentation,Online courses,Personal Projects,Textbook",,,,,,,,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A master's degree,Other,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,,Traditional Workstation,Text data,Never,,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,2,2,0,2,0,94,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Sometimes,,,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Republic of China,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"GitHub,Google Search","Arxiv,College/University,Kaggle,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Other,1 to 2 years,Software Developer/Software Engineer,University courses,0,20,0,50,30,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Female,Taiwan,24,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced 
analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,15,0,30,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important +Male,India,24,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Brazil,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Operations Research Practitioner,Researcher,Software Developer/Software Engineer",Work,30,30,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",,A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,Sometimes,Often,,,,Often,,,,,Most of the time,,,Most of the time,,,,,Often,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,Often,,,,Often,,Sometimes,,Often,,,,,,,Sometimes,,,,Often,,,Most of the time,,,,40,20,15,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations 
in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,Often,,Most of the time,,,Often,Often,,,Sometimes,,Sometimes,,,,,,Often,Often,,51-75% of projects,Approximately half internal and half external,IT Department,,Data availability.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"150,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Survival Analysis,Decision Trees - Random Forests,High school,Technology,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,100MB,Random Forests,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees",Often,,,,,Sometimes,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,5,10,5,35,45,0,Enough to 
explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,Often,,,,,,,,,,,Most of the time,,Sometimes,,,,Often,,10-25% of projects,Entirely internal,Business Department,NA,Limited Data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Rarely,250000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Colombia,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Cluster Analysis,Python,,"Conferences,Friends network",,,,,Very useful,Very useful,,,,,,,,,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,20,20,10,50,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",,Technology,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Perl,Python,R,SAS Enterprise Miner,SQL",,,,Rarely,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Sometimes,,Rarely,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the 
time,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Other,Other,Other,Other,"Non-Kaggle online communities,Personal Projects",,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,Other,Self-taught,90,0,4.8663,5,0,0.1337,,,A bachelor's degree,Government,500 to 999 employees,Stayed the same,,Some other way,Very important,Analyze and understand data to influence product or business decisions,Other,Text data,Always,10MB,Other,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,Sometimes,25,5,0,10,60,100,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data,Other",Most of the time,,,,Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Sometimes,,,,Most of the time,Most of the time,100% of projects,More internal than external,Other,Google; Yoloswag,"The monkeys responsible for extracting it find every excuse under the sun not to give the data to anyone or allow anyone else permission to extract it. 
Basically, they don't want to do a basic chair-moistener job, but also don't want anyone else to do it.",Other,"Email,Other",Internal file management system,Other,Never,75000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Statistica (Quest/Dell-formerly Statsoft),Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,,,Primary/elementary school,Technology,20 to 99 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Most of the time,1GB,,"Microsoft Azure Machine Learning,Python,SQL",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,10,20,10,50,10,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,,,,,Sometimes,,,Often,,,,,,,26-50% of projects,,,,,,,,,,120000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,19,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Personal Projects",,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Data Scientist,Other,25,0,0,0,5,70,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,100 to 499 employees,Increased significantly,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,1GB,"HMMs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,Natural Language Processing,SVMs",,,,,,Sometimes,Sometimes,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,Rarely,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,10000,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Other,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,Python,Google Search,"Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",University courses,30,10,40,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Financial,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,10GB,Regression/Logistic Regression,"Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Simulation,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,30,0,65,5,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Sometimes,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,trading exchange data,distributed systems / concurrency bugs,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Always,150000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Mexico,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Cloudera,Text Mining,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Statistician",Self-taught,50,10,20,10,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,Often,,,Sometimes,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Segmentation",,,,,,Often,,Most of the time,,,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,,,,,,,,30,35,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data,Other",Often,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,Most of the time,Often,10-25% of projects,More external than internal,Central Insights Team,None,There is a lot of data and few and slow tools to manage it,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. 
Redis/Riak)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,25000,MXN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Other",Kaggle competitions,0,25,25,0,50,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Sometimes,,,Often,Often,,,,,,,Often,Often,,Sometimes,,,,Often,,,,30,10,20,20,20,0,Enough to explain the algorithm to someone 
non-technical,"Difficulties in deployment/scoring,Dirty data,Unavailability of/difficult access to data",,,,Often,Often,,,,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,SNL Financial; FFIEC banking data ,Messy data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Rarely,90000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician",Self-taught,10,10,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SAS JMP,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,Sometimes,,Most of the time,,Often,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Often,Sometimes,Often,,Sometimes,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,Often,Most of the time,,Often,,,Most of the time,,Sometimes,Sometimes,,Often,,,,Most of the time,,,,40,10,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,Often,,,Most of the time,Often,,Often,Most of the time,,Sometimes,,,Often,Often,,,,Often,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,165000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,Kaggle,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Programmer",University courses,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,,,,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,None,Do not know,Standalone 
Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,149000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Yes,I did not complete any formal education past high school,,,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat 
important,Somewhat important,Very Important +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Monte Carlo Methods,Python,GitHub,"Blogs,College/University,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,Somewhat useful,,,,,,,,,,,Somewhat useful,,,Very useful,,"Data Elixir Newsletter,No Free Hunch Blog,Partially Derivative Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Other,Work,20,0,60,0,10,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Other","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,Often,Often,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,20,20,5,5,10,40,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,Often,,,,Sometimes,,,,,,,Sometimes,,,,Rarely,,,,10-25% of projects,More internal than external,Other,Zip-Codes.com; CAB,Class imbalance,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,"67,500",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Decision Trees,R,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,10,20,5,50,0,15,Time Series,Logistic Regression,A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased slightly,Less than one year,A general-purpose job board,Very important,,Traditional Workstation,Relational data,,<1MB,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Text Analytics",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,70,0,0,15,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Sometimes,50000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Mexico,38,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Text Mining,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,Somewhat useful,,Very useful,,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,Somewhat useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,40,20,15,25,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,CRM/Marketing,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,QlikView,R,RapidMiner (free version),Tableau",,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,Rarely,Often,,Rarely,,,,,,,,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,,,,Often,Most of the time,,,,,,,Often,Sometimes,Often,,,,Sometimes,,Most of the time,,Often,,Most of the time,Sometimes,,Sometimes,Often,,,,20,40,10,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for 
a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database",Most of the time,Often,Sometimes,,Most of the time,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,76-99% of projects,More internal than external,Business Department,National surveys|Brand tracking|Sindicated data bases|Sales audits,"Understanding sampling and collection methods, survey structure as well.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,720000,MXN,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Japan,65,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Unix shell / awk,,C/C++/C#,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,Programmer,Self-taught,100,0,0,0,0,0,Speech Recognition,,"Some college/university study, no bachelor's degree",Other,20 to 99 employees,Decreased significantly,Don't know,,"N/A, I did not receive any formal education",,,Text data,,,,"C/C++,Perl,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,70,0,0,30,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,"3,000,000",JPY,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,"Employed by non-profit or NGO,Self-employed",TensorFlow,Neural Nets,Python,Other,"Arxiv,Blogs,Company internal community,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,,,,,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Data Analyst,Engineer,Programmer",Work,25,0,75,0,0,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs",A bachelor's degree,Military/Security,"5,000 to 9,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Other",Sometimes,100GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R",,,,Often,,,,,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,HMMs,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,Often,,,,Often,,,,,,Sometimes,,,Often,,Often,,,Often,,,,,,Often,,,,,,,20,25,5,25,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,n/a,n/a,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,133000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Data Analyst,Work,45,20,0,30,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - GANs",Primary/elementary school,Other,"10,000 or more employees",Decreased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Other","C/C++,Jupyter notebooks,Python,QlikView,R,SQL",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,Rarely,Often,,,,,,,,,Most of the time,,,,,,,,,,"Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Neural Networks,Random Forests,Segmentation,Other",,,,,,,,,Often,,,Often,,,Often,,,,,Often,,,Often,,,Often,,,,,Often,,,60,15,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Sometimes,,,,Often,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,55000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Hong Kong,59,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Python,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,,"Data Machina Newsletter,No Free Hunch Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",30,60,5,5,0,0,"Survival Analysis,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",High school,Financial,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,IBM Cognos,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,RapidMiner (commercial version),SAS Enterprise Miner,SQL,Tableau",Often,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Often,,,,,Often,,,,,Most of the time,Often,,,,,Often,,,Often,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,Often,,,,Most of the time,Most of the time,,,Most of the time,Often,,,Most of the time,,,,45,5,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,Often,,Most of the time,,51-75% of projects,More internal than external,Business Department,Central Credit Bureau; Bloomberg; Reuters Feed,Data Quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Git,Sometimes,,,,9,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,,,Python,,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses",,,Very useful,,,,,,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",0,0,50,50,0,0,,,,Insurance,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,,Regression/Logistic Regression,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,25,25,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,51-75% of projects,Entirely internal,Business Department,"OSHA, BLS, Census bureau",,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Git,Sometimes,86000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Republic of China,20,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Java,Monte Carlo Methods,Java,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University",,Somewhat useful,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Fine arts or performing arts,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",30,20,30,10,10,0,Computer Vision,Decision Trees - Gradient Boosted Machines,,Academic,100 to 499 employees,Decreased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Rarely,1TB,Bayesian Techniques,"C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python",,,,Sometimes,,,,Sometimes,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,Often,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,30,10,10,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More external 
than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,,,,,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Other,Poorly,Employed by government,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",28,70,0,0,0,2,,,A bachelor's degree,Government,"5,000 to 9,999 employees",Decreased slightly,Less than one year,Some other way,Very important,Other,Traditional Workstation,Relational data,Never,,,"Cloudera,Hadoop/Hive/Pig,Impala,Python,SAS Enterprise Miner,SQL",,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Often,,Often,,,,,,,,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,,72000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Social Network Analysis,R,Google Search,"Blogs,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,Very useful,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,,,,,Nice to have,,,,Unnecessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,,Very Important,Very Important,Somewhat important,,,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Data Scientist,Predictive Modeler,Statistician,Other",University courses,15,5,20,30,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Other,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,DataRobot,Python,R",,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Often,Sometimes,,,Most of the time,,,,,,,,,,,5,5,5,5,80,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools",,Often,,,Sometimes,Most of the time,,,,,Most of the time,,Rarely,,,,,,,,,,100% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Git,Other",Always,135000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,40,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Regression,Python,University/Non-profit research group websites,"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,,3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs",,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,,,,,,,,,,Very Important,,,,Very Important, +Female,United States,28,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Other",Self-taught,50,50,0,0,0,0,,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,College/University,Friends network,Online courses,Stack Overflow Q&A",Very useful,,Somewhat useful,,,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,30,30,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Most of the time,1GB,"Evolutionary Approaches,Other","Jupyter notebooks,Mathematica,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Often,,,Most of the time,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Ensemble Methods,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Simulation,Time Series Analysis",,,,,,,,,Rarely,Most of the time,,,,,,Often,,,,,Sometimes,,,Often,,,Most of the time,,,Sometimes,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,,Always,60000,USD,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,,,R,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,Not Useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Data Scientist,Self-taught,50,0,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,,Rarely,,,,,,50,30,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Often,,,,,,Sometimes,,,Often,,,,,,Sometimes,Sometimes,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Never,155000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Japan,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,,,Necessary,Necessary,,,Nice to have,,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,75,0,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,,,,Very Important,,,,,,,Very Important,Very Important +Male,Canada,34,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,38,Employed full-time,,,Yes,,Other,,Employed by a company that doesn't perform advanced analytics,Python,Social Network Analysis,SQL,,"Blogs,Conferences,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",,,Retail,"10,000 or more employees",,,,Not very important,Other,"Basic 
laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,,,"KNIME (free version),Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Most of the time,,,Most of the time,,,,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,50,10,10,25,5,0,Enough to tune the parameters properly,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Other,Google Search,"Blogs,Kaggle,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,"Traditional Workstation,Workstation + Cloud service",40+,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Natural Language Processing","Gradient Boosting,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,50,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,I don't plan on learning a new ML/DS method,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Other",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Other",Self-taught,20,20,60,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Pharmaceutical,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Mathematica,NoSQL,Orange,Python,Spark / MLlib,SQL,Tableau,Other",,Most of the time,,Rarely,,,,,Rarely,,,,,,Rarely,,,,,Rarely,,,,,,,Sometimes,,Rarely,,Often,,,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,Sometimes,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",,Rarely,Sometimes,,Rarely,Often,Most of the 
time,Often,Often,,,,,,,,,Sometimes,,,Often,,Often,Rarely,,,,,Sometimes,,,,,50,15,2,30,3,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Other",Often,,,,,,,,,Sometimes,,,,,,,,,,,,Often,100% of projects,Approximately half internal and half external,IT Department,"clinicaltrials.gov, pubmed, cortellis, pdb","Setting up SolrCloud indexing over the data in hadoop/hbase for utilization by python was horrendous, lots of problems. Prefer ElasticSearch / Kibana in future.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",S3,"Bitbucket,Git",Rarely,250000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Singapore,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"KDnuggets Blog,Linear Digressions Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,50,0,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Malaysia,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Self-employed,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,YouTube Videos",,Very useful,,,Somewhat useful,Very useful,Very useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Java,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Gradient Boosted Machines,Neural Networks,Random Forests,Time Series Analysis",,,,,,,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,,,Often,,,,,,,Often,,,,30,20,20,0,30,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,Often,,,,,Most of the time,,,,,,Most of the time,Sometimes,,,,,,,51-75% of projects,Entirely internal,Business Department,market prices from publishers,sharing; consistency,Other,Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"150,000",MYR,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,Brazil,32,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Official documentation,Online courses,Textbook,Tutoring/mentoring",,,,,,,,,,Somewhat useful,Very useful,,,,Somewhat useful,,Very useful,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,40+,Github Portfolio,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Data Miner,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,Researcher,University courses,30,5,15,40,10,0,"Natural Language Processing,Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Never,10MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Rarely,,,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Rarely,,Rarely,Sometimes,Rarely,,,Rarely,,,Rarely,,,,,5,5,0,15,5,70,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,76-99% of projects,Entirely internal,Business Department,"census data, other assorted public datsets",,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Anomaly Detection,R,Google Search,"Blogs,Conferences,Personal Projects",,Very useful,,,Very useful,,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Engineer,Researcher,Statistician",Work,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Technology,"5,000 to 9,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation",Sometimes,,,,,Most of the time,Most of the time,,Most of the time,,,Often,,,,,,,Sometimes,,Sometimes,,Most of the time,,,Often,,,,,,,,30,12,12,22,24,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,Often,,,,,,,Often,,,,Most of the time,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,,155,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep 
learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Statistician",Self-taught,65,25,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Relational data,Other",Most of the time,100MB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,Rarely,,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Rarely,,Sometimes,,,Sometimes,Most of the time,Sometimes,,,,,Rarely,Often,,Often,,Rarely,,,Often,,Rarely,,,,,Rarely,,Most of the time,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Privacy issues",Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,100% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Subversion,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,,3-5 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Tableau,Time Series Analysis,Python,Google Search,"Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical 
Engineering,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10GB,"CNNs,Neural Networks","Amazon Web services,Python,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,Often,,Often,Most of the time,Often,Often,,,,,,,,,,,Most of the time,Often,,Often,,,,,,,,,,,30,10,5,20,20,15,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,Sometimes,,,,,Most of the time,,,,,Often,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,29,Employed full-time,,,Yes,,Data Miner,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",Python,Neural Nets,R,Other,"Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Statistician",Work,30,30,30,10,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,,,Sometimes,,,,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Sometimes,,,,10-25% of projects,Entirely external,IT Department,,dose not have enough time to learn something new,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Australia,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,A health science,,Other,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,India,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,"Data Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",80,10,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Markov Logic Networks","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,70,10,0,20,0,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,"Not employed, but looking for work",,,,,,,,SAS Base,Cluster Analysis,SAS,Other,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",3,95,2,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Indonesia,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,I don't plan on learning a new ML/DS method,Other,Google Search,"Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,PhD,No,Master's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,,"Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat 
useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",,A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Never,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,TensorFlow",,,,,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,15,25,5,35,20,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,10-25% of projects,More internal than external,Standalone Team,,Integration due to privacy,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Google Search,University/Non-profit research group websites","College/University,Personal Projects,Textbook",,,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Kaggle competitions,100,0,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,20+,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Very Important +Male,United States,NA,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"DataTau News Aggregator,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software 
Engineer,"Online courses (coursera, udemy, edx, etc.)",35,20,15,0,30,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,Often,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Often,Sometimes,,,Most of the time,,,,Often,,,,Most of the time,,,Most of the time,,Most of the time,,,,,Most of the time,,,,70,20,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Female,United States,17,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Bayesian Methods,Python,,"Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Researcher,Software Developer/Software Engineer,Other",University courses,35,5,10,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Amazon Machine Learning,Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Stan,Unix shell / awk,Other",Rarely,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Most of the time,Most of the time,,,,,Most of the time,Rarely,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Often,,,,,Sometimes,,Sometimes,,,,,Rarely,,Most of the time,,,,20,45,5,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",Sometimes,,,Rarely,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,,,,,Sometimes,,,Sometimes,51-75% of projects,Entirely internal,Other,DAT; FMIC; RA,"Lack of sufficient volume, duration, & variation","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Other,S3,Git,Rarely,0,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Not Useful,,,,,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Other,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,South Africa,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Uplift Modeling,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,40,20,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Mix of fields,Fewer than 10 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,Neural Networks,"Microsoft Excel Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",Sometimes,,,,,,Often,Sometimes,,,,,,Sometimes,Sometimes,Often,,,,Often,,,,,,,,,Sometimes,Sometimes,,,,55,20,5,10,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,Rarely,,,,,,,,,Sometimes,,,Most of the time,,10-25% of projects,More internal than external,Standalone 
Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,2000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,,Necessary,,,,Udacity,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,I haven't started working yet,University courses,20,20,25,25,10,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,,,Very useful,,Very useful,,,,,,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Kaggle competitions,20,30,0,0,50,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,United States,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Regression,Python,University/Non-profit research group websites,"College/University,Friends network,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),Other","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Decreased significantly,3-5 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,Jupyter notebooks,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Tableau",Sometimes,Often,,,,,,,Sometimes,Often,,,,,,,Often,,,,,,,,,,Rarely,,,,Often,,Rarely,,,,,Often,,,Rarely,Most of the time,,,Rarely,,,,,,,"A/B Testing,Collaborative Filtering,Lift Analysis,Logistic Regression,Markov Logic 
Networks,Recommender Systems",Most of the time,,,,Often,,,,,,,,,,Often,Often,Rarely,,,,,,,Sometimes,,,,,,,,,,50,0,40,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,Experian; Oracle,Ingest,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"100,000",USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,South Korea,17,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,,No,I prefer not to answer,Mathematics or statistics,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Computer Vision,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Google Search,Arxiv,Very useful,,,,,,,,,,,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Computer Scientist,Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed 
full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,C/C++,Proprietary Algorithms,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,0,0,40,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Text data",Sometimes,100MB,Random Forests,"Amazon Web services,NoSQL,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,30,20,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,,,,,,,,Most of the time,Sometimes,,,,,,76-99% of projects,More internal than external,IT Department,,The data is dirty and 
a lot is missing,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Sometimes,55000,GBP,Has decreased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,South Africa,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,Not Useful,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician,Other",University courses,30,10,25,30,5,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,Spark / MLlib,SQL,Stan,TensorFlow",,,,,,,,,Often,,,,,,,,Often,,,,,Rarely,,Often,Sometimes,,,,,,Sometimes,Rarely,Most of the time,,,,,,,,Often,Most of the time,Sometimes,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,Rarely,,Most of the 
time,Often,Often,Often,Sometimes,,Often,,Often,Often,Sometimes,,Rarely,Rarely,Often,Often,Often,Often,Often,Sometimes,Sometimes,Rarely,Sometimes,Rarely,Rarely,,,,60,10,15,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Often,,,Often,Often,,,,Often,,,Sometimes,,Often,,,Rarely,Often,Often,Sometimes,,Most of the time,26-50% of projects,Approximately half internal and half external,Standalone Team,,"Too much data, finding relevant signal in noise","Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other","Notebooks, RMarkdown, Shiny",Bitbucket,Sometimes,600000,ZAR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,34,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,C/C++/C#,University/Non-profit research group websites,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",University courses,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,50,15,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Most of the time,10GB,"Gradient Boosted Machines,Random Forests","Jupyter notebooks,Python,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,Often,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,Most of the time,,,,30,30,10,10,10,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Often,,,,,,Often,,,Sometimes,,,,,,Often,Often,,76-99% of projects,Entirely internal,Standalone Team,Regulatory reports; weather,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,145000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed part-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Julia,Bayesian Methods,Python,Google Search,"Official documentation,Podcasts,Textbook",,,,,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Other,Self-taught,30,0,50,20,0,NA,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Internet-based,100 to 499 employees,Increased significantly,Less than one year,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau,Other",,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Rarely,,,,Most of the time,,,"A/B Testing,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Rarely,,,,,,,Sometimes,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,Sometimes,,,Often,Sometimes,,,,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,None,"To make sure that the data mined represents what I think it represents (considering all corner cases), and to make sure that it is complete.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,"Metabase, Google Data Studio","Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,45000,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,66,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Text Mining,SQL,Government website,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Random Forests,Time Series Analysis",Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,30,10,10,20,30,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,Often,,,,Sometimes,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,,Rarely,666000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,Amazon Machine Learning,,,,"College/University,Textbook,Trade book",,,Very useful,,,,,,,,,,,,Very useful,Very useful,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write code to analyze data,"Statistician,Other","Online courses (coursera, udemy, edx, etc.)",22,10,10,50,0,8,"Computer Vision,Machine Translation,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A professional degree,Other,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Rarely,<1MB,"Bayesian Techniques,Decision Trees",Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,30,20,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,51-75% of projects,Do not know,Business Department,,,,,,,,300000,INR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Other,NA,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Bayesian Methods,R,GitHub,"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,Necessary,,Necessary,Necessary,Necessary,,Necessary,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,26,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,Somewhat useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Government,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,10MB,Regression/Logistic Regression,Java,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Recommender Systems,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,10,30,40,20,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Less than 10% of projects,,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Email,,Subversion,Sometimes,384000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects",,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Self-taught,25,25,25,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Other,Sometimes,10GB,"Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,Often,Often,,,Often,,,,Often,,,,,Often,,Often,,,,,,,,,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,Sometimes,,Sometimes,,,,,,,26-50% of projects,Entirely internal,Other,dbsnp;cancer genome atlas;international cancer genome center,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Never,100000,CAD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Tableau,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Master's degree,No,Bachelor's degree,Other,Less than a year,Other,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Pakistan,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,KDnuggets Blog,3-5 years,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,20,30,0,30,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,Switzerland,33,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Google Search,"Blogs,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job 
listing page,3-5,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",86,0,0,0,14,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Neural Networks,Random Forests","Hadoop/Hive/Pig,IBM SPSS Modeler,Python,QlikView,R,SQL,TensorFlow",,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,,Often,Often,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Association Rules,CNNs,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Neural Networks,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,,Sometimes,,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,Most of the time,Often,,,,60,5,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,Rarely,,,,Sometimes,,,,,,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,51-75% of projects,Entirely external,Central Insights 
Team,,unavailability,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Always,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Python,Neural Nets,Python,"I collect my own data (e.g. web-scraping),Other",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Other,University courses,15,0,45,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,,,,,,,Often,,,Often,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Often,Sometimes,,,,,,,,,,Sometimes,Often,,,,,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Rarely,86000,AUD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop or Workstation and local IT supported servers,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Podcasts,Stack 
Overflow Q&A,Textbook",Very useful,Very useful,,,,,,,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,25,50,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Other",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,,Often,,,"Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems",,,,,,,Often,,,,,,,,,Rarely,,,Rarely,,Rarely,,,Rarely,,,,,,,,,,25,2,3,10,30,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Sometimes,Often,Sometimes,Often,,,,,Sometimes,,,,,,Often,,Sometimes,,Sometimes,,,51-75% of projects,Entirely internal,Business 
Department,,Messy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,100000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,15,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,Python,GitHub,"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,1-2 years,,,,,Necessary,Nice to have,,Nice to have,Nice to have,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,51,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,More than 10 years,"Business Analyst,DBA/Database Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,25,50,0,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Statistics,R,RapidMiner (commercial version),RapidMiner (free version),SQL,Tableau",,Often,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,,,Often,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,Often,Often,,,Often,,,,Most of the time,,Often,,,Most of the time,Sometimes,Often,,,,,,Sometimes,Sometimes,,,,45,15,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,Most of the time,,,,,,,,,Often,,,Often,,Sometimes,,,,100% of projects,Approximately half internal and half external,Business Department,,Dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,133000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,"Blogs,College/University,Kaggle,Online courses,Tutoring/mentoring",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,Very useful,,"Data Machina Newsletter,DataTau News Aggregator,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important 
+Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,GitHub,"College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A professional degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,1GB,Regression/Logistic Regression,"Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Tableau",,Rarely,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Often,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,70,10,5,10,5,0,Enough to run the code / standard library,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,50000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Conferences,Newsletters,Online courses,Textbook,Other",,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,,,,Very useful,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,20,50,30,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,RNNs,Other","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,R,SQL",,Rarely,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests,RNNs,Text Analytics,Other",Often,,,,,Sometimes,Often,Rarely,,,,Sometimes,,,,,,,,,,,Sometimes,,Rarely,,,,Often,,,,Most of the time,40,5,10,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",Most of the time,,,Often,Often,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,100% of projects,More internal than external,Business Department,Facebook;Twitter,"Data is incorrect, contradicts other data 
sources,legacy systems lack data dictionaries,disparate systems,data lake not stable and not open to whole organisation","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Sometimes,130000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,,,Other,"1,000 to 4,999 employees",Stayed the same,,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,0,0,20,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues",Often,Often,,,,Sometimes,,,Most of the time,,Sometimes,,Often,,,,Sometimes,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Anomaly Detection,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Researcher",Self-taught,40,20,40,0,0,0,"Adversarial Learning,Computer Vision,Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Sometimes,10GB,"CNNs,Ensemble Methods,GANs,Neural Networks,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Most of the time,,Most of the time,Most of the time,,Sometimes,,Often,,,Sometimes,,,,,,Often,Often,,Often,,,,,Sometimes,,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Often,,,,Most of the time,,,,,,Sometimes,,,26-50% of projects,More external than 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,35,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,,"Data Machina Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,0,0,90,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods","Cloudera,Hadoop/Hive/Pig,Java,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SQL,Tableau,TIBCO Spotfire",,,,,Rarely,,,,Rarely,,,,,,Sometimes,,,,Sometimes,,Rarely,,Sometimes,,Most of the time,,Sometimes,,,,Sometimes,Rarely,Sometimes,,Sometimes,,Sometimes,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,"Data 
Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Segmentation,Text Analytics",,,,,,,Often,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Often,,,,,75,5,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",Most of the time,,,,,,,Often,Often,,Often,,Often,,Often,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,client data,"missing data, dirty data, human errors, IT dept not understanding their data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",,65000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,Somewhat useful,Not Useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,0,0,0,70,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Turkey,38,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook",Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Researcher,Software Developer/Software Engineer",University courses,35,30,0,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Other",Most of the time,10GB,"Bayesian 
Techniques,CNNs,Neural Networks","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Neural Networks,SVMs,Time Series Analysis",,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,10,50,20,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,Often,,,,,,Most of the time,Most of the time,,Sometimes,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Central Insights Team,Imagenet,Characterizing the data without domain expertise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,36000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Malaysia,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM SPSS Statistics,Association Rules,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,Other",,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,Very useful,,,,"DataTau News Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Julia,Text Mining,R,Google Search,"Arxiv,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,20,40,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and 
Dimensionality Reduction,Random Forests,Segmentation",,,Rarely,Most of the time,,,Most of the time,Often,,Sometimes,,,,Often,,,,,,Sometimes,Sometimes,,Sometimes,,,Often,,,,,,,,10,60,20,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Mexico,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,GitHub,"College/University,Kaggle,Personal Projects,Textbook",,,Very useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression","Java,MATLAB/Octave,R,RapidMiner (free version),SAS 
Base,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Often,,Sometimes,,,Sometimes,,,,Often,,,Sometimes,,,,,,,"Evolutionary Approaches,Logistic Regression,Neural Networks,Text Analytics",,,,,,,,,,Sometimes,,,,,,Often,,,,Often,,,,,,,,,Sometimes,,,,,20,30,10,20,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools",,,,,Often,,,,Most of the time,Often,Most of the time,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git,Subversion",Sometimes,91000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Romania,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,,FastML Blog,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity,Other","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,0,10,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important +Male,Other,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,40,40,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,,1GB,,"Java,Jupyter notebooks,NoSQL,Python,R,SAS Base,SQL,Other",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Often,,,,Often,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,"Association Rules,kNN and Other Clustering,Logistic Regression,Segmentation,Time Series Analysis",,Most of the time,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,50,25,0,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of significant domain expert input",,Often,,,,,,,,,Often,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,Transactional (pos-terminal) dataf from fmcg retailers,Cleaning the data and ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,,26000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,51,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Text Mining,Python,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Textbook",,Very useful,,,,,,,,Very useful,,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,Work,10,0,70,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Decision Trees,Gradient Boosted Machines","C/C++,IBM SPSS Modeler,Perl,Python,R,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,Often,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,Rarely,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series 
Analysis",,Most of the time,Sometimes,Rarely,Often,Most of the time,Sometimes,Most of the time,,,,Often,,Often,,Sometimes,,,Sometimes,Often,Often,,Most of the time,Sometimes,Rarely,Often,,Most of the time,Most of the time,Often,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,Sometimes,,Less than 10% of projects,More internal than external,Other,Twitter;Weather;Traffic;Economics,Feature Engineering yo extract features,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,15000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Blogs,Online courses,YouTube Videos",Very useful,Very useful,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Work,30,20,30,15,5,0,"Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,,"Neural Networks,SVMs","Java,Spark / MLlib,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,5,10,80,5,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Weka,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,30,15,5,15,10,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Gradient Boosting,Neural Networks - GANs",High school,Mix of fields,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,100GB,"GANs,Markov Logic Networks,RNNs,SVMs","IBM SPSS Statistics,R,SQL,Tableau",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,,,,,,Most of the time,Often,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,,,25,20,20,20,10,5,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Standalone Team,no,Data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,0,INR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,South Korea,33,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Government website,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,Data Elixir Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,,No,Bachelor's degree,,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,Logistic Regression,A bachelor's 
degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,South Korea,24,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Very useful,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,0,0,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,People 's Republic of China,30,Employed 
full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Random Forests,Python,Google Search,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,Researcher,University courses,80,0,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Image data,Relational data",Rarely,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Naive Bayes,Natural Language Processing,Neural Networks",,,Sometimes,,,,,Often,Often,Often,,,,,,,,Often,Often,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,27,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Other",University courses,0,40,40,20,0,0,,,A bachelor's degree,Retail,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,CNNs,"Amazon Machine Learning,Amazon Web services,Perl,Python,R,RapidMiner (commercial version),RapidMiner (free version),SQL,Tableau",Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,Most of the time,Sometimes,Sometimes,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression",Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,,,,Sometimes,,,,,Sometimes,Sometimes,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,Rarely,80000,TRY,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Australia,43,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,More than 10 years,"Data Analyst,Researcher",University courses,85,0,0,15,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Often,,Sometimes,,Often,Sometimes,,Often,,,Sometimes,,Sometimes,Often,Often,,,,65,15,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,Sometimes,,Sometimes,,,Often,,,Sometimes,,,,,,,,,,,51-75% of 
projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,No Free Hunch Blog",< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,20,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important +Male,India,20,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,50,0,20,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Time Series Analysis,,Google Search,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Workstation + Cloud 
service",2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Official documentation,Online courses,Textbook",,Somewhat useful,,,,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Software Developer/Software Engineer,Self-taught,70,20,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Rarely,10GB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics",,,,Sometimes,,Most of the time,Often,,,,,,,,,Sometimes,,Rarely,Most of the time,Often,,,,,Sometimes,,,Often,Often,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,Wikipedia,System logs about alerts,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,9300000,JPY,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,I never declared a major,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,Yes,,Researcher,Poorly,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Podcasts,Textbook",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,20,15,10,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","C/C++,Cloudera,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL",,,,Sometimes,Rarely,,,,,,,,,,,,Often,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Often,,,,,Often,,,,,Sometimes,,,,,,Often,Often,,,,,Most of the time,,Often,,,,,,70,20,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,Most of the time,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Rarely,36000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,50,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,Poorly,Self-employed,R,Deep learning,R,I collect my own data (e.g. web-scraping),"Non-Kaggle online communities,Personal Projects",,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,15+ years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,,,,,Traditional Workstation,0 - 1 hour,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,"Operations Research Practitioner,Researcher,Other",Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Military/Security,20 to 99 employees,Decreased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Traditional Workstation,Text data,,,Other,"C/C++,Microsoft Excel Data Mining",,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,40,20,10,10,20,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,Often,,,,,,,,Often,,Most of the time,,Most of the time,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,800000,INR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,,,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,Japan,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,Data Analyst,Self-taught,50,0,50,0,0,0,Time Series,"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Perl,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Neural Networks,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Often,,,Often,,Most of the time,,Most of the time,Sometimes,,Most of the 
time,,,,40,40,10,10,0,0,Enough to tune the parameters properly,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,150000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,PhD,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,Republic of China,28,Employed full-time,,,No,Yes,Data Analyst,Fine,Self-employed,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Online courses",,,,,,Very useful,,,,,Very useful,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Data Analyst,Self-taught,30,30,10,0,30,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Netherlands,52,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Other","College/University,Company internal community,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,75,10,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,500 to 999 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10MB,"Decision Trees,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib",,,,,,,,,,,,,Often,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,Often,,Often,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests",,,,,,Sometimes,Often,Often,,,,,,Often,,Often,,,Rarely,,,,Sometimes,,,,,,,,,,,40,30,5,20,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Sometimes,,,Often,Often,,,Sometimes,Often,,,,,Often,,,Often,Often,Sometimes,Sometimes,,51-75% of projects,More external than internal,Other,open data of dutch government,many independent source.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,"Bitbucket,Git,Subversion",,"65,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Singapore,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",5,40,40,10,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Logistic 
Regression,Neural Networks,Random Forests",Sometimes,,,,,Often,Often,,Rarely,,,,,,,Rarely,,,,Sometimes,,,Often,,,,,,,,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,72000,SGD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Online courses",Very useful,Very useful,,,,,,,,Very useful,Somewhat useful,,,,,,,,KDnuggets Blog,3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",40+,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important 
+Female,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,Python,Google Search,"Non-Kaggle online communities,Online courses",,,,,,,,,Somewhat useful,,Very useful,,,,,,,,,< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,University courses,10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,India,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,R,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Engineer,Self-taught,40,20,0,40,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Internet-based,100 to 499 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Very important,Other,Basic laptop (Macbook),Text data,Rarely,,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,40,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,2000000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +,Indonesia,22,"Not employed, but looking for work",,,,,,,,Python,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",55,45,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,56,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Uplift Modeling,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Official documentation,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,,Not Useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,More than 10 years,"Data Analyst,Operations Research Practitioner",University courses,50,10,30,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise 
Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Often,Often,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Rarely,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,Often,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,65,20,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,,,Most of the time,,,Sometimes,,,,,,Often,,,,,,,,,100% of projects,More internal than external,Business Department,TV ratings,Understanding definitions; getting consistent updates from 3rd parties,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,53000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,Java,Neural Nets,Python,GitHub,"College/University,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Statistician",Self-taught,40,20,20,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,100MB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods,Natural Language Processing,Neural Networks",,,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,Often,,,,,,,10-25% of projects,More external than internal,IT Department,news data like RCV,data clean,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"45,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,R,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Conferences,Kaggle,Newsletters,Personal Projects,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Statistician",University courses,30,10,40,10,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,KNIME (free version),Microsoft SQL Server Data Mining,Minitab,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,Sometimes,Rarely,,,,,Sometimes,Rarely,Often,,,,,Often,Often,,,Often,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,,Often,Often,,,,,,,Often,Often,,Sometimes,,Sometimes,Often,,Sometimes,,,Often,,,Often,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,Central Insights Team,"Bureaus, lifestyle segments, marketing databses",Integration n understanding external data,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,1800000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Company internal community,Non-Kaggle online communities,Stack Overflow Q&A,Other",,Very useful,,Very useful,,,,,Somewhat useful,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Data Analyst,Programmer,Researcher,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Python,R,SAS Base,SQL,Tableau,TensorFlow",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Sometimes,Often,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,SVMs",Sometimes,,Often,,,,Most of the time,,Often,,,Most of the time,,,,Most of the time,,Most of the time,Sometimes,Most of the time,,,,,Often,,,Rarely,,,,,,60,25,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,Less than 10% of projects,More internal than external,Other,,Lack of cohesive standards,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,125000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,0,10,10,"Computer Vision,Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1GB,"Gradient Boosted Machines,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,,Often,,,Most of the time,Sometimes,,,,Sometimes,,Often,,,,,,,Often,,,,Often,,,Sometimes,,Often,,,,40,20,0,20,20,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,"8,000,000",JPY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,10,10,40,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,RNNs,Simulation,Time Series Analysis",,,Sometimes,,Sometimes,Sometimes,Most of the time,Sometimes,Sometimes,,,,Rarely,Often,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,70,1,15,11,3,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,Often,,100% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Iran,20,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,60,0,0,40,0,0,,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,26,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,Logistic Regression,A master's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",,Don't know,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,70,0,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,,,Sometimes,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,French maps,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,18000,,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,India,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,,DataRobot,Neural Nets,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",25,40,35,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A doctoral degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Sometimes,1MB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Java,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,SQL",Often,Often,,Rarely,,,,,,,,,,,Rarely,,,,,,Often,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,Logistic Regression,Natural Language Processing,Recommender Systems,Text Analytics",,,,,,,,Sometimes,,,,,,,,Most of the time,,,Often,,,,,Often,,,,,Sometimes,,,,,35,30,30,5,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,Privacy issues",,Often,Sometimes,Rarely,,,,,Most of the time,,,,Sometimes,,,,Rarely,,,,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,South Korea,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"FastML Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,,Less than a year,I haven't started working yet,Self-taught,30,10,10,0,30,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,0,0,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Anomaly Detection,R,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Other,Self-taught,50,0,0,0,0,50,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Germany,49,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,More than 10 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,10,0,90,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,Don't know,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,,,"Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Perl,Python,QlikView,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Germany,53,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Friends network,Kaggle,Online courses",,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,40,10,5,5,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,50,"Independent contractor, freelancer, or self-employed",,,No,Yes,DBA/Database Engineer,Poorly,Self-employed,Tableau,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Newsletters",,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,,,,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,United Kingdom,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Other,,,,,Not at all important,,,,,,,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,NoSQL,Python,R,SQL,TensorFlow",,Most of the time,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,,,,,"A/B Testing,CNNs,GANs,HMMs,Markov Logic Networks,Neural Networks,RNNs,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Limitations of tools,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,22000,GBP,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Russia,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,SQL,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Online courses,,,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,I prefer not to answer,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,,100MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,Sometimes,,,,10,20,0,10,10,50,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,,,,,I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Poland,33,Employed full-time,,,Yes,,Statistician,Poorly,Employed by government,TensorFlow,Neural Nets,R,,"Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,More than 10 years,"Business Analyst,Data Analyst",Self-taught,70,10,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,Most of the time,,,,55,15,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Never,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,15,20,5,30,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",I prefer not to answer,Technology,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,10MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,Text Analytics",Sometimes,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,,,,Sometimes,Rarely,,,,,Often,,,,,50,15,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources",,Often,,,,,,,,Most of the time,,,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Never,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,,,,,Necessary,,,Necessary,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,A health science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Deep learning,Python,Google Search,"Blogs,Non-Kaggle online communities,Stack Overflow Q&A",,Very useful,,,,,,,Very useful,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,"Computer Scientist,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,40,20,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches",Primary/elementary school,Telecommunications,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Evolutionary Approaches,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Rarely,,,,Rarely,Rarely,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,Time Series Analysis",,,Sometimes,,,,Most of the time,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,45,20,5,30,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,Most of the time,,,,,Sometimes,,,,,,,100% of projects,Entirely internal,IT Department,,Understand propietary data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,35000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,Very useful,,,Very useful,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,20,20,50,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SAS Base,SQL,Tableau,TensorFlow",,Sometimes,,,,,,Most of the time,Rarely,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,Rarely,,,,Most of the time,,,Often,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Sometimes,Often,Often,Most of 
the time,Most of the time,Often,,,Often,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Often,,Often,Often,,Often,,Often,Most of the time,Most of the time,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,Often,,,51-75% of projects,More internal than external,IT Department,"User data on social media, financial, income availability, ","reliability on data available, cleaning","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,1600000,INR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",5,80,15,0,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Cognos,Jupyter notebooks,Microsoft Excel Data Mining,Minitab,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,Sometimes,,,,Sometimes,Often,,,,,,,Most of the time,,,,,,Most of the time,,,Sometimes,,Sometimes,,,Most of the time,Rarely,Often,,,,,,,,,Sometimes,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,"Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,Sometimes,Often,Sometimes,,,Most of the 
time,Often,,,,,,Often,,Most of the time,,Most of the time,Most of the time,Often,Most of the time,,Sometimes,,,,,Most of the time,Most of the time,,,,,50,30,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,76-99% of projects,More internal than external,Business Department,"Data.gov, UCLA, ",Hardware,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,2000000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Iran,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,"Data Stories Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Engineer,Researcher",Self-taught,50,10,0,20,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,1GB,"CNNs,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,,,,,Sometimes,Often,,Most of the time,,Often,,,Most of the time,,,,,Often,Often,,,Sometimes,,,,40,20,20,20,0,0,Enough to refine and innovate on the algorithm,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Male,Iran,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,,5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,60,20,0,20,0,0,Computer Vision,"Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,28,Employed part-time,,,Yes,,Computer 
Scientist,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Official documentation,Textbook",Very useful,,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,Programmer,Researcher",Self-taught,90,10,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,Rarely,,,,Often,,,,,,Often,,Rarely,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Markov Logic Networks,Naive Bayes,Natural Language Processing,Recommender Systems,SVMs",,,Sometimes,,,Often,,Often,,,,,,,,,Rarely,Sometimes,Often,,,,,Sometimes,,,,Often,,,,,,30,30,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,Often,,10-25% of projects,Entirely internal,IT Department,"we work mostly with publicly available data. 
For example, Common Crawl dataset or Dresden Table Corpus.",The biggest challenge is to develop efficient processing algorithms for the tasks we have at our university.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial,Subversion",Sometimes,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Belgium,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Other,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,QlikView,R,SQL,Unix shell / awk,Other,Other,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Rarely,Most of the time,,,,,,,,,Often,,,,,,Sometimes,Sometimes,Most of the time,Sometimes,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,,Often,Most of the time,,,,50,25,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data",Sometimes,Often,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,"Open data (weather, geographical, ...); Regulatory",Cleaning and merging heterogeneous sources,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,GitHub,"Arxiv,Blogs,Online courses,YouTube Videos",Very useful,Very useful,,,,,,,,,Very useful,,,,,,,Very useful,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",Other,Deep 
learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,70,10,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Java,NoSQL,Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Sometimes,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Often,Sometimes,,,Most of the time,,Sometimes,Often,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful 
datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,,Often,Sometimes,Sometimes,,Sometimes,Often,Most of the time,,,,Most of the time,,Often,,76-99% of projects,Approximately half internal and half external,Business Department,Kaggle open data; Governmental open data (e.g. ECB statistics); NGO open data (e.g. wheather info); ,Data Quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,110000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,44,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Julia,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,,,,,,,,,Very useful,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,More than 10 years,Researcher,University courses,0,0,70,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Rarely,Sometimes,,Sometimes,Sometimes,,,Rarely,,,,Often,,Sometimes,,Sometimes,Most of the time,Often,Most of the time,,Rarely,,Sometimes,,,Often,Most of the time,Rarely,,,,25,50,2,3,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets 
from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,,Often,,,Rarely,Rarely,Sometimes,Sometimes,Rarely,,Rarely,Rarely,Rarely,,,,Often,Often,,Less than 10% of projects,More internal than external,Other,web data; social media,access and availability,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,700000,SEK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,"Ensemble Methods (e.g. 
boosting, bagging)",Python,GitHub,"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,Other,Self-taught,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Sometimes,,,Often,,Sometimes,,,,,,,Often,,Often,,,,,,,Often,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering 
or a clear direction to go in with the available data",Often,,,,Most of the time,Often,,,Often,Most of the time,,Often,,,,,Sometimes,,,Sometimes,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Other,Sometimes,110000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not 
important,Not important +Female,France,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,RNNs",Most of the time,,,Most of the time,,,Most of the time,Often,,,,,,,,Most of the time,,,,Most of the time,,,Often,,Most of the time,,,,,,,,,50,10,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy 
issues",,,,,Often,,,,,,,,,Sometimes,,,Sometimes,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,70000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Statistician",University courses,15,65,5,10,0,5,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SQL",,,,,,,,,,,Sometimes,Often,,,,,Often,,,,,,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision 
Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,,,Often,Sometimes,Most of the time,,Often,,,Most of the time,,Often,,,Often,,Sometimes,Sometimes,Often,,,,30,20,5,25,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,Often,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,IT Department,none,"I am new to the domain, so i don't know what matters in the industry. The management does not help. It is more concerned with operational issues.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",skype,Other,Rarely,"27,000,000",XOF,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Google Cloud Compute,Monte Carlo Methods,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher",Self-taught,70,0,20,10,0,0,Computer Vision,"Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data",Sometimes,10GB,"CNNs,Ensemble Methods,Neural Networks,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,Rarely,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,Often,Often,,Sometimes,,,,,Rarely,,,,,,Most of the time,Sometimes,,,,,Most of the time,,Sometimes,,,,,,15,50,5,5,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,,,,,Often,,,Most of the time,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,"Kitty, NYU v2, TUM, cityscape, Make 3D",,"Flat files not 
in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,12000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Other,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",QlikView,Deep learning,Python,GitHub,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,Linear Digressions Podcast,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important +Female,Belarus,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Cloudera,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A",,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",40,20,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,R",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,,Often,,,,Sometimes,,,,Often,,,,,,,Most of the time,,,,,Rarely,,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,Sometimes,,Often,Rarely,,,,,,,Often,Most of the time,,,10-25% 
of projects,More internal than external,Standalone Team,Wikipedia,Merging,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,15000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Spain,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,"GitHub,Google Search","Blogs,Online courses,Textbook",,Very useful,,,,,,,,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Natural Language Processing,Decision Trees - Gradient Boosted Machines,A bachelor's degree,Government,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,"Decision Trees,Neural Networks","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,,,Often,,,,,,Rarely,,Rarely,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,Sometimes,Rarely,Sometimes,,,,"Data Visualization,Decision Trees,Lift Analysis,Neural Networks,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,Often,,,Sometimes,,,,,,Most of the 
time,,,,,60,5,5,20,10,0,Enough to run the code / standard library,"Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Often,,Often,,,,Often,,100% of projects,More external than internal,Business Department,surveys,Dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,140000,NZD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Other,Other,Never,,Other,"Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,,,,,Git,,,,,3,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Text data",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,,,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Text Analytics",Often,,,,,Most of the time,Most of the time,Often,Often,,,,,,Often,Most of the time,,,,,,,Often,,,Often,,,Most of the time,,,,,30,20,40,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT",Sometimes,Sometimes,,Often,Often,,,,,,,,,,Often,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,"50,50",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Data Scientist,DBA/Database Engineer,Engineer,Researcher",Self-taught,60,20,15,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",,100GB,Neural Networks,"Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,Most of the time,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,,,,Often,Often,Sometimes,Sometimes,,,,,Sometimes,,,,,,,Often,,,,Often,,,,,Often,,,,80,10,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into 
organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,Sometimes,Often,Sometimes,,,,Sometimes,,,Often,,,Sometimes,Often,,76-99% of projects,Approximately half internal and half external,IT Department,,Time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Other,,"117,500",GBP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Greece,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Stan,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,,,Very useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Miner,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,10,15,0,75,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Sometimes,Most of the time,,,,Often,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,Most of the time,Often,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,,,,Often,,,,,,,Most of the time,,,Often,,,Often,,,,,30,25,15,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,Sometimes,,,,,,,,,,,,,,,Often,,,26-50% of projects,Entirely internal,Business Department,fraud database,different internal databases and systems producing transactional data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,10,80,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",Sometimes,,,Often,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,,Sometimes,,,,,,Often,,Often,,,,Often,Sometimes,,,,,,,Often,,,,,,30,30,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,Sometimes,,Sometimes,Often,,Often,,,,,,,,Most of the time,Sometimes,,,76-99% of projects,Entirely internal,Standalone Team,,Lack of domain expert when trying to clean the data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,50000,GBP,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Israel,79,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,I don't plan on learning a new tool/technology,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Machine Learning Engineer,Researcher",Other,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs",High school,Academic,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Other,Sometimes,1GB,Neural Networks,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,60,10,10,0,20,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,None,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Other",,Very useful,,,,,Very useful,,,,Somewhat useful,,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",25,30,5,0,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Manufacturing,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow",,Often,,,Rarely,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"CNNs,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Most of the time,,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,Sometimes,Sometimes,,,Most of the time,Often,,Most of the time,,Most of the time,,,,,Often,,,,20,50,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations 
about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,Most of the time,,10-25% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,45000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Cloudera,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Technology,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"CNNs,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Sometimes,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Text Analytics",,,,Often,,Often,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Often,,,Sometimes,,,,,60,10,5,20,5,0,"Enough to code it again from scratch, albeit it may run 
slowly","Dirty data,Explaining data science to others",,,,,Often,Sometimes,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts",,Very useful,,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,,,,,,"Data Elixir Newsletter,KDnuggets Blog,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Management information systems,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +,,NA,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Other,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,15,10,20,50,5,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,10 to 19 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data",Most of the time,1TB,"CNNs,Ensemble Methods,Neural Networks,Other","Amazon Web services,C/C++,Google Cloud Compute,Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,Rarely,,Sometimes,,,,Rarely,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,,Often,,Often,Often,,,,Sometimes,,,Most of the time,,,,,,Often,Most of the time,,,,,Often,,,Often,Often,,,,60,20,2,10,8,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Unavailability of/difficult access to data",Often,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,,R,Google Search,"College/University,Textbook,YouTube Videos",,,Not Useful,,,,,,,,,,,,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,Computer Vision,Neural Networks - CNNs,,Academic,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Rarely,,"CNNs,Neural Networks","C/C++,Java,Python,R",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Segmentation",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,30,50,10,10,NA,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,40000,PKR,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Iran,26,"Not employed, but looking for work",,,,,,,,NoSQL,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Online courses,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,,Very useful,,,,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,50,20,20,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",I prefer not to answer,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Relational data",Always,10GB,"CNNs,Decision Trees,Gradient Boosted Machines","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Recommender Systems,Segmentation",,,,Rarely,,Often,Often,Most of the time,,,,Most of the time,,Sometimes,,,,,,Rarely,,,,Sometimes,,Often,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,,300000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Egypt,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Data Scientist","Online courses (coursera, udemy, edx, etc.)",5,60,10,20,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,Impala,Java,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,Rarely,Sometimes,,Sometimes,,Sometimes,Rarely,,,,Sometimes,Sometimes,,Often,,Sometimes,,,,Most of the time,,,,,,,,Most of the time,,Most of the 
time,,Sometimes,,Rarely,Rarely,Sometimes,,Sometimes,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Often,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,,,Often,,Most of the time,,Sometimes,Often,Rarely,Rarely,,Often,Most of the time,,Often,Rarely,Sometimes,Often,Often,,,,30,10,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,Sometimes,,,,,,,,,,,Sometimes,,,51-75% of projects,Entirely internal,IT Department,None,"Making sense of data, data quality, not finding the needed data for analysis",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",Sometimes,8500,EGP,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Social Network Analysis,Python,GitHub,"Blogs,Conferences,Kaggle,Podcasts",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Scientist,Self-taught,100,0,0,0,0,0,"Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,R",Rarely,,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,,,,,30,25,20,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy 
issues",,,,,Often,,,,Often,,Often,,Often,,,,Often,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Other,Never,120000,MAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,5-10 years,Nice to have,Nice to have,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,,,,,,,,,,,,,,,, +Female,Taiwan,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,500 to 999 employees,Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Google Cloud Compute,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL,Tableau",,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,Often,,Most of the time,,Often,,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,SVMs",Sometimes,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,Sometimes,,Often,,,,,Often,Sometimes,,,,Sometimes,,,,,,40,40,0,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,30000,TWD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Non-Kaggle online communities,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",Self-taught,50,0,30,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",High school,Insurance,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,30,10,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about 
the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Often,Often,,,,,Often,,,,,Often,Often,Often,,76-99% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,90000,SGD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,Very useful,,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",1-2 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Nigeria,41,Employed full-time,,,Yes,,Researcher,,Employed by government,,,,,"Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",30,40,25,5,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,HMMs,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,HMMs,Naive Bayes,Natural Language Processing,SVMs",,,Often,,,,,Most of the time,,,,,Often,,,,,Often,Often,,,,,,,,,Often,,,,,,25,30,10,5,25,5,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",Often,Most of the time,,,,,,Most of the time,Sometimes,,Often,,Often,,,,,,,,,,,,,,,,,,,Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,,Python,Support Vector Machines (SVM),Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,80,10,0,10,0,0,"Computer Vision,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - 
CNNs",,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,,,,,,,,Somewhat important,,,,,,Very Important +Male,United Kingdom,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Stan,Bayesian Methods,R,Government website,"Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,3 to 5 years,Other,Self-taught,85,10,NA,5,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,A tech-specific job board,Important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100GB,Regression/Logistic Regression,"R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,5,0,10,10,35,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Sometimes,,Often,Often,,,Often,,,,,,Most of 
the time,,,,Often,Sometimes,Sometimes,,100% of projects,Approximately half internal and half external,Other,census; animal movements; national mapping agency; climate,volume,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint,Other",Open source platform,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,32000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed part-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,MATLAB/Octave,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",25,35,0,25,15,0,Time Series,Gradient Boosting,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Iran,35,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,IBM SPSS Statistics,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Textbook,Tutoring/mentoring",,,,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,,6 to 10 years,"Business Analyst,Data Miner,Data Scientist",Self-taught,60,10,10,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,QlikView,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,Rarely,Sometimes,,,,,,,,,Most of the time,,Rarely,Sometimes,,,,,,,Often,Rarely,Often,,Sometimes,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,Sometimes,Sometimes,Often,,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of 
tools",,Often,,,Often,,,,Most of the time,,,,Sometimes,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,,,80000000,IRR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher,Other",University courses,10,40,30,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,"1,000 to 4,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Important,Other,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,Rarely,,Most of the time,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,Most of 
the time,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,SVMs",Rarely,Often,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,Rarely,Most of the time,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,16,8,30,8,8,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Sometimes,Sometimes,Sometimes,Sometimes,Often,,,,Sometimes,,Often,,,,,,,,,Sometimes,,Often,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,R,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,Partially Derivative Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",Work,8,30,10,50,2,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Female,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,Work,0,30,70,0,0,0,,Logistic Regression,A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Collaborative Filtering,Logistic Regression,Simulation",,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,10,65,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Privacy issues,,,,,,,,,,,,,,,,,Rarely,,,,,,100% of projects,Entirely internal,Standalone Team,ONS,,,Company Developed Platform,,Bitbucket,,47500,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Switzerland,30,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Other",Very useful,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,KDnuggets Blog,5-10 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Work,50,40,10,0,0,0,"Computer Vision,Recommendation Engines","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,People 's Republic of China,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Text Mining,Python,GitHub,"Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Other",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,"Information technology, networking, or system 
administration",,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,32,"Not employed, but looking for work",,,,,,,,Amazon Web services,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,,,,Very useful,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,,Nice to have,Unnecessary,,,,,,,0 - 1 hour,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Researcher",University courses,10,20,0,50,20,0,"Recommendation Engines,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,35,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Other,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,30,0,15,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Germany,32,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,Not Useful,Very useful,,,Very useful,,Somewhat useful,,"KDnuggets Blog,Partially Derivative Podcast,Talking Machines Podcast",5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation,Other",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,40,10,35,5,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Japan,48,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,5,90,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,,NA,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Bayesian Methods,R,Google Search,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,10,30,0,0,Natural Language Processing,Decision Trees - Random Forests,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,,Regression/Logistic 
Regression,"Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,R,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,Sometimes,,Rarely,Rarely,,,,,,,Often,,,,,,,,,Most of the time,,Sometimes,Often,,,,,,,"A/B Testing,Decision Trees,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,Often,,,,,,,,,,,Often,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,,1100000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Python,Regression,Python,,"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Data Scientist,Engineer",,20,20,20,0,20,20,,,,Non-profit,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,,Laptop or Workstation and private datacenters,Relational data,Rarely,1TB,Other,"Microsoft Excel Data Mining,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,Most of the time,,,,Sometimes,,,,,,,,,,"Segmentation,Other",,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,40,20,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty 
data,Lack of data science talent in the organization,Privacy issues",Most of the time,,,,Often,,,,Often,,,,,,,,Sometimes,,,,,,10-25% of projects,,IT Department,,,,"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,25000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Personal Projects,Podcasts,YouTube Videos",,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,Very useful,"DataTau News Aggregator,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",University courses,20,10,20,30,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,TensorFlow",,Most of the 
time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,Most of the time,,,Most of the time,Most of the time,Often,Often,,,Often,,,Often,Often,,Often,,,Sometimes,,Often,,,Often,Sometimes,,,Sometimes,,,,70,10,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,Sometimes,,Often,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,"202,500",,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Belgium,49,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,,,,,,,,Very useful,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Researcher,Other",Self-taught,80,0,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,"10,000 or more employees",Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,Often,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Often,,Often,,,,,Often,,Often,,,,,,,Often,,,,40,15,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Belgium,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Somewhat useful,,,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,"Business Analyst,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Supervised Machine Learning (Tabular Data),,A master's degree,Retail,"5,000 to 9,999 employees",Stayed the same,,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1GB,Regression/Logistic Regression,"IBM Cognos,R",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,10,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,Often,Most of the time,,,Most of the time,,Often,,Often,,,,,Often,,Sometimes,Most of the time,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Never,30000,EUR,Has decreased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Republic of China,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Bayesian Methods,Python,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social 
Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Computer Scientist,Engineer,Software Developer/Software Engineer",Kaggle competitions,50,30,0,5,15,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,United Kingdom,29,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,SQL,Google Search,"Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,Not Useful,Not Useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Not Useful,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"DataCamp,edX,Udacity,Other",Other,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,Less than a 
year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Australia,31,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,5,5,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Jupyter notebooks,Mathematica,Python,R,SQL,Stan,Unix shell / awk",Rarely,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Most of the time,,Often,,,Most of the time,Most of the time,,Sometimes,,,Sometimes,,,Often,Often,,,,,,,Sometimes,,,Sometimes,Most of the time,Sometimes,,Most of the time,,,,75,5,8,5,7,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Often,,,,,Often,,,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,Many small important data sets stored ad hoc in excel,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,125,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Support Vector Machines (SVM),Python,,"Arxiv,Company internal community,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,,Very useful,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,NA,75,0,0,0,Time Series,,A master's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Rarely,,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,75,0,0,15,10,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,103000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Researcher,Work,50,25,10,5,0,10,Unsupervised Learning,Bayesian Techniques,High school,Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data",Most of the time,10MB,Neural Networks,"C/C++,DataRobot",,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Ensemble Methods",,,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,50,30,10,5,5,0,Enough to run the code / standard library,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,None,,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,"5,000",GBP,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Kaggle",Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,University courses,0,0,20,60,20,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,,Sometimes,Sometimes,Often,Sometimes,Sometimes,Often,,Sometimes,Often,,Sometimes,,Often,,Sometimes,,Sometimes,Sometimes,,Sometimes,Often,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,20,60,0,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Bitbucket,Rarely,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Tableau,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Other",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Operations Research Practitioner,Predictive Modeler,Other",University courses,60,0,0,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Other,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,10MB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","C/C++,Microsoft Excel Data Mining,Python,R,SQL,Tableau,Other",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Often,,,,,,Most of the time,"Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Most of the time,,,,Most of the time,,,,Often,,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,Most of the time,,,,80,1,1,1,17,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful 
datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,Often,,,,Most of the time,,,,,Often,Most of the time,Most of the time,,100% of projects,More internal than external,IT Department,"Weather services, open geospatial data","Getting easy access to data, and cleaning data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Canada,53,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Udacity,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),,"Software Developer/Software Engineer,Other",Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,Other,38,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,Jack's Import AI Newsletter,KDnuggets Blog",5-10 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to 
have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Analyst,Engineer,Software Developer/Software Engineer",Self-taught,90,1,5,4,0,0,,,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,Finland,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Online courses,YouTube Videos",,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"Data Scientist,Researcher",University courses,45,5,40,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100GB,"CNNs,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Java,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,GANs,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Sometimes,,Often,Sometimes,,Often,Most of the time,,,,Often,,Most of the time,Most of the time,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Often,Often,,,,,,,,,,,,,Often,Often,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,40000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Researcher",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Basic laptop (Macbook),Text data,,1GB,,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Often,,,,,,Most of the time,,,Most of the 
time,,,,30,60,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,Most of the time,,100% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,98000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,Other,University courses,5,10,5,80,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Not important,,,,,,,,,,Very Important,Very Important,Very Important,Very Important +Female,United Kingdom,22,Employed 
full-time,,,No,Yes,Researcher,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO",Python,Factor Analysis,Haskell,GitHub,"College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Not Useful,Not Useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,Not Useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,,Necessary,Unnecessary,Necessary,Necessary,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",11 - 39 hours,Kaggle Competitions,Yes,I did not complete any formal education past high school,,1 to 2 years,"Data Analyst,Researcher",Self-taught,30,40,10,0,20,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis","Neural Networks - CNNs,Neural Networks - RNNs",High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Romania,28,Employed full-time,,,No,Yes,Data Miner,Fine,Employed by non-profit or NGO,RapidMiner (free version),Deep learning,Python,Google Search,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Miner,University courses,30,20,30,10,10,0,Computer Vision,Evolutionary Approaches,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,I don't plan on learning a new 
ML/DS method,Python,Google Search,"Blogs,Newsletters,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,30,0,10,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Java,MATLAB/Octave,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Most of the time,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Israel,32,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,Textbook,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,Somewhat useful,,Very useful,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Reinforcement learning,Bayesian Techniques,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important +Female,United States,27,Employed full-time,,,Yes,,Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Text Mining,Python,University/Non-profit research group websites,"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Very useful,Very 
useful,Somewhat useful,,Very useful,,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Data Analyst,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,55,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Tableau",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Often,Often,,,,,Often,,Most of the time,,,,,Most of the time,,Often,,,,,,,,,,,60,20,2,8,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,Sometimes,,26-50% of projects,More internal than 
external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,Git,Sometimes,85000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Russia,23,"Not employed, but looking for work",,,,,,,,Mathematica,Bayesian Methods,R,Other,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Other,0 - 1 hour,,No,Some college/university study without earning a bachelor's degree,A health science,Less than a year,Researcher,Self-taught,50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Ukraine,34,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,Non-Kaggle online communities,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Financial,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Evolutionary Approaches,Logistic Regression,Segmentation,Simulation",Sometimes,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,Sometimes,,76-99% of projects,More external than internal,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,I don't typically share data",,Bitbucket,Sometimes,25000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,42,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects,Textbook",Somewhat useful,,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,"DataTau News Aggregator,FastML Blog,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,,,GPU accelerated Workstation,40+,Kaggle Competitions,No,Bachelor's degree,Physics,,"Business Analyst,Data Miner,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,,,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Other,Work,0,20,80,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Rarely,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,Rarely,,Often,,,,,,,Sometimes,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,,,76-99% of projects,Entirely internal,Standalone Team,Clients Data,Incomplete Data,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,120000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,Python,,"Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Software Developer/Software Engineer",Work,10,0,90,0,0,0,,,A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,DataRobot,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,Often,,Rarely,,,,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,Often,Sometimes,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most 
of the time,Often,Often,,,Often,,Often,,Often,,Often,,Often,,,Often,,,,,Often,Often,Most of the time,,,,80,15,0,5,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,Git,Rarely,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,DataRobot,Deep learning,Python,Google Search,"Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,,,,Very useful,Very useful,Somewhat useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,10,45,45,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,DataRobot,Jupyter notebooks,Python,SQL,TensorFlow",,Most of the time,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Often,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Sometimes,Often,Most of the time,,Often,Often,,Sometimes,,Most of the time,,,,,Most of the time,,,,,,0,30,0,0,0,70,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,diabetes;iris;boston_housing,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,Git,Rarely,48000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Other,0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,I haven't started working yet",Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Other,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,100 to 499 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,Rarely,,,Often,,,,,,Often,,,,Sometimes,,,,Most of the time,,Most of the time,,Sometimes,,,,,,Sometimes,Often,,,Rarely,Rarely,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Time Series Analysis",Sometimes,,,Rarely,,Often,Most of the time,Most of the time,Most of the time,,,,,Sometimes,,Rarely,,,,Sometimes,Sometimes,,Sometimes,Rarely,Rarely,,,,,Most of the time,,,,50,5,1,29,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining 
data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Often,Most of the time,Often,,,,,,Sometimes,Most of the time,Often,,,,Most of the time,Most of the time,Most of the time,,,100% of projects,More internal than external,Other,,processing the data without proper big data infrastructure,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity","GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Brazil,22,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,University courses,20,0,30,50,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Technology,20 to 99 employees,Increased significantly,Less than one year,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Sometimes,10GB,"Decision Trees,Neural Networks","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,,Often,,,,Sometimes,,Often,,,,,,Often,Most of the time,,,,,,,,,Most of the time,,,,30,30,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,Often,,,Sometimes,,,,,Most of the time,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,12000,BRL,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Norway,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook",,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,Greece,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Analyst,Researcher",Self-taught,10,30,60,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,Rarely,,,,Often,,Often,,,,,,,,Often,Most of the time,,,Most of the time,,,Rarely,,,,"A/B Testing,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis,Other",Often,,,,,,Most of the time,Rarely,,Rarely,,Rarely,,,,Rarely,,,,,,Often,Sometimes,,,Often,Often,,,Sometimes,Most of the time,,,10,20,10,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's 
decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Often,,,Sometimes,Most of the time,Most of the time,,Sometimes,,,,,,Most of the time,,,Sometimes,,,,Often,,76-99% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Other",Rarely,25000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Italy,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Other,Time Series Analysis,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Data Scientist,Self-taught,40,5,50,3,2,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Sometimes,,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,,Often,,Sometimes,,Sometimes,Sometimes,,,,,Often,,Sometimes,,,Sometimes,Often,Sometimes,,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,30,30,5,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,,,,,,Often,,,,,,Often,,,76-99% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Other",USB,"Git,Other",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,43,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,Very useful,Very useful,,,,,,Very useful,Very useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),,"Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,Google Search,"College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,,Other,40,0,10,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's 
degree,Other,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,Random Forests,"Jupyter notebooks,Minitab,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,0,0,40,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,Often,,,,,,,,,,,,,,Sometimes,Sometimes,,51-75% of projects,Entirely internal,Other,,non-consistent format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,85000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Norway,33,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by a company that performs advanced analytics,Employed by government",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos,Other",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,40,5,40,5,0,"Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,Other","C/C++,Jupyter notebooks,NoSQL,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,,Often,,,,,,,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,,Often,,,,,40,35,15,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty 
data,Explaining data science to others,I prefer not to say,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,Often,Rarely,,Sometimes,,,Sometimes,,Sometimes,,,Rarely,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,"size, noise and privacy","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,480000,NOK,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,Very useful,,Very useful,,Somewhat useful,,,Not Useful,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs 
advanced analytics,Employed by a company that doesn't perform advanced analytics",Tableau,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",University courses,15,30,10,40,5,0,,Logistic Regression,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Text Analytics",,,Rarely,,,,Sometimes,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,Sometimes,,,,,30,20,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Often,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,120000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Monte Carlo Methods,C/C++/C#,I collect my own data (e.g. web-scraping),"Blogs,Newsletters,Non-Kaggle online communities,Online courses",,Very useful,,,,,,Very useful,Very useful,,Very useful,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Researcher",Other,0,0,20,0,0,80,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Often,,,,,Often,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,,Often,Often,,,,,,,,Often,,,,,,Often,Often,,,Often,,,Often,Often,,,,60,20,10,5,5,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,Often,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,400000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +A different identity,United Kingdom,100,Employed full-time,,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by government",Stan,Deep learning,R,Other,"College/University,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,80,5,15,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - GANs",High school,Other,500 to 999 employees,Increased slightly,Don't know,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Relational data,Other",Never,<1MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,TensorFlow,Other",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,Rarely,,,Often,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Other,Other",,,,,,,Often,,,,,,,,,Sometimes,,,,,Often,,,,,Sometimes,,,,,Most of the time,Sometimes,,25,20,0,20,15,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",Most of the 
time,Often,,,Often,,,,Most of the time,,Sometimes,Rarely,,,Often,,Often,,,,,,10-25% of projects,More internal than external,Other,None,Missing or incomplete data and sample or source origin not known or understood..,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Other,Rarely,1.00E+11,ILS,Other,8,,,,,,,,,,,,,,,,,, +Male,Brazil,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,,,,,Very useful,Very useful,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,20,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Very Important,Very Important +Female,India,25,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by company that makes advanced analytic software,Orange,Deep learning,Scala,GitHub,"Blogs,Kaggle,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,,,,,,,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",35,60,5,0,0,0,Time Series,Logistic Regression,A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100MB,Decision Trees,"Microsoft SQL Server Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,60,20,5,10,5,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),"Company Developed Platform,I don't typically share data",,,,,,,6,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",R,Deep learning,R,Google Search,"Blogs,Company internal community,Personal Projects,YouTube Videos",,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,I haven't started working yet",Self-taught,60,30,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,Mix of fields,I prefer not to answer,Increased slightly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100TB,Other,"SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Other",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,10,20,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Sometimes,,,,Often,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Sometimes,1700000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,20,0,80,0,0,0,,,High school,Academic,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation",Image data,,,,"C/C++,Jupyter notebooks,Python",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,None,Do not know,Other,,,Other,Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,Very useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Researcher",University courses,60,0,10,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Relational data,Other",Sometimes,<1MB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Python,R",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,,,Often,Most of the time,,,,,,Sometimes,,,Sometimes,,,,,Often,,,,,Sometimes,,Often,,,,,,10,40,10,25,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,Often,,,,,,,,,,,Often,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,N/A,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,120000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Conferences,Official documentation,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Work,20,0,60,20,0,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Always,10GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Sometimes,,Often,,,Often,,,Often,,Rarely,,Sometimes,,,Most of the time,Sometimes,Sometimes,,Often,,Often,,,Rarely,Most of the time,,,,,40,20,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Often,,,,,,,,Rarely,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Other",Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,57,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Julia,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Official documentation,Textbook,YouTube Videos",Very useful,,,,,,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Other",Other,Sometimes,1TB,"Gradient Boosted Machines,SVMs","C/C++,Jupyter notebooks,Python,R,Stan,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Rarely,,,Sometimes,,Often,,,,"Bayesian Techniques,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs",,,Sometimes,,,,,,,,,Often,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Often,Often,,,,,,45,20,15,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Sometimes,,,,,,,,Often,,,,100% of projects,Approximately half internal and half external,Standalone Team,,"size, aceess",Other,Company Developed 
Platform,,Git,Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Anomaly Detection,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Data Scientist,Kaggle competitions,50,20,20,5,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Military/Security,100 to 499 employees,Decreased slightly,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,Sometimes,,Often,Most of the time,Often,Often,,,Often,Often,Sometimes,,Most of the time,,Sometimes,Often,Rarely,Sometimes,,Sometimes,,,,,,Often,Most of the time,,,,40,5,10,15,20,10,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,Often,,Often,,Often,,,,Sometimes,,Most of the time,Most of the time,,,,Sometimes,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Koodous;VirusTotal,Slow DB,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Bitbucket,Sometimes,60000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,51,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Researcher,Software Developer/Software Engineer,Other",Self-taught,70,0,0,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Recommender Systems",Sometimes,,,,Sometimes,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,70,15,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,Not Useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,30,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Female,Australia,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Decision Trees,Python,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Scientist,Researcher",Other,0,0,0,0,0,100,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Other,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,,,,Rarely,Rarely,Rarely,,,,,,,,Rarely,,,,,Rarely,,,,,Most of the time,,,Sometimes,Most of the time,,,,0,0,20,10,30,40,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,Often,,Often,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,,Most of the time,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Blogs,Company internal community,Friends network",Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Researcher,Other",Self-taught,70,10,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Government,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Other,Workstation + Cloud service,,Never,,Bayesian Techniques,"Amazon Web services,Java,Python",,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,0,0,50,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,75000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Spain,57,"Not employed, but looking for work",,,,,,,,SQL,Anomaly Detection,Python,University/Non-profit research group websites,"Arxiv,Personal Projects,Textbook,Other",Very useful,,,,,,,,,,,Very useful,,,Somewhat useful,,,,KDnuggets Blog,15+ years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher",Self-taught,100,0,0,0,0,0,"Computer Vision,Survival Analysis,Time Series","Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Other,23,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Programmer",Self-taught,90,6,0,4,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Statistician",Work,80,0,10,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Often,,Often,,Most of the time,,,,,Often,Most of the time,Most of the time,,,Often,Sometimes,,,Often,,,,35,20,5,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,insee; webscraping,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,45000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Conferences,Kaggle",,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Other,University courses,5,0,40,0,55,0,Natural Language Processing,Bayesian Techniques,,Academic,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Random Forests","C/C++,Java,Orange,Python,R,SQL",,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Natural Language Processing",,,Sometimes,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,None,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Never,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,South Africa,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Bayesian Methods,Python,Google Search,"Arxiv,Conferences,Friends network,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",Very useful,,,,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Data Scientist,Researcher,Other",Work,45,0,50,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Academic,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Julia,Jupyter notebooks,Python,R,TensorFlow,Other",,Often,,Rarely,,,,Sometimes,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,Often,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality 
Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,Often,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,Often,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,,10,20,10,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Rarely,Often,Often,,,Often,,,,Often,,Sometimes,,,,Sometimes,,,,100% of projects,More internal than external,Standalone Team,Publicly available biomedical datasets,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,S3,"Git,Other",Sometimes,108000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,Somewhat useful,Not Useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Physics,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler",University courses,20,0,45,30,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,QlikView,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,TIBCO Spotfire,Other",,Often,,,Most of the time,,Rarely,,Sometimes,,,,,Most of the time,,,Most of the time,,Rarely,,Rarely,Rarely,,Sometimes,,,Sometimes,,,,Most of the time,Sometimes,Most of the time,,,,,,,,Often,Most of the time,Rarely,,Rarely,Sometimes,Rarely,,Rarely,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Rarely,Sometimes,Sometimes,Rarely,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,,,Often,Rarely,Often,,Rarely,Rarely,Rarely,Often,Often,Sometimes,Sometimes,,,,,Rarely,Most of the time,,,,10,25,5,10,50,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not 
used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,Sometimes,Most of the time,Most of the time,,Sometimes,Often,,,,,,,Often,,,,Most of the time,Most of the time,,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,36000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,SQL,Social Network Analysis,SQL,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A health science,3 to 5 years,Researcher,Work,55,10,15,20,0,0,,Logistic Regression,A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Relational data,Don't know,,,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression",,,,,,Sometimes,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,65,20,10,5,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,,"Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,47500,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Julia,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Academic,I prefer not to answer,,,,Not very important,,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction",,,,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,Often,,Rarely,,,,,,,,,,,,,40,20,0,20,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,Do not know,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,DataRobot,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects",Very useful,,,,,,Very useful,,,,,Very useful,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist",Self-taught,45,5,20,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,1MB,"Gradient Boosted Machines,Random Forests,SVMs","Amazon Web services,C/C++,Cloudera,DataRobot,NoSQL,Python,Spark / MLlib,Unix shell / awk",,Often,,Rarely,Rarely,Often,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Sometimes,Sometimes,Often,,,Often,,,Sometimes,Often,,,,,Sometimes,,Often,,,,,Often,,Sometimes,,,,5,75,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Commercial Data Platform,,Git,Sometimes,135000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,52,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,DataTau News Aggregator,The Analytics Dispatch Newsletter",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Mathematics or statistics,Less than a year,Other,University courses,25,60,5,0,0,10,Recommendation Engines,Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,24,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by a company that performs advanced 
analytics,Self-employed",Microsoft Azure Machine Learning,Time Series Analysis,R,Google Search,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,,University courses,70,0,0,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",High school,Insurance,"5,000 to 9,999 employees",Increased slightly,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Never,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Orange,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",Sometimes,,,,,Often,,,,,,Often,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,,,,,25,15,0,30,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Rarely,Often,Often,,,Most of the time,,,,,Most of the 
time,Sometimes,,Often,,,Most of the time,Often,,51-75% of projects,More internal than external,Standalone Team,,Ease of access and denormalization of data from multiple platforms.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,130000,USD,Has decreased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,,Python,,"Kaggle,Online courses,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,"Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,50,50,0,0,0,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Argentina,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,Nice to have,,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,A social science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,50,20,10,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Sometimes,1GB,"Bayesian Techniques,SVMs","Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib",,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,,,,,,,,,,,"Decision Trees,Naive Bayes,SVMs",,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,20,40,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,51-75% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,2000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by college or university,Python,Decision Trees,SQL,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Textbook",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,Traditional Workstation,0 - 1 hour,Master's degree,No,Master's degree,A social science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,29,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,IBM Watson / Waton Analytics,Deep learning,R,Google Search,"Personal Projects,Trade book",,,,,,,,,,,,Very useful,,,,Very useful,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Miner,Data Scientist,Researcher",University courses,0,0,10,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 
employees",Decreased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Markov Logic Networks,"Amazon Web services,C/C++,IBM SPSS Statistics,MATLAB/Octave,Python,R,SQL",,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Often,,,,Most of the time,Often,,,,,,,,Most of the time,,Most of the time,,,Often,,Often,,,,,,,,,,,45,20,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database",,,,,Sometimes,,,,,,Sometimes,,,,,,Rarely,Often,,,,,100% of projects,Do not know,Other,,,Key-value store (e.g. Redis/Riak),Email,,Subversion,,,,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,Very useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,"Data Analyst,Other",Kaggle competitions,30,10,10,0,50,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SAS Base,SQL,Tableau,Other,Other",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Most of the time,,,,,Rarely,,,,Most of the time,,,Often,,,,Most of the time,,Most of the time,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Often,Most of the time,Often,Often,,,Often,,Sometimes,Most of the time,Most of the time,,,,,Often,,Sometimes,,,Sometimes,,,Sometimes,,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the 
time,Often,,,,,,,,,,,,,Most of the time,Sometimes,,,100% of projects,Entirely internal,Business Department,Cannot say,Converting and cleaning the data into formats needed for modeling. Consolidating data from multiple data sources.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",Microsoft SQL Server,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"175,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,66,Retired,,,Yes,,Statistician,Fine,Employed by college or university,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Trade book",Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,Very useful,,,"FlowingData Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Other",Self-taught,60,30,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,R,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Data Miner,DBA/Database Engineer,Engineer,Researcher",Self-taught,50,10,15,25,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",High school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,Some other way,Important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Never,10GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,Random Forests,RNNs","Amazon Web services,C/C++,Hadoop/Hive/Pig,NoSQL,R,SAP BusinessObjects Predictive Analytics,SQL,Unix shell / awk",,Sometimes,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Rarely,,,,,Most of the time,,,,,,Sometimes,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Simulation,Text Analytics",,,Often,Rarely,,,Most of the time,Often,,,Rarely,,,Often,,,,Most of the time,,Rarely,,,Often,Sometimes,,,Sometimes,,Most of the time,,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools",Sometimes,Often,,,Most of the time,,,,,Often,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Other,Bristol city council open datasets; kaggle; uk government; aws open data 
sets,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Often,,Often,,,,Sometimes,,,,Often,,,,,,,Often,,,,,,Rarely,Often,,,,15,20,35,5,25,0,Enough to tune the parameters properly,"Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,Sometimes,,Rarely,,,,Often,,100% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,,100000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Finland,37,"Not employed, but looking for work",,,,,,,,KNIME (commercial version),Uplift Modeling,R,Google Search,"Blogs,College/University,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",University courses,10,10,0,80,0,0,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,TensorFlow,Neural Nets,R,Government website,"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,55,5,5,35,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision 
Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,,100MB,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,70,0,10,0,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,SGD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Italy,32,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Tableau,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,,,,,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,,,,"Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Taiwan,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,Python,"GitHub,Google Search","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,Computer Vision,Neural Networks - CNNs,A master's degree,Manufacturing,"10,000 or more 
employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Video data",Rarely,10GB,"CNNs,Neural Networks","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Logistic Regression,Neural Networks",,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,20,20,60,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",Most of the time,Often,,Often,,,,,,Often,,,,,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,LFW,Not enough labeled data,,Share Drive/SharePoint,,Git,Rarely,3000000,TWD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Norway,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,Necessary,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Other,R,,"Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",Work,25,10,20,40,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,"5,000 to 9,999 employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,Often,Often,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics",,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data 
science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,51-75% of projects,More internal than external,Central Insights Team,[confidential..],Triangulation and mismatches,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,"1,800,000",INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Bayesian Methods,R,,"Arxiv,Blogs,Company internal community,Conferences,Online courses,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Somewhat useful,Not Useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,Data Analyst,University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",I don't know/not sure,Internet-based,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,"A/B Testing,Cross-Validation,Data Visualization,Lift 
Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,,Rarely,Often,,,Sometimes,,Rarely,,Often,,,,Often,Sometimes,Sometimes,Rarely,,,,30,20,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,Often,,,Most of the time,Often,,Sometimes,,,,,,Sometimes,,,,,Often,,Most of the time,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,S3,Bitbucket,Always,180000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +,,NA,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,57,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,95,1,2,2,0,0,Natural Language Processing,Logistic Regression,A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10MB,Regression/Logistic Regression,"Jupyter notebooks,KNIME (free version),Perl,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,Often,Often,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,,,,,Often,,,,,,,Sometimes,,Often,,,Often,,Sometimes,,,,,,,,Sometimes,,,,,30,10,3,10,47,0,Enough to run the code / standard library,"Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,26-50% of 
projects,Approximately half internal and half external,Other,"text corpora, data from the national statistics office, twitter, facebook, web data, wikipedia",cleaning and understanding the semantics,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,CHF,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Google Cloud Compute,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Engineer,Software Developer/Software Engineer",Work,40,30,20,5,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Random 
Forests,Regression/Logistic Regression","C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,TensorFlow,Other",,,,Most of the time,,,,,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Often,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,Sometimes,,Often,Often,Sometimes,,,,,,Sometimes,,Rarely,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,15,45,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data,Other",,,Sometimes,Sometimes,,,,,,,,Sometimes,,Often,,,,,Most of the time,,Most of the time,Often,10-25% of projects,Entirely internal,Standalone Team,MSTAR; ImageNet,Lack of relation between data available for modelling/prototype and data in production,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,1700000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Business Analyst,Programmer,Other",University courses,25,50,15,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A professional degree,Internet-based,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,Often,,,,,,Sometimes,,,Sometimes,,,Often,Sometimes,,,Most of the time,,,,50,10,30,10,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,75000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,,Very useful,,Somewhat useful,,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,20,50,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Retail,500 to 999 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","DataRobot,Impala,Jupyter notebooks,Python,SAS Base,SQL,TensorFlow",,,,,,Rarely,,,,,,,,Rarely,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Text Analytics",,,,,,,,Often,,,,Often,,,,Most of the time,,,Often,Sometimes,,,,Most of the time,Sometimes,,,,Often,,,,,30,10,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,Sometimes,,,Sometimes,,,,Most of the time,Often,,,,10-25% of projects,Entirely internal,Standalone Team,,Applying models at scale. Moving data to the models.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,"92,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Kenya,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Google Search,University/Non-profit research group websites","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer",Self-taught,40,10,40,10,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,Java,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,,Often,,,Most of the time,,,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Most of the time,Sometimes,Most of the time,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,,,Often,Often,,,,,,Sometimes,,Often,,,,,,,Often,Sometimes,,Often,,,Often,,,,,60,15,5,10,10,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,CDRs,Cleaning ,"Column-oriented relational 
(e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,110000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,IBM Watson / Waton Analytics,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Miner,Data Scientist,Engineer,Other",Self-taught,25,25,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,Tableau",,,,,Often,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,Often,,Often,,,,,,,,Sometimes,,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Naive Bayes,Natural Language Processing,SVMs,Time Series Analysis",,Sometimes,Sometimes,Sometimes,,Often,,Sometimes,,,,,,,,,,Often,Often,,,,,,,,,Sometimes,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a 
data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues",Most of the time,Often,,,Most of the time,Often,,,Often,,Often,,Sometimes,,,,Most of the time,,,,,,10-25% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,,130000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Tableau,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,10,5,15,0,"Computer Vision,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Telecommunications,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand 
data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,,,,Often,,Often,,,,,,,,,Often,,,,Rarely,,,,,,"Association Rules,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,SVMs,Time Series Analysis",,Most of the time,,,,,,Often,,,,Sometimes,,Sometimes,,Often,,,,Sometimes,,,,,,,,Often,,Most of the time,,,,30,7,3,30,30,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,,,Sometimes,,,Often,,,,Most of the time,,,Often,,Most of the time,,Most of the time,Rarely,,100% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,360000,UAH,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,"Data Elixir Newsletter,DataTau News Aggregator,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Self-taught,50,15,35,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A master's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Most of the time,,,,Often,,,,,Often,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,Most of the time,Often,,,Sometimes,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,Often,,Often,,,,,Most of the time,,,,,Often,,,,Often,,,,70,10,15,5,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Most of the time,Often,Most of the time,,,Often,Sometimes,,Often,,Sometimes,Most of the time,Often,,Most of the time,Sometimes,,Most of the time,Most of the time,,100% of projects,More internal than external,Business Department,Geographical data,Security and privacy issues ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,70000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher",University courses,10,10,20,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Image data,Rarely,100GB,"CNNs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,,,,,Rarely,Rarely,,,,,,Sometimes,Often,,Often,,,Often,,Rarely,,Sometimes,,,,40,35,0,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Often,,100% of projects,More external than internal,Other,USGS; academic datasets; figshare,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,85000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,"Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,A social science,3 to 5 years,"Machine Learning Engineer,Programmer",Self-taught,30,30,40,0,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United Kingdom,41,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Random Forests,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Podcasts",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",33,66,0,0,1,0,,,A master's degree,Government,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,,1GB,,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation,Simulation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,50,0,0,20,30,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,100% of projects,Entirely internal,IT Department,,The quality of the data. It is frequently out of date or inconsistent.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,55000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +A different identity,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Data Stories Podcast,No Free Hunch Blog,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,Hidden Markov Models HMMs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Monte Carlo Methods,R,"Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer,Other",University courses,40,0,10,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Rarely,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Often,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,,Often,Often,Often,,,Often,Often,Often,Often,Most of the time,Often,,Most of the time,Rarely,Sometimes,Often,Often,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant 
domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Sometimes,,Sometimes,,,,Most of the time,Often,Often,,Sometimes,Often,,Often,,,,Sometimes,Sometimes,,51-75% of projects,More internal than external,IT Department,Esri; web scrape ,Cleaning and merging,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,"150,000",,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Chile,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Deep learning,Python,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,20,40,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A professional degree,Military/Security,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,Often,,,Sometimes,,,,,,,,,,,25,25,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,Rarely,,Often,Often,,Often,,,,,Sometimes,,,,,,,76-99% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Rarely,6000000,CLP,Other,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,0,10,10,15,15,Natural Language Processing,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Nigeria,27,"Not employed, but looking for work",,,,,,,,Java,Social Network Analysis,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Data Analyst,Other",Self-taught,50,20,30,NA,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,36,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,Google Search,"Blogs,Conferences,Kaggle",,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,,,,,,,"FastML Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",50,10,30,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Insurance,500 to 999 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business 
decisions,Workstation + Cloud service,Relational data,Most of the time,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,QlikView,SAS Base,SQL",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation",Rarely,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Often,Sometimes,,,,Sometimes,Rarely,,,,,,Often,,,,,,,40,10,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Most of the time,Often,Often,,Often,Sometimes,Sometimes,Sometimes,,Sometimes,Rarely,Often,Sometimes,,,,Often,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,210000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Software Developer/Software Engineer,University courses,40,0,30,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,SVMs,"Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,kNN and Other Clustering,Recommender Systems,SVMs",Often,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,20,50,15,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,Most of the time,,,,,Rarely,,,,,,,,,10-25% of projects,More internal than external,IT Department,product datasheets,dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,,EUR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Official documentation,Personal Projects,Textbook",,,,,,Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,Self-taught,50,0,30,5,15,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,1GB,"CNNs,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Rarely,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",Rarely,,Rarely,,,Sometimes,Sometimes,,Often,,,,,Sometimes,,Sometimes,,,,,,,Often,,,,,Sometimes,,,,,,10,30,20,30,10,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision 
makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,24000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Germany,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,Google Search,"Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Computer Vision,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",High school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,"CNNs,Neural Networks,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,Sometimes,Often,,,Sometimes,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,Most of the time,,,Often,,,,,,,,,,,,,Most of the time,Often,,,,Often,,,,,,,,,45,25,10,10,10,0,Enough to tune the parameters properly,"Data 
Science results not used by business decision makers,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,,,,,,,,,Most of the time,,Often,Often,,,26-50% of projects,More internal than external,Standalone Team,,Getting necessary computing power (GPU or Cloudservice-Costs),Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email",,"Git,Subversion",Sometimes,93000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Scientist",University courses,20,0,30,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Retail,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Never,,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive 
Modeling,Random Forests,Text Analytics",,,,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,,,,,,,,,,,Often,,,Often,76-99% of projects,More internal than external,Other,Nielsen data,Overloaded DB servers,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,83200,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,43,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,,Nice to have,Necessary,Unnecessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important +Female,United States,29,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler",University courses,15,15,40,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Stan,Tableau,Unix shell / awk,Other,Other",,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,Sometimes,,Rarely,,,Sometimes,Most of the time,Often,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Often,,Most of the time,,,Most of the time,Most of the time,Rarely,Sometimes,,,Often,,,,Most of the time,,,Sometimes,Sometimes,Often,,Often,,,Often,Most of the time,Sometimes,,Most of the time,,,,35,20,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,100% of projects,More internal than external,IT Department,"Google Analytics, lever, LinkedIn, slack",API not well documented nor functioning,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,80000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Mexico,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,43,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,35,35,30,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Always,100MB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",Sometimes,,Sometimes,Often,,Most of the time,Often,,,,,,,Sometimes,,Most of the time,,Sometimes,Often,Most of the time,Sometimes,,,Sometimes,,,,,Often,,,,,20,40,10,25,5,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science 
projects",,,,,,,,,Often,,,,,Sometimes,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,code in git that pulls and recreates,Git,Always,140000,CAD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,NoSQL,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,29,0,1,50,20,0,"Computer Vision,Machine Translation,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,<1MB,Decision Trees,"C/C++,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Decision Trees,Evolutionary 
Approaches",,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,80,5,7,2,6,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,Often,,Most of the time,,Often,,,,,Often,,Sometimes,Most of the time,Most of the time,,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,,,10000,EUR,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +A different identity,Other,100,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Decision Trees,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Other",Self-taught,30,20,30,20,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,,Other,"IBM SPSS Modeler,Java,Microsoft SQL Server Data Mining,Python,SQL,Tableau",,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,,Often,,,Sometimes,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,35,10,10,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Often,,,,,Often,Often,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"67,500",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,,"No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Regression/Logistic Regression,Other","C/C++,Mathematica,MATLAB/Octave,Perl,R,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,Rarely,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Simulation",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of 
the time,,,,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,Sometimes,,,,,Sometimes,,,,,,,Often,,Often,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,None;,No data culture that leads to no implication into producing raw data that contains the necessary variables for proper data analysis or sufficiently realiable data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",GitLab,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,27000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,,,Somewhat useful,,Very useful,,,,,,,Very useful,Linear Digressions Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Other,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Engineer,Researcher,Other",University courses,20,30,35,15,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","R,Tableau,Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Often,Sometimes,Rarely,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression",,,,,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,50,10,5,20,15,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Unavailability of/difficult access to data",,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Sometimes,95000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Other,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Personal Projects,Textbook",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Business Analyst,Self-taught,80,5,5,5,5,0,"Natural Language Processing,Time Series","Evolutionary Approaches,Neural Networks - CNNs",A professional degree,Mix of fields,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Rarely,1GB,"Evolutionary Approaches,Neural Networks","Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics,Time Series Analysis",Rarely,,,,,,Most of the time,,,Sometimes,,,,Sometimes,,Rarely,,,Rarely,Sometimes,,,,Rarely,,,,,Sometimes,Often,,,,20,20,0,60,0,0,Enough to refine and innovate on the algorithm,Inability to integrate findings into organization's decision-making process,,,,,,,,Most of the time,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,,Never,600000,RUB,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Scientist,Engineer,Other",Self-taught,10,20,10,0,10,50,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Often,,,,Often,,,,Often,Sometimes,,Often,,,,,Often,Sometimes,,,,,60,20,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,Most of the time,,Most of the time,Often,,Sometimes,,,Often,,,Most of the time,,,Sometimes,,Often,Most of the time,Sometimes,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,40000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,Employed part-time,,,Yes,,Data Miner,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Partially Derivative Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",25,65,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,Sometimes,,,,Rarely,,,,Often,,Rarely,,,,,,,,,Sometimes,,,,,,,Often,,,"Decision Trees,Logistic Regression,Random Forests",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,75,7,3,15,0,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,Often,,76-99% of projects,More internal than external,IT Department,scraped data,Keep calm,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,CZK,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Self-taught,30,30,10,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,Other,Never,100MB,Gradient Boosted Machines,"IBM SPSS Modeler,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,Rarely,Rarely,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Often,,,,,,,,,,,,,,,,,,,,Often,,,,,,,0,0,0,10,0,90,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,Often,Often,,26-50% of projects,Do not know,Other,,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Email,I don't typically share data",,Git,Sometimes,,BRL,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,I don't plan on learning a new tool/technology,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,I prefer not to answer,Mathematics or statistics,1 to 2 years,Programmer,Self-taught,90,1,0,0,9,0,"Reinforcement learning,Unsupervised Learning",Other (please specify; separate by semi-colon),A master's degree,Academic,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10MB,"Neural Networks,RNNs","MATLAB/Octave,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Evolutionary Approaches,Natural Language Processing,Neural Networks,RNNs,Other",,,,,,,,,,Often,,,,,,,,,Often,Most of the time,,,,,Sometimes,,,,,,Most of the time,,,5,75,0,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,,,,Sometimes,,,,,,,,,Often,,26-50% of projects,Entirely external,Standalone Team,Kaggle datasets; Reddit comments,None,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,100000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,"Employed by a company that performs advanced analytics,Self-employed",Amazon Web services,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,Very useful,Very useful,,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A health science,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,60,10,20,0,10,0,Time Series,Logistic Regression,A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,,"Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Tableau,Unix shell / awk",Rarely,Most of the time,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,"A/B Testing,Association Rules,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Often,,,,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,Often,,,Often,Often,,,,55,30,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,"Commercially available healthcare data, IMS, Symphony, etc",Timeliness of delivery,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,South Korea,59,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,,"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,KDnuggets Blog,3-5 years,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,,"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",I prefer not to answer,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,India,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",University courses,25,20,20,35,0,NA,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Google Cloud Compute,IBM Cognos,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Spark / MLlib,Stan,TensorFlow",Rarely,,,,,,,Sometimes,,Rarely,,,Often,,Often,,Rarely,,,,Often,Often,Most of the time,Rarely,Rarely,,Sometimes,,,,Often,,Often,,,Sometimes,,,,,Often,,Often,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Time Series Analysis",,Rarely,Most of the time,Most of the time,Sometimes,Often,Most of the time,Most of the 
time,Often,Often,Often,Often,Often,Often,,Often,Often,Sometimes,Often,Often,Often,Often,Sometimes,Sometimes,Often,Often,Often,,,Most of the time,,,,10,25,20,25,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,,,,Sometimes,Often,Sometimes,Often,Sometimes,,Often,,Often,,,Often,Often,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,"Bloomberg data, census data, campaign data, data collected at clients place, IoT",Unavailability of data due to various reasons,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,4500000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,R,Factor Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Newsletters,Online courses,Textbook",,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,PhD,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,30,5,0,60,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,31,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),5-10 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,13,7,0,0,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Mexico,41,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Other,,,,,,,,,,,,,,,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Technology,100 to 499 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Perl,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Random Forests,Text Analytics",,,,,,,Most of the time,Often,Sometimes,,,,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,,40,10,0,20,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Git",Rarely,1300000,MXN,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Sweden,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Survival Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,,,,,,,Very useful,,Very useful,,Not Useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),40+,Master's degree,Yes,Bachelor's degree,Computer Science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Not important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Software Developer/Software Engineer,Statistician",Self-taught,50,10,0,0,20,20,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL",,,,,,,,Sometimes,,,,,,,Often,,Often,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",,,Sometimes,Sometimes,,Most of the time,,Often,Often,,,,,,,Often,,Often,Sometimes,Often,,,Often,,,,,,,,,,,20,30,30,10,10,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,85000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Other,Fine,Self-employed,Other,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,,Very useful,Very useful,,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Pharmaceutical,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Image data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Sometimes,Often,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,Rarely,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Often,Often,,,Often,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",Often,,Often,,Often,Often,Often,Often,Most of the time,,,Often,,Often,,Most of the time,Most of the 
time,Most of the time,Often,Most of the time,Most of the time,,Most of the time,,Most of the time,Often,,Most of the time,,,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,240000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,22,Employed part-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,50,5,5,40,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",,,,"Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Other",,Sometimes,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,90,0,0,10,0,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,60000,BRL,,9,,,,,,,,,,,,,,,,,, +Male,Taiwan,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,,,,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Physics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Turkey,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Newsletters,Personal Projects,Textbook",Very useful,,,,,,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",65,10,0,0,25,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,Less than one year,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Always,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Impala,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the 
time,,,Often,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Often,Most of the time,Most of the time,,,Most of the time,,Rarely,Rarely,Rarely,,,,Sometimes,Sometimes,Sometimes,Most of the time,,,Often,,Rarely,,Sometimes,,,,80,5,0,5,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Nigeria,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Bayesian Methods,R,Google Search,"Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Researcher,Statistician,Other",University courses,20,20,20,30,10,0,,Logistic Regression,A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Prescriptive Modeling",,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,10,20,10,30,30,0,Enough to refine and innovate on the algorithm,Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,100% of projects,Do not know,Central Insights Team,Third party,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,135000,NGN,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by college or university,Employed by government",TensorFlow,Genetic & Evolutionary Algorithms,Python,"Government website,University/Non-profit research group websites","Arxiv,Company internal community,Conferences,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Other",University courses,0,5,40,45,0,10,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Government,500 to 999 employees,Stayed the same,More than 10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,100GB,"Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,,,,,,Often,,,Often,Sometimes,Most of the time,,Often,,,,,,,Often,Most of the 
time,Rarely,,,,25,25,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,Often,,Sometimes,Sometimes,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,format inconsistencies ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,"Bitbucket,Git,Mercurial,Subversion",Rarely,110000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,49,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Other,Monte Carlo Methods,SQL,,"Company internal community,Personal Projects",,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst",Work,0,0,100,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,500 to 999 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,10MB,"Decision Trees,Regression/Logistic Regression","R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Often,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",,,,,,Often,Often,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,15,15,10,20,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,,,,Sometimes,,,,Sometimes,,76-99% of projects,Entirely internal,Business Department,Credit bureaux data; microgeographic data,data inconsistencies,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Other,Neural Nets,SQL,Google Search,"Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,23,23,23,23,8,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Never,1TB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Other,Other",,Sometimes,,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Most of the time,Sometimes,,,Sometimes,Sometimes,,"Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,55,5,0,40,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,,,Often,,,,,,,Most of the time,,,100% of projects,Entirely external,Other,Prefer not to say,Data is dirty or not defined.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,0,USD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,NoSQL,Text Mining,R,,"Blogs,College/University,Online courses",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,Operations Research Practitioner,University courses,0,0,100,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,Other,"Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Ensemble Methods,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,,,Rarely,,,,,,,Sometimes,,,Sometimes,,,Most of the 
time,,,,Rarely,,,,Most of the time,,,,10,70,0,0,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,,data quality,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Bitbucket,Sometimes,175000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Mexico,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Management information systems,,Business Analyst,University courses,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Russia,22,"Not employed, and not looking for work",Yes,"Yes, 
I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Online courses",Very useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Female,United States,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,A health science,More than 10 years,"Business Analyst,Data Analyst",Other,30,30,0,0,0,40,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Non-profit,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,DataRobot,Python,R,SQL,Tableau",,Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to 
coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Often,,Sometimes,Often,,,,Sometimes,Sometimes,Often,Often,,,,,Sometimes,,76-99% of projects,More internal than external,Business Department,American Hospital Association; Centers For Medicare and Medicaid; United States Census,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,120000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"Udacity,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,35,50,0,15,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important +Male,France,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,I don't plan on learning a new ML/DS method,R,I collect my own data (e.g. web-scraping),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Time Series,Decision Trees - Random Forests,A doctoral degree,Other,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,"Decision Trees,Random Forests","Hadoop/Hive/Pig,Python,R,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Decision Trees,Random Forests,Segmentation",,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Scaling data science solution up to full database",,Often,,,Often,Sometimes,,Often,,,Most of the time,,,,,,,Sometimes,,,,,100% of projects,Entirely internal,IT Department,,Old data not good 
enough,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Never,50000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Miner,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Computer Vision,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs",High school,Financial,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Don't know,1TB,CNNs,"Google Cloud Compute,Java,Jupyter notebooks,Python,TensorFlow",,,,,,,,Rarely,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,70,10,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,,Less than 10% of projects,Entirely external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Rarely,15000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed part-time,,,Yes,,Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by college or university",MATLAB/Octave,Time Series Analysis,SQL,"Google Search,I collect my own data (e.g. web-scraping)",Podcasts,,,,,,,,,,,,,Very useful,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,10,15,45,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Decreased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Most of the time,10GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Machine Learning,C/C++,DataRobot,Java,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Orange,SAS JMP,SQL",Often,,,Most of the time,,Often,,,,,,,,,,Most of the time,Often,,Rarely,,Most of the time,Rarely,Often,,,,,,Often,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random 
Forests",,,Sometimes,,,,,,,,,,,,,,,,Often,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,45,15,5,14,21,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Often,Often,Most of the time,Often,,,,,,Most of the time,Often,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Mercurial,Subversion",,100000,BIF,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,,Very useful,,Somewhat useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Operations Research Practitioner",University courses,30,15,25,25,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A master's degree,Military/Security,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1TB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,Tableau,Other,Other",,Rarely,,,Most of the time,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,Rarely,Often,,,,,,,,Most of the time,,,,Sometimes,,,,Often,Rarely,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Rarely,,,,,,Sometimes,,Sometimes,,,,,Often,,Rarely,,,,Sometimes,,Sometimes,Often,,,,40,10,10,10,10,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,Unavailability of/difficult access to data,Other",Often,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,Often,Often,76-99% of projects,More internal than external,IT Department,,lots of compliance and access barriers due to classified data ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,Never,180000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,Somewhat useful,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Czech Republic,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,48,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced 
analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Other,Non-Kaggle online communities,,,,,,,,,Somewhat useful,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer",University courses,20,10,40,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,SVMs",Spark / MLlib,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Segmentation,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Sometimes,Often,,Often,Often,,,,,Often,,,,,,,,,,,,Often,,,,Sometimes,,,,10,55,20,5,10,0,Enough to refine and innovate on the algorithm,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,,Often,Often,,,,Sometimes,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Sometimes,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,49,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Government website,"Company internal community,Friends network,Online courses,Personal Projects,Tutoring/mentoring",,,,Very useful,,Very useful,,,,,Very useful,Very useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (commercial version),Python,SAP BusinessObjects Predictive Analytics,Spark / MLlib",,,,,,,,,Often,,,,,,,,Often,Sometimes,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs",,,,,,Often,Often,Often,,,,Sometimes,,,,Sometimes,,,,,,,Often,,,,,Most of the time,,,,,,65,5,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Sometimes,Often,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,The size of some data,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",University courses,30,10,20,30,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Other,Fewer than 10 employees,Stayed the same,,Some other way,Important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,RNNs","Java,Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,RNNs,Time Series 
Analysis",,,,,,,,Often,,,,,,,,Often,,Sometimes,Most of the time,,,,Often,,Often,,,,,Often,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,26-50% of projects,Entirely external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Impala,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,Very useful,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher",University courses,20,0,25,55,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,10TB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Enterprise Miner,Spark / MLlib,SQL",,Sometimes,,,Sometimes,,,,Sometimes,,Rarely,,,Sometimes,Often,,Often,,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,,,,Sometimes,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Sometimes,Sometimes,Often,Most of the time,,Most of the time,Sometimes,,Often,Often,Most of the time,,,Often,,Sometimes,Often,Sometimes,,,,40,10,30,10,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Most of the time,,Often,Sometimes,Often,,Sometimes,,,Often,,,Often,,Often,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +A different identity,Other,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,Very useful,,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Data Stories Podcast,Linear Digressions Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Other,Basic laptop (Macbook),2 - 10 hours,PhD,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,NA,0,50,50,0,0,"Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,Weka,"Ensemble Methods (e.g. boosting, bagging)",Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Other,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,20,10,40,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by non-profit or NGO,Other,,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",15,25,30,0,30,0,"Computer Vision,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Most of the time,<1MB,,"Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,Sometimes,,,,,,,"Association Rules,Data Visualization,Text Analytics,Time Series Analysis",,Often,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,30,25,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Rarely,,Often,,,,Often,,Sometimes,,,,,Often,,Often,,,Often,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,,Never,50000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Poland,56,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Engineer,Predictive Modeler,Programmer",Self-taught,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Image data,Text data",Rarely,10MB,"Neural Networks,SVMs","C/C++,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,NoSQL,Perl,Python,R,SQL,Unix shell / awk",,,,Often,,,,,Rarely,,,,,,Rarely,Sometimes,Sometimes,,,,,,,,,,Rarely,,,Sometimes,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Simulation,SVMs,Time Series Analysis",,,,Sometimes,,,Often,,,,,,,Sometimes,,Sometimes,,,,Often,,,,,,,Often,Often,,Sometimes,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,60000,PLN,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Business Analyst,Data Scientist",Self-taught,50,5,10,10,25,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,Rarely,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Recommender Systems,RNNs,Time Series Analysis",,,,Sometimes,Often,Most of the time,,,,,,Often,,Sometimes,,Sometimes,,,,Often,,Sometimes,,Often,Often,,,,,Often,,,,40,30,0,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Often,,,,Sometimes,,,,,,Most of the time,,,,,,,,Less than 10% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Bitbucket,Sometimes,2000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Israel,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Kaggle,Textbook",Very useful,Very useful,,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher",Work,50,10,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Mix of fields,I prefer not to answer,Increased significantly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic 
Regression,RNNs","Hadoop/Hive/Pig,Impala,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,Often,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,Time Series Analysis",Most of the time,,,Often,,Most of the time,,Most of the time,Most of the time,,,Most of the time,Sometimes,Most of the time,Sometimes,Often,,,,Most of the time,Often,,Often,,Often,Most of the time,Often,,,Often,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,,Very useful,,Very useful,,Very useful,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,10,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","C/C++,Microsoft Azure Machine Learning,Python,R,TensorFlow,Unix shell / awk,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,Most of the time,Often,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs",Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of significant domain expert input,Privacy issues",,,,,Often,,,,,,Often,,,,,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,cleaning the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,195000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,,,,"Arxiv,Blogs,College/University,Conferences,Kaggle,Podcasts,Textbook",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,Very useful,,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,24,20,50,5,0.1,0.9,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Video data,Text data",Most of the time,100GB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression,RNNs","Amazon Web services,Python,R,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",Most of the time,,,Often,,Most of the time,,Often,,,,,,,,Often,,,,Often,Sometimes,,Sometimes,,Sometimes,,,,,,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,"Data 
Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Often,Often,Often,Sometimes,,,Sometimes,Often,,,,,,,,,,,,Sometimes,,26-50% of projects,Entirely internal,Other,uci datasets,clean and formatted data,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Most of the time,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer,Other",Self-taught,33,33,0,0,0,34,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Other,"1,000 to 4,999 employees",Decreased significantly,More than 10 years,Some other way,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,Rarely,,,,,Most of the time,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,Most of the time,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,Often,Often,,,,,,Often,Often,,,Often,,,Often,Most of the time,,,,40,15,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,100% of projects,Approximately half internal and half external,Other,top-level domain names are used by tldextract,schema changes,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"140,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Company internal community,Conferences,Kaggle,Personal Projects",,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,Other,Work,30,0,60,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Rarely,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,Spark / MLlib",,,,,,,,,Often,,,,,,,,Often,,,,,Often,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,Sometimes,,,,Often,,,Often,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,Rarely,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support 
for a data science team,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,Sometimes,Sometimes,,,,51-75% of projects,More internal than external,Business Department,,Memory,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,160000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",5-10 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,A health science,,"Business Analyst,Data Analyst,Data Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,44,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Cloudera,Deep learning,R,Government website,"Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,Sort of (Explain more),Master's degree,Computer Science,,Engineer,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Logistic Regression,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not 
important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,29,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),Other","Arxiv,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,80,5,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not 
important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Other",,Very useful,,,,,Somewhat useful,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Insurance,100 to 499 employees,Increased significantly,Less than one year,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests","Jupyter notebooks,MATLAB/Octave,Minitab,Perl,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,Sometimes,,,,Rarely,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,Rarely,,,,"Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests",,,,,,,,Often,Often,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,30,25,10,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a 
clear direction to go in with the available data",,,,,Most of the time,Often,,,Often,,,,,Often,,Most of the time,,,,Sometimes,,,26-50% of projects,More internal than external,Other,Non,Dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Bitbucket,Rarely,95000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Orange,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,edX,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Other,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,45,20,10,5,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat 
important +Male,Spain,47,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Genetic & Evolutionary Algorithms,R,GitHub,"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Always,10TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Often,Often,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"A/B Testing,CNNs,Decision Trees,Natural Language Processing,Neural Networks,Simulation,SVMs",Often,,,Sometimes,,,,Sometimes,,,,,,,,,,,Most of the time,Most of the time,,,,,,,Often,Often,,,,,,30,20,20,10,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.",,,,Sometimes,,Sometimes,,,,,Most of the time,,,,Sometimes,,,,Often,,,,Less than 10% of projects,Entirely internal,IT Department,We are the propietary,No challenge,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,100000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"FlowingData Blog,Partially Derivative Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Work,30,30,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Spain,41,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University",Somewhat useful,,Very useful,,,,,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,10,0,40,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,500 to 999 employees,Decreased significantly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Rarely,10MB,"CNNs,Decision Trees,Random Forests,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Often,,Most of the time,Rarely,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,Often,,Rarely,,,,,,40,30,30,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,Sometimes,Most of the time,Most of the time,,Often,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Mercurial",,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,,Very useful,,,,Not Useful,"Data Elixir Newsletter,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Researcher,Statistician",University courses,20,0,20,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,Rarely,,,,Often,Most of the time,Sometimes,,,,Sometimes,,Sometimes,Often,Often,,,,Sometimes,Often,Often,Sometimes,Sometimes,,Sometimes,Sometimes,,Often,Often,,,,60,5,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,Often,Sometimes,Sometimes,,76-99% of projects,More internal than external,Business Department,BLS,Data 
Integrity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,70000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,52,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Anomaly Detection,R,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,85,0,0,10,0,"Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Manufacturing,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,<1MB,Regression/Logistic Regression,"IBM Cognos,KNIME (free version),Microsoft Excel Data Mining,R,SQL",,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Logistic Regression,Time Series Analysis",,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,50,40,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with 
IT",Often,Often,,,Often,Sometimes,,Sometimes,Often,,Often,,,,Often,,,,,,,,None,Entirely internal,Business Department,,The company hasn't invested in analytical tools,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,,Never,36000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",5,35,25,20,15,0,"Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,Rarely,,,Often,Often,Sometimes,Most of the time,,,Most of the 
time,,,,Often,,Rarely,,,Sometimes,,Often,Sometimes,,,,Rarely,Rarely,,,,,34,34,20,10,2,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,Often,Often,Most of the time,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Telegram,Other,Rarely,30000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data 
Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,,6 to 10 years,"Business Analyst,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,NoSQL,Python,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,Rarely,,Rarely,Sometimes,,Sometimes,,,,"Association Rules,Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,Rarely,Rarely,,,,,Often,Sometimes,,,Often,,Sometimes,Rarely,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,,,25,25,25,10,15,0,"Enough to 
code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Most of the time,150000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Researcher",University courses,40,30,10,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service","Image data,Other",Always,1GB,"Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,RNNs","Google Cloud Compute,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Rarely,,Often,Most of the time,Sometimes,Most of the time,,,,Rarely,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Most of the time,,Often,Often,Sometimes,Often,,Most of the time,,,,20,15,15,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,Often,,,,,,,,,,Sometimes,Often,,100% of projects,Entirely external,Other,Government data; satellite data,Funding for the research,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Git,Most of the time,1700,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,5,0,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Other,100 to 499 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Python,R,SAP BusinessObjects Predictive Analytics",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,Most of the time,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,50,0,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support 
for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",Often,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Mexico,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Other,Survival Analysis,,I collect my own data (e.g. web-scraping),Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,10,15,20,5,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,DataRobot,Jupyter 
notebooks,Mathematica,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",Often,Often,,,,Rarely,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,Sometimes,Often,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,20,35,20,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,Often,,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Standalone Team,"FAA, EDGAR, GOOGLE FINANCE, CRUNCHBASE, ETC",Understand automatically the data.,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Sometimes,360000,MXN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,42,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,Udacity",Other,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,40,10,0,25,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,IBM Watson / Waton Analytics,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Company internal community,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,40,10,40,0,0,Computer Vision,"Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Image data,Sometimes,<1MB,"Markov Logic Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Sometimes,,,,5,20,50,15,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,Sometimes,,Sometimes,,,,Rarely,Rarely,Rarely,,76-99% of projects,Do not know,Other,Human Connectome Project,unique data format,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,55900,USD,Other,9,,,,,,,,,,,,,,,,,, +Female,Other,28,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Online courses,Personal Projects",Very useful,,,,,,,,,,Very useful,Very useful,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Data Analyst,Predictive Modeler",Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,,Often,Sometimes,Sometimes,,,Sometimes,,,Often,Often,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,35,25,0,25,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,Most of the time,,,,Often,,Most of the time,,,,,,,,,,,,76-99% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1300000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,33,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Social Network Analysis,Python,I collect my own data (e.g. web-scraping),"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",1-2 years,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,0,0,50,0,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Markov Logic Networks,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Very Important +Male,United States,40,Employed full-time,,,Yes,,Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",R,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer",Work,15,15,45,25,0,0,Time Series,"Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Non-profit,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Rarely,,Rarely,,,,,,,Most of the time,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Rarely,Sometimes,,Most of the time,,,Rarely,Often,,,,Sometimes,,Most of the time,,,,,Sometimes,Most of the time,,76-99% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,103000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Miner,Predictive Modeler,Statistician",Self-taught,35,25,25,10,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Often,Often,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,,Often,Sometimes,,,,,,,Most of the time,,Sometimes,,,Sometimes,,Often,,,,,,,Often,,,,45,35,10,3,7,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,,,Often,Often,,,Often,Sometimes,,,,,,,,Often,,,,Most of the time,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,110000,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Deep learning,Python,,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Data Stories Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,10,0,40,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,Important,Other,Traditional Workstation,Other,,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Ensemble Methods,HMMs,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,50,25,0,25,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,,76-99% of projects,Do not know,Other,,Collecting it in the 
first place,,Other,,Git,Rarely,42000,GBP,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Spain,38,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,,"GPU accelerated Workstation,Traditional Workstation",,Kaggle Competitions,Yes,Master's degree,Computer Science,,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Not Useful,,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,Unsupervised Learning,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,Very Important,,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Russia,21,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by company that makes advanced analytic software,C/C++,Cluster Analysis,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Company internal community,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,Very useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Physics,1 to 2 years,Predictive Modeler,Self-taught,20,5,60,10,5,0,Computer Vision,"Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Don't know,10GB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,"CNNs,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,Sometimes,,,,,,,,20,40,10,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Sometimes,,,,,,,,Rarely,Most of the time,,,100% of projects,Entirely internal,Standalone Team,imagenet;cifar;pascal_voc;,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,40000,RUB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,43,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Other,Python,Google Search,"Arxiv,Conferences,Kaggle,Podcasts,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist",Work,30,20,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Sometimes,Sometimes,Often,,,,,Sometimes,Sometimes,Most of the time,,Sometimes,,,Often,Often,Often,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,,,,,,Often,,,,,,,,,76-99% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,150000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,"Coursera,edX","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,15,10,0,55,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,38,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Official documentation,Personal Projects",,,,,,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,Other,Self-taught,95,0,0,0,5,0,Time Series,,A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,10GB,,"Amazon Web services,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,30,10,30,30,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,,Sometimes,,,Most of the time,,,,,,Most of the time,,,,,,,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Portugal,26,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,30,0,0,60,10,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,20,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Deep learning,Python,Google Search,"College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,Somewhat useful,Not Useful,,Very useful,,Somewhat useful,,Somewhat useful,,Not Useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,20,10,0,55,15,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,40,5,40,10,5,0,Computer Vision,Neural Networks - CNNs,A master's degree,Other,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Image data,Sometimes,10GB,Neural Networks,"C/C++,Google Cloud Compute,Python,TensorFlow",,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,0,0,85,15,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Mexico,30,Retired,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Proprietary Algorithms,Python,Other,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,15,15,10,40,0,Survival Analysis,Ensemble Methods,A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Fine,Self-employed,Amazon Machine Learning,Rule Induction,R,Google Search,"Friends network,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",,,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler",University courses,40,10,20,25,5,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,Decision Trees,"Google Cloud Compute,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SQL,Tableau,TIBCO Spotfire",,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,Often,,,Rarely,,Sometimes,,,,,Often,,,,Sometimes,,,Sometimes,,Often,,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Often,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Often,Often,,,,Sometimes,,Often,,51-75% of projects,Entirely internal,Other,Many...,"Data lineage, repeat-ability of reported figures (GL, 10Ks, etc.)","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,150000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,70,10,10,10,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1TB,Decision Trees,"Amazon Web services,NoSQL,Python",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Prescriptive Modeling,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,30,50,0,5,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,Often,Sometimes,Often,,,Most of the time,Often,Often,Sometimes,Sometimes,,,Often,,,Most of the time,Most of the time,,,26-50% of projects,Entirely internal,IT Department,Location data ,No feedback look for training data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,,82500,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Non-Kaggle online communities,Online courses,Textbook",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Workstation + Cloud service,11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,28,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Matlab,Google Search,"Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"Data Stories Podcast,Partially 
Derivative Podcast,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,35,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,70,10,10,10,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,Sometimes,,Sometimes,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,35,30,10,5,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,100000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,65,Retired,,,Yes,,Statistician,Fine,Employed by government,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Statistician,Other",Other,20,0,60,20,0,0,"Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A doctoral degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,,,,Very useful,Jack's Import AI Newsletter,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,,,,Coursera,Traditional Workstation,11 - 39 hours,Master's degree,Yes,Professional degree,,,"Computer Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Other,34,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Uplift Modeling,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",University courses,30,0,50,10,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Always,1GB,Regression/Logistic Regression,"SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,Most of the time,,,,,,,,,,"kNN and Other Clustering,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,,Often,,,,Rarely,,,,60,20,15,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",Sometimes,,,Sometimes,Often,Sometimes,,,Most of the time,,,Most of the time,,,Most of the time,,Sometimes,,,,,,Less than 10% of projects,Do not know,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Never,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Decision Trees,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,,Not Useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,10,0,80,0,0,"Recommendation Engines,Reinforcement learning,Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,Python,Google Search,"College/University,Kaggle,Personal Projects",,,Very useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,"Computer Vision,Natural Language Processing,Supervised 
Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,NoSQL,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,Sometimes,,,,Often,Often,,,,,,Often,,,,,,,,,,,,Sometimes,,,Sometimes,Often,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Naive Bayes,Natural Language Processing,Text Analytics",,,Sometimes,Often,,Most of the time,Most of the time,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,Sometimes,,,,,,,,,Often,,,,26-50% of projects,Approximately half internal and half external,Central Insights Team,wordnet; CoNLL; website rss feeds,various formats,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Always,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Nigeria,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,Government website,"Blogs,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,15,0,5,0,10,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Other",Self-taught,90,5,5,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,Often,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",Sometimes,,Sometimes,,Often,Most of the time,Most of the time,Sometimes,Often,,,Often,,Often,Sometimes,Often,,Sometimes,,,Often,,Often,Often,,Often,Sometimes,,,Often,,,,40,15,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Most of 
the time,,,,,,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,130000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,60,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Personal Projects",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,60,0,20,0,20,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Telecommunications,100 to 499 employees,Decreased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Most of the time,1GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests","C/C++,Java,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language 
Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",,Sometimes,,Often,Often,Often,Often,Often,Often,Often,Sometimes,Often,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Often,Often,Sometimes,,,Sometimes,,,,,,30,40,20,10,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Limitations of tools",,Often,,,Often,,,,,,,,Often,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,"200,000",RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,,University courses,0,0,0,90,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,,,"Neural Networks,Regression/Logistic Regression","C/C++,Flume,Python,TensorFlow",,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Neural Networks,Simulation",Sometimes,,,,,,Often,,,,,,,,,Often,,,,Often,,,,,,,Sometimes,,,,,,,30,30,20,0,20,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,140000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Necessary,,Nice to have,,Necessary,,Nice to have,,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Switzerland,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Self-employed,Microsoft Azure Machine Learning,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,Somewhat useful,Not Useful,Somewhat useful,,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,,"Bayesian Techniques,Decision Trees,Neural Networks","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Time Series Analysis",,Sometimes,Rarely,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,,,,,,Most of the time,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,Most of the time,,,,Sometimes,,,,,,Often,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,150000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts",,,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Physics,,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important +Male,Mexico,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts",,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Researcher",Work,40,20,40,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,10MB,SVMs,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,SVMs,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,70,10,0,10,10,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,Entirely external,Standalone Team,data generated from the client,cleaning it,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,200000,MXN,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"GitHub,Google Search",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,Rarely,,,Rarely,,Sometimes,Rarely,,,,Rarely,,Rarely,,,,20,70,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,,,,,Often,,,Most of the time,,,,,,,,,,,10-25% of projects,Entirely 
internal,Standalone Team,‰Û_Ì¢‰Û_Ì_,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,,,,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,Very useful,,,,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,Business Analyst,Other,0,10,10,0,0,80,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA 
and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Rarely,,Often,Rarely,,Most of the time,Most of the time,Sometimes,Most of the time,,,Often,,Sometimes,,Often,,,,Rarely,Sometimes,,Often,,,Sometimes,,,,Sometimes,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,43000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,"Data Analyst,Researcher,Statistician",University courses,50,15,10,25,0,0,,,A doctoral degree,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1MB,Regression/Logistic Regression,"R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,Often,Often,,,,,,,,,Often,,,,,,Sometimes,,,,Often,,,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data 
science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,Sometimes,,,Sometimes,,,,,,Often,Often,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,25000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,Less than a year,,Self-taught,40,40,20,0,0,0,,"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Rarely,,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Gradient Boosted Machines,kNN and Other Clustering,Logistic 
Regression,Segmentation,SVMs",,,Sometimes,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,60,10,0,15,15,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,Sometimes,,Sometimes,,,100% of projects,More internal than external,Other,,"Data size, converting proprietary file formats to usable file formats. Irregularity between data sets. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,Internal server,Subversion,Sometimes,"20,000",USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,R,University/Non-profit research group websites,"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Not Useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Researcher",University courses,40,0,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,Regression/Logistic Regression,"IBM SPSS Modeler,Python,R,SAS Base,SQL",,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Often,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,Sometimes,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,,,10,10,0,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Often,,,,Sometimes,,Often,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Rarely,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Mexico,70,Retired,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,R,Other,"Arxiv,Blogs,Newsletters,Online courses,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,50,10,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,C/C++,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",University courses,0,20,0,80,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Non-profit,"1,000 to 4,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,,Often,,,Most of the time,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Sometimes,Most of the 
time,Sometimes,,,,,,,,Often,,,Often,,Often,Most of the time,Rarely,,,,,Rarely,Often,Sometimes,,,,70,5,0,20,5,0,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Most of the time,,,,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,"68,000",USD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects",,Very useful,,,,,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Data Scientist,Programmer",University courses,40,10,0,50,0,0,"Computer Vision,Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Financial,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression",,,,,,,Most of the time,Often,,,,Most of the 
time,,,,Often,,,,,,,,,,,,,,,,,,50,10,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,Department of Labor; Census,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,130000,USD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Turkey,22,"Not employed, but looking for work",,,,,,,,Java,Social Network Analysis,Java,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,10,0,0,90,0,0,Computer Vision,Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Russia,23,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",Julia,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist,Predictive Modeler",University courses,50,20,20,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Government,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs",Sometimes,,,,Sometimes,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Most of the time,,Most of the time,,,,,,,Most of the time,,,Sometimes,,Often,,,,,,55,10,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in 
deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Most of the time,Most of the time,Often,,Often,Most of the time,,,Sometimes,,,,,,,,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,780000,RUB,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,20,10,30,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Jupyter notebooks,Text Mining,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer 
Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,,,"Java,NoSQL,Python,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Bitbucket,Sometimes,110000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Business Analyst,Data Analyst,Engineer",Self-taught,50,40,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Government,100 to 499 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data",Always,100MB,Decision Trees,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems",,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Often,,,Sometimes,,,,Rarely,Often,,,,,,,,,,70,10,15,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,IT Department,Kaggle ,Cleaning it,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,,10000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,20,Employed part-time,,,No,Yes,Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university",Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,I prefer not to answer,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,70,5,0,0,25,0,Computer Vision,"Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,47,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,Very useful,,,Very useful,,Somewhat useful,,,,,"FlowingData Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Other,11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer,Other",Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,Brazil,48,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Personal Projects",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Other,University courses,0,30,30,30,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib",,,,,,,,,Rarely,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,Often,Often,Sometimes,Sometimes,,,Often,,,,Sometimes,,,Often,,Sometimes,,Sometimes,Sometimes,,,,,Often,Sometimes,,,,70,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Sometimes,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,R,Genetic & Evolutionary Algorithms,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Data Analyst,DBA/Database Engineer,Researcher",Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Most of the time,100MB,"Decision Trees,Neural Networks,Random Forests","Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,Often,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,30,40,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Most of the time,Often,,,Often,,,,,,Most of the time,,Sometimes,,Often,,,,51-75% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,82000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,63,"Not employed, but looking for work",,,,,,,,RapidMiner (free version),Deep learning,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Textbook",,Very useful,,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,Very useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",45,50,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Very Important,,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,0,50,0,20,30,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,,100MB,Other,"C/C++,Jupyter notebooks,MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,25,25,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Sometimes,54000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Random Forests,Java,University/Non-profit research group websites,Blogs,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,DBA/Database Engineer,University courses,70,0,0,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Other,,Text data,Never,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,10,0,25,65,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Video data,Sometimes,1TB,"Bayesian Techniques,CNNs,Neural Networks","C/C++,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Rarely,,,,"Bayesian Techniques,CNNs,Cross-Validation,Naive Bayes,Neural Networks,Time Series Analysis",,,Most of the time,Often,,Often,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,40,50,0,5,5,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,"36,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,"Data Stories Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,60,20,0,20,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,52,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. 
boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,,Necessary,Necessary,Necessary,Necessary,Necessary,,Necessary,,Necessary,Necessary,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Other",University courses,0,0,0,100,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Relational data",Rarely,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,MATLAB/Octave,Spark / MLlib,TIBCO Spotfire",,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs",,,,Sometimes,,Often,Most of the time,Often,Often,,,,,Often,,Often,,Often,,Often,Often,Often,Often,,,,Often,Often,,,,,,30,10,5,5,50,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,Rarely,Sometimes,,Most of the 
time,Often,,Rarely,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Other,Energy production; weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,106000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Poland,36,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Other,Self-taught,70,15,15,0,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests",A professional degree,Technology,"5,000 to 9,999 employees",Increased significantly,,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees","Microsoft Excel Data Mining,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,,,"Naive Bayes,Random Forests,Text Analytics",,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,SAS Base,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,DataTau News Aggregator,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,,Experience from work in a company related to ML,No,Master's degree,A health science,I don't write code to analyze data,I haven't started working yet,University courses,20,0,0,80,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Gradient Boosted Machines,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",15,25,20,30,10,0,Speech Recognition,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,SVMs","C/C++,Microsoft Excel Data Mining,Python,SQL,Tableau",,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Segmentation,Text Analytics,Time Series Analysis",,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,50,20,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Most of the time,Often,,,,,,,Most of the time,Often,,,,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,none,not organised well,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Other,,Employed by professional services/consulting firm,Python,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,,,,,Necessary,Necessary,Necessary,,,Necessary,,,,,,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",,,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs",A master's degree,Government,I don't know,Increased slightly,More than 10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Other","Jupyter notebooks,Perl,Python,R,Spark / MLlib,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,Rarely,,,,,,,,Often,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Time Series Analysis",,,Often,,,Often,Most of the time,Often,,,,,,,,,,Often,,,,,Often,,,,,,,Most of the time,,,,40,5,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,Often,,,,Often,,,,,Often,,,Often,,,,,,,Sometimes,,76-99% of projects,More internal than external,Other,,domain expertise,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,160000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,67,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,College/University,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,More than 10 years,"Computer Scientist,Data Scientist,Predictive Modeler,Researcher",Self-taught,80,0,10,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Text data,Sometimes,1GB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Python,R,Unix shell / awk",,,,Often,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Often,,,,"Evolutionary Approaches,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,Sometimes,,,,Often,,,,,Often,Sometimes,Often,,,,,,Often,,Often,Often,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,Often,,Often,Sometimes,,,,,,Sometimes,,Sometimes,Sometimes,Often,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,120000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Very useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,53,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,,Other,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Other,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,R,,Python,I collect my own data (e.g. 
web-scraping),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Data Miner,Operations Research Practitioner,Programmer",University courses,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Other,Basic laptop (Macbook),"Text data,Other",Don't know,<1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Segmentation,SVMs",,,,,,,,Often,,,,,,Often,,Sometimes,,Sometimes,,Often,,Often,Often,,Often,Sometimes,,Sometimes,,,,,,60,20,10,10,0,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,Less than 10% of projects,,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,,12000,MAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Portugal,49,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Python,"Ensemble Methods (e.g. boosting, bagging)",R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Online courses,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,,,,,,,,Very useful,,,,Very useful,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Predictive Modeler,Programmer,Researcher,Statistician",Work,5,5,20,70,0,0,Time Series,Logistic Regression,High school,Academic,,,,,Important,,,,,,,"IBM SPSS Statistics,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,40,0,10,30,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,10-25% of projects,Do not know,,,,,Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Regression,R,I collect my own data (e.g. 
web-scraping),"College/University,Personal Projects,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,I don't plan on learning a new ML/DS method,Python,Government website,"Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,,,Very useful,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,University courses,35,10,30,25,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,20 to 99 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Decision 
Trees,Regression/Logistic Regression","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Lift Analysis",Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,80,10,0,0,10,0,Enough to run the code / standard library,"Limitations of tools,Other",,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Less than 10% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,70700,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Ukraine,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,35,45,10,0,10,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very 
Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,Employed part-time,,,Yes,,Other,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,3 to 5 years,I haven't started working yet,University courses,10,15,10,50,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A master's degree,Academic,I don't know,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Workstation + Cloud service",Image data,Never,10GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,Rarely,,,,,,,,40,20,10,10,10,10,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Most of the time,,,,,,,,,Often,,,,,,,Most of the time,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts",,Very useful,,Very useful,,,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,,,,,,"DataTau News Aggregator,Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Predictive Modeler,Researcher",Self-taught,20,5,40,5,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the 
time,,Rarely,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,,Rarely,Rarely,Often,,,,10,20,10,30,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,Prefer Not to Say,merging and etl,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,145000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Stan,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Very useful,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",Rarely,Sometimes,,,Rarely,,,Sometimes,Sometimes,,,,,,Rarely,,Often,,,,Rarely,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,Often,Often,,,,Rarely,,Sometimes,Rarely,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text 
Analytics",,Sometimes,,,Sometimes,Often,Often,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,Sometimes,,Often,,Sometimes,Often,,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Often,,Most of the time,,Sometimes,,Often,,,,,Sometimes,,,,,,,51-75% of projects,More internal than external,IT Department,wikidata;weather,data cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,60000,EUR,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,Very useful,,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Machine Learning Engineer,University courses,50,10,20,20,0,0,"Computer Vision,Machine Translation","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",Sometimes,,,Rarely,,Often,Often,,Often,,,,,,,Sometimes,,Sometimes,Most of the time,Most of the time,Often,,,,Most of the time,,,,Often,,,,,30,50,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,Often,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,Europarl;TAUS;WMT;IWSLT,Getting good in-domain data,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,11000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,RapidMiner (free version),Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Other","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Often,,,,Often,,,,,,Most of the time,,,,,,,,Sometimes,,,,Often,,,,Often,,,,,,,,,,Often,Most of the time,,,,Sometimes,,Most of the time,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Random Forests,Simulation",,,,,,,Sometimes,Sometimes,,,,Most of the 
time,,,,,,,Sometimes,Often,,,Often,,,,Often,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Often,,,,,,,,Often,,,Sometimes,,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,145000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,26,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,A social science,3 to 5 years,Researcher,University courses,40,0,0,60,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,France,25,"Not employed, but looking for work",,,,,,,,Tableau,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,10,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Recommendation Engines,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,41,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Very useful,,Very useful,Very useful,,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,5,0,75,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,C/C++,Cloudera,KNIME (free version),NoSQL,Python,QlikView,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,,,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,Most of the time,Rarely,,,,,,,,,Sometimes,Often,,,Sometimes,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted 
Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",Sometimes,Often,Often,,,Most of the time,Most of the time,Often,Often,Rarely,,Rarely,Rarely,Often,Often,Rarely,Sometimes,Sometimes,Often,Often,Often,Often,Sometimes,,,Sometimes,Often,Often,Often,,,,,50,15,5,10,10,10,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,Often,,,Often,Sometimes,Often,,,Most of the time,,,Often,,,Often,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,"open street map, statista, public government data",the data is not as complete as the business thinks,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Israel,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Java,,Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Machine Learning Engineer,Researcher",University courses,0,0,50,50,0,0,Recommendation Engines,"Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Technology,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Always,100GB,Regression/Logistic Regression,"C/C++,Hadoop/Hive/Pig,Java,Spark / MLlib",,,,Rarely,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Logistic Regression,Recommender Systems",Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,20,30,30,0,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,200000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Government website,University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,20,0,60,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Not important +Male,United States,32,Employed full-time,,,Yes,,Engineer,,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Work,10,30,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Relational data,Sometimes,100GB,"Gradient Boosted Machines,Neural Networks","Hadoop/Hive/Pig,Python,SQL,TensorFlow",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,Text Analytics",,,,Sometimes,,Often,Often,Often,Often,,,Often,,,,,,,,Often,,,Sometimes,,Often,,,,Often,,,,,20,20,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,112000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Mexico,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Not Useful,Somewhat useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,50,15,30,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Insurance,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Rarely,,,,,,Often,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,Rarely,Often,Most of the time,Most of the time,,,,Most of the time,,Often,,,,,Often,,,,,,,,,,Often,Sometimes,,,,35,18,2,30,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Often,,,,Most of the time,,Often,,,Often,,,Sometimes,,Often,Often,Often,,76-99% of projects,More internal than external,Standalone Team,fred; census,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,108000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Text Mining,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Conferences,Kaggle,Online courses,Textbook",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,30,10,5,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Text data,Relational data",Always,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,SVMs","DataRobot,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data 
Mining,R,SQL",,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,Sometimes,,,,Often,Often,Sometimes,,Often,,,,,,,Sometimes,,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,Data.gov,Access to complete data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,130000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Researcher",Work,35,35,20,10,0,0,,Logistic Regression,A bachelor's degree,Mix of fields,500 to 999 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Other","Text data,Relational data",,10GB,Regression/Logistic Regression,"Mathematica,Python,R,SQL",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the 
time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",Rarely,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Often,,,,30,10,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Often,,,,Often,,,,,,Most of the time,,,,,,,,100% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,350000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed part-time,,,Yes,,Other,Perfectly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,"DataTau News Aggregator,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",5,5,5,5,0,80,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Often,Often,Rarely,Often,,,,,Often,,Often,,,Sometimes,Sometimes,Often,,Often,,,Often,,Sometimes,Sometimes,,,,,30,45,0,15,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Other",,,,,,Often,,,Often,,,,,,,,,,,,,Often,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Github,Git,Sometimes,,,,4,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Tableau,Time Series Analysis,Python,University/Non-profit research group websites,"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,KDnuggets Blog,3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,32,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,TensorFlow,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Researcher,Statistician",University courses,15,10,5,70,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,Often,,,,,,,50,15,10,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,,,,,,,Often,,,,76-99% of 
projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Egypt,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Online courses,Personal Projects",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,0,20,0,80,0,0,Computer Vision,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses",,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,"FlowingData Blog,KDnuggets Blog",10-15 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Engineer,Predictive Modeler,Statistician",Work,5,5,50,40,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs",A professional degree,Academic,,,,,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",,,Neural Networks,"Amazon Web services,Jupyter notebooks,Python",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Limitations in the state of the art in machine learning",,,,Sometimes,,,,,,,,Often,,,,,,,,,,,10-25% of projects,Entirely internal,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Becoming a Data Scientist Podcast,FastML Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,500 to 999 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Sometimes,,Sometimes,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,Sometimes,,,Most of the time,Sometimes,,,,Rarely,Often,,,,,25,25,25,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data",,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Git,Mercurial",,90000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Arxiv,Blogs,Conferences,Friends network,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Software Developer/Software Engineer,Other",Work,60,5,20,15,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,10 to 19 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Relational data,Other",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,SVMs",,,,,,,Most of the 
time,Rarely,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,20,5,60,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Most of the time,,,,,,Often,,,,,Often,Often,,,Sometimes,,,100% of projects,Entirely internal,IT Department,Open Street Map,Quantity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Programmer,University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Brazil,28,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Telecommunications,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,,,"Microsoft Excel Data Mining,Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,20,0,0,20,0,60,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Most of the time,,,,,,,,,,,,,,Often,,,,,,,,10-25% of projects,Do not know,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Never,56200,BRL,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Portugal,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Arxiv,Blogs,Official documentation,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Data Scientist,University courses,10,0,30,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Rarely,1GB,HMMs,"Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,Time Series Analysis",,,,,,Rarely,Most of the time,Rarely,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,40,50,10,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty 
data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),,,Git,Rarely,160000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,TensorFlow,,Python,,"Blogs,Online courses,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Scientist,Self-taught,30,10,30,30,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Other",Sometimes,,"Bayesian Techniques,CNNs,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Stan",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",Rarely,,Sometimes,Sometimes,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,,,Often,,Often,,,,,,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,100% of projects,More external than internal,Standalone Team,,,,Company Developed Platform,,Other,,230000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Monte 
Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Official documentation,Personal Projects,Podcasts,Textbook,Tutoring/mentoring",,Very useful,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,,Very useful,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Physics,3 to 5 years,Researcher,Work,40,NA,40,20,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Brazil,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Personal Projects,Trade book,YouTube Videos",,Not Useful,,Not Useful,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,Very useful,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",University courses,90,5,5,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests","Hadoop/Hive/Pig,Java,NoSQL,R,SQL",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Random Forests",,,,,,Most of the time,Often,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,70,10,10,5,5,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,Often,,,Often,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,144000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Not Useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"FlowingData Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",25,55,15,0,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Internet-based,100 to 499 employees,Increased significantly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Sometimes,Often,,,,,Sometimes,,,,,Sometimes,Sometimes,,Often,Often,,,Sometimes,Often,,Sometimes,Often,,,,40,5,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,Often,,100% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,65000,GBP,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Spain,59,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,MATLAB/Octave,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,10,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,I prefer not to answer,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,Video data,Rarely,1GB,Neural Networks,"C/C++,MATLAB/Octave,Perl",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,50,20,5,15,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,,,,,Rarely,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Other,Proprietary Algorithms,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Friends network,Personal Projects",,,Somewhat useful,,,Somewhat useful,,,,,,Not Useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,5,5,10,80,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",High school,Other,"10,000 or more employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,,"Decision Trees,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,Microsoft Excel Data Mining,R,RapidMiner (commercial version),RapidMiner (free version),SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,,,,Rarely,,Rarely,,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Rarely,Rarely,,,,Rarely,,Rarely,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,,,,,,,,Often,,,Rarely,Most of the time,,,,10,10,10,20,50,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Rarely,"65,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Engineer,Poorly,Employed by government,TensorFlow,,,,"Blogs,College/University,Company internal community,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,1GB,"CNNs,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow,Other",Sometimes,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Often,Most of the time,,,Most of the time,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Other",Sometimes,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Often,,,Often,,Most of the time,,Most of the time,,,,Most of the time,Often,,Often,,,,Most of the time,Often,,,Most of the time,,,30,15,15,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,,Sometimes,,,,Often,Often,,,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",,55000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,PhD,No,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,5,20,50,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Brazil,54,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Physics,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Rule Induction,Python,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Not Useful,,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Fine arts or performing arts,1 to 2 years,Other,Other,40,5,30,0,0,25,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased significantly,1-2 years,Some other way,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,Python,Spark / MLlib,Unix shell / awk,Other,Other",,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,Most of the time,Often,Sometimes,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,Rarely,,,Sometimes,Sometimes,,Sometimes,,,,,Most of the time,,Sometimes,,Sometimes,Often,,Sometimes,,Sometimes,,,,Often,,Sometimes,,,,,25,20,50,2,3,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,Other,,"My boss insists on personally deploying all jobs to query data from our graph database (Apache Cassandra). This means that all queries of our internal data must either go through our (slow) API or we can access backups of the data on S3 but these are indexed by time only, making it impossible to optimize ETL processes for most projects.","Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Sometimes,90000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Engineer,Self-taught,55,5,40,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10GB,Regression/Logistic Regression,"Jupyter notebooks,Perl,Python,SAS JMP,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,Often,,,,,Rarely,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",Often,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,15,10,0,40,35,0,Enough to run the code / standard library,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,,figuring out how to efficiently work with large datasets (exceeding local harddrive capabilities),"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Tableau,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Conferences,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,10,40,20,0,0,"Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Somewhat important,Other,Workstation + Cloud service,"Relational data,Other",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,NoSQL,Python,R,Spark / MLlib,SQL",Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality 
Reduction,Random Forests,Other",,Rarely,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,,,Most of the time,,Often,,Sometimes,Rarely,,Most of the time,,,,,,,,Most of the time,,,10,30,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,100% of projects,More internal than external,Other,GEO Datasets,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,135000,CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,A humanities discipline,,"Business Analyst,Data Analyst",Work,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,United States,22,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Other,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Newsletters",Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,,University courses,20,0,70,10,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased 
slightly,More than 10 years,Some other way,Not very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data",Sometimes,100GB,"CNNs,GANs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Perl,Python,TensorFlow,Unix shell / awk,Other,Other",,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,Most of the time,Sometimes,,"CNNs,Cross-Validation,Data Visualization,GANs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,Most of the time,,Sometimes,Often,,,,Sometimes,,,,,,,,Often,Most of the time,Often,,,,Often,Often,,,,,,,,10,80,0,10,0,0,Enough to refine and innovate on the algorithm,Limitations of tools,,,,,,,,,,,,,Rarely,,,,,,,,,,10-25% of projects,Entirely internal,Other,Many public academic datasets,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Git,Rarely,"78,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Singapore,48,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Textbook",,,,,,,Very useful,,,,,Somewhat useful,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Engineer",University courses,10,0,50,10,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,QlikView,R,SAS Enterprise Miner,SQL",,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,Sometimes,Often,,,,,,Rarely,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression",,,,,,Rarely,Often,Often,,,,,,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,20,50,20,10,0,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Often,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Company internal community,Conferences,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,Very useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,40,10,25,20,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Logistic Regression,Markov Logic Networks",A master's degree,Government,10 to 19 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,1TB,"Markov Logic Networks,Regression/Logistic Regression","DataRobot,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,Tableau",,,,,,Rarely,,,Often,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes,Simulation,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,Most of the time,Sometimes,,,,,,,,,Most of the time,,Often,Often,,,,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,51-75% of projects,More external than 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,42500,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Microsoft R Server (Formerly Revolution Analytics),,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Other",Work,20,0,80,0,0,0,Reinforcement learning,,High school,Other,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Don't know,1GB,,"IBM SPSS Statistics,SQL,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,80,0,0,20,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,,Sometimes,700000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,25,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Government website,"Non-Kaggle online communities,Personal Projects,Textbook",,,,,,,,,Very useful,,,Very useful,,,Somewhat useful,,,,"DataTau News Aggregator,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher",Kaggle competitions,20,25,20,0,35,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python",,Sometimes,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association 
Rules,Bayesian Techniques,Collaborative Filtering,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Sometimes,,Often,,,Often,Often,,,,,,,,,Often,Most of the time,Often,Often,,,Often,,,Sometimes,Sometimes,Often,Often,,,,10,30,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process",Often,,,,,,,Often,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,160000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Australia,24,"Not employed, but looking for work",,,,,,,,Other,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,,Very useful,,,Very useful,O'Reilly Data Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),40+,Master's degree,No,Master's degree,Other,1 to 2 years,I haven't started working yet,University courses,10,20,0,70,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,NA,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Newsletters,Personal Projects",,,,,Somewhat useful,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,,,"Perl,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",DataRobot,Neural Nets,Stata,University/Non-profit research group websites,Arxiv,Very useful,,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,10,20,20,45,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Financial,Fewer than 10 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Never,100MB,Other,Amazon Machine Learning,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,None,Approximately half internal and half external,Business Department,,,Key-value store (e.g. 
Redis/Riak),Email,,Subversion,Rarely,0,BGN,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,No Free Hunch Blog,3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,Researcher,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very 
Important,Somewhat important,Not important,Somewhat important,Very Important +Male,United States,57,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,Matlab,I collect my own data (e.g. web-scraping),"Company internal community,Friends network,Kaggle,Personal Projects,Textbook",,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Other",Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",High school,Manufacturing,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Other,Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,MATLAB/Octave",,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,Often,,,Most of the time,,,,50,20,5,20,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,,,Often,,,,,,,,Most of the time,,Often,,76-99% of projects,More internal than external,Other,None,Integrating differently shaped data along production flow paths,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,Very useful,Very useful,,,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Other",Self-taught,80,10,0,0,0,10,"Outlier detection (e.g. Fraud detection),Time Series",Other (please specify; separate by semi-colon),A doctoral degree,Internet-based,"5,000 to 9,999 employees",Increased slightly,3-5 years,Some other way,Very important,Other,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,<1MB,Other,"Perl,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Most of the time,,,,,,Often,,,,Sometimes,,,,,Sometimes,,Often,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,192000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Australia,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important +Male,Other,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,47,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,Government website,"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,Not Useful,,,,,3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Bachelor's degree,Psychology,6 to 10 years,"Data Analyst,Researcher,Other",Self-taught,35,60,5,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important +Male,Australia,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,6 to 10 years,"Data Scientist,Researcher",Work,40,30,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","Cloudera,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,Often,Often,,,,Sometimes,Rarely,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,,,Most of the time,Often,,,,,,,Sometimes,,Often,,,Often,,Sometimes,,,,Rarely,,,Often,Often,,,,,20,45,30,5,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,wikipedia toxic comments dataset; blackbase HDD stats datasets; ISCX 2012 Intrusion Detection Evaluation Dataset,Getting enough labeled data for supervised ML approaches.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Share Drive/SharePoint,,Bitbucket,Sometimes,"43,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Time Series Analysis,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests","Java,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,Most of the time,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Time Series Analysis",,Often,,,,,Most of the time,Often,,,,,,Often,,,,,Often,Most of the time,,,,,,,,,,Sometimes,,,,70,20,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Often,Often,,,,,,,,,,Often,Sometimes,,51-75% of projects,More 
internal than external,IT Department,Telecommunication,Price of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Rarely,600000,TWD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Anomaly Detection,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Official documentation,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Machine Learning Engineer,University courses,20,0,30,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Most of the time,10MB,"Neural Networks,SVMs","Amazon Web services,Python,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,,,,Most of the time,Often,,,,,,,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,Most of the time,,Most of the time,,,,20,15,45,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Unavailability of/difficult access to 
data",,,,,Sometimes,,,,Sometimes,,,Often,,,,,,,,,Sometimes,,10-25% of projects,Entirely internal,Standalone Team,none,lack of machine learning models that apply well on time series data,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,130000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by government,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Not Useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Software Developer/Software Engineer,Other",University courses,20,20,40,20,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,500 to 999 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,,,"Jupyter notebooks,Python,SAS Base,SAS JMP,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Often,,Sometimes,,Often,,,,,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Random Forests,SVMs",,,,,,Rarely,Most of the time,Rarely,Rarely,Rarely,,,,,,Rarely,,,,,,,Rarely,,,,,Rarely,,,,,,20,10,0,30,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,Often,,Often,Often,,,,Often,,Most of the time,Most of the time,,,Most of the time,,Often,,100% of projects,More internal than external,Other,Market data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,177000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,19,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Microsoft SQL Server Data Mining,Anomaly Detection,Matlab,GitHub,"Company internal community,Conferences,Personal Projects,Stack Overflow Q&A",,,,Very useful,Somewhat useful,,,,,,,Very useful,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,"Engineer,Programmer",University courses,50,0,0,40,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Other,Most of the time,10GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks","C/C++,Oracle Data Mining/ Oracle R Enterprise,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Markov Logic Networks,Neural Networks,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,,,,Most of the time,,,Sometimes,,,,10,70,5,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Rarely,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Git,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Predictive Modeler,Self-taught,10,0,80,10,0,0,Computer Vision,"Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,Manufacturing,,,,,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Relational data",Most of the time,10GB,"Bayesian Techniques,Evolutionary Approaches,Other","MATLAB/Octave,R,SAS JMP,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,Sometimes,,,,,Often,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Evolutionary Approaches,PCA and Dimensionality Reduction,Simulation",,Often,Sometimes,,,Most of the time,Most of the time,,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,30,30,5,15,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,,,Often,Sometimes,,,,,Often,,,,,,,Most of the time,,76-99% of projects,Entirely internal,Other,none,Data quality,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,240000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Researcher,University courses,30,0,0,30,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,Time Series Analysis,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,Not Useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Biology,Less than a year,"Data Scientist,Researcher,Other",University courses,40,0,30,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,Don't know,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Perl,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Text Analytics",,,,,,,,Sometimes,Most of the time,,,,,Often,,Sometimes,,Sometimes,,,,,Often,Most of the time,,,,,Sometimes,,,,,10,50,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a 
clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,,,,,Sometimes,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,none,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,"50,000",USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by government,Python,Social Network Analysis,Python,"Government website,I collect my own data (e.g. web-scraping)","Blogs,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,6 to 10 years,"Business Analyst,Data Analyst,Researcher,Other",Other,25,15,30,30,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Government,500 to 999 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",,1GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,SAS JMP,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,Most of the time,,,Rarely,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,60,10,0,10,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Often,,,,,,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,Sometimes,,100% of projects,More internal than external,Other,"Some public local law enforcement datasets, land information office shape files.",Some of the data has information that is not subject to open records. ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,I don't typically share data",,Other,Rarely,"75,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,"Programmer,Software Developer/Software Engineer",,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"GitHub,Google Search","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,20,30,10,20,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Always,10GB,"CNNs,Neural Networks,Random Forests,RNNs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,RapidMiner (commercial version),Other",,,,,,,,,Sometimes,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,"Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,Sometimes,,,,,,Sometimes,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,10,30,20,20,10,10,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT",,,Sometimes,,,,,,,Often,,,,,Most of the time,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,230000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,Not Useful,Very useful,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,Other",University courses,40,20,10,30,0,0,,Logistic Regression,A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data,Other",Sometimes,10MB,Regression/Logistic Regression,"IBM Watson / Waton Analytics,Jupyter notebooks,Python",,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Lift Analysis,Natural Language Processing,Simulation,Text Analytics",Sometimes,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,Sometimes,,Often,,,,,40,10,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,Often,,,,,Most of the time,,,Sometimes,,Sometimes,Often,,51-75% of projects,More internal than external,Standalone Team,Data.gov.in;WB;WHO,Dirty data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Work,30,0,45,20,5,0,Time Series,Logistic Regression,I prefer not to answer,Financial,"1,000 to 4,999 employees",Decreased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,100MB,Regression/Logistic Regression,"MATLAB/Octave,R,SAS Enterprise Miner,Other",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Most of the time,,,"Logistic Regression,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,,,,50,30,15,2,3,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Often,,,,,,Sometimes,,,,,,Sometimes,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,27500,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Online courses,Trade book,YouTube Videos",,Very useful,,,,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,O'Reilly Data Newsletter,< 1 year,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,Hong Kong,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Work,55,10,20,5,10,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Python,R,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,,Often,Most of the time,Often,Sometimes,,,Sometimes,Sometimes,Often,,Often,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,20,50,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Often,,,,Often,,Sometimes,,,,Most of the time,,,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher",Kaggle competitions,10,0,30,20,40,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,"5,000 to 9,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Relational data",Rarely,100MB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,QlikView,R,Spark / MLlib",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,Often,,,,,,,,Sometimes,,,,,,,,,,,"CNNs,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests",,,,Often,,,,Sometimes,,,,Often,,,,Often,,,,Often,,,Sometimes,,,,,,,,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Sometimes,,Sometimes,Often,Often,,,Most of the 
time,,Often,,,,,,,,Sometimes,,,,51-75% of projects,Approximately half internal and half external,Other,weather,data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Never,"34,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Java,Neural Nets,C/C++/C#,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Engineer,Programmer",Self-taught,50,20,10,10,10,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",Never,1GB,"CNNs,Neural Networks","C/C++,Java,Python",,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,5,20,15,50,10,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,10-25% of projects,More internal than external,IT Department,none,how to select feature or create feature?,Key-value store (e.g. 
Redis/Riak),"Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"20,000",AED,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,Textbook,Other",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,,Self-taught,40,20,0,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Technology,500 to 999 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service",Relational data,,10MB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,15,10,0,25,50,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database",,,,Rarely,Sometimes,,,,,,,,,,,,,Sometimes,,,,,51-75% of projects,More internal than external,Standalone Team,Sometimes use economic market data for population,Joining tables and accessing the correct data,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"115,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,Self-taught,50,20,0,10,20,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,,1GB,Regression/Logistic Regression,NoSQL,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,Bitbucket,,1400000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Iran,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Official documentation,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,,,,Very useful,,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",89,10,0,1,0,0,"Natural Language Processing,Unsupervised Learning",Logistic Regression,High school,Other,20 to 99 employees,Increased slightly,Less than one year,A tech-specific job board,Somewhat important,Other,"Basic laptop (Macbook),Traditional Workstation","Video data,Text data,Relational data",Sometimes,1GB,Other,"Jupyter notebooks,MATLAB/Octave,NoSQL,Orange,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,,,,,Most of the time,,,,,,,Often,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,30,15,0,25,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,Most of the time,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,1194000000,IRR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,MATLAB/Octave,Deep learning,Matlab,"Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Personal Projects",,,,Somewhat useful,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Researcher,University courses,15,20,25,35,5,0,Speech Recognition,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Decreased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,100MB,CNNs,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,25,25,10,15,15,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important +Male,People 's Republic of China,32,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects",,Very useful,,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,Very useful,,,,,,,,3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,30,20,20,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,"FastML Blog,Jack's Import AI Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",35,15,25,0,25,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Minitab,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Often,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,60,5,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,Most of the time,,51-75% of projects,Entirely internal,Business Department,,Not clean and not stored in a way that can be scaled.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,,Sometimes,65000,USD,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,No,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,60,0,10,15,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Indonesia,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,,,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,University courses,40,10,0,50,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Decreased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,SVMs","Java,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,SVMs",,Sometimes,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,Often,,Often,,,,,,30,30,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,UCI Machine Learning Repository,Cleaning the data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,6500000,IDR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,32,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Hadoop/Hive/Pig,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",70,0,20,10,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,Other,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,"Segmentation,Other",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,0,0,30,70,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,None,Entirely internal,IT Department,.,.,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Other,Rarely,60000,TRY,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,Amazon Machine Learning,Deep learning,SQL,"GitHub,Google Search,Government website","Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,Somewhat useful,,Very useful,,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,40,20,0,5,0,Outlier detection (e.g. 
Fraud detection),"Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs",A doctoral degree,Retail,"5,000 to 9,999 employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Decision Trees,Evolutionary Approaches,Neural Networks,RNNs","Amazon Machine Learning,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,NoSQL,TensorFlow",Rarely,,,Often,,,,Sometimes,Often,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Evolutionary Approaches,Logistic Regression,Neural Networks,Simulation",Sometimes,,,Sometimes,,,,,,Often,,,,,,Often,,,,Often,,,,,,,Most of the time,,,,,,,40,25,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Often,Often,,,Sometimes,,,,Often,Often,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,National Informatics Center,filtering useful data from large dataset,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,100000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Japan,24,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Stack Overflow Q&A,Other",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Time Series,"Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,<1MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,,Rarely,,,,,,,,,,,,,,Often,,,Often,,,,,Often,,Often,,,,0,50,0,20,30,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,Often,,,Most of the time,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,IT Department,None,"highly accurate prediction. ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"40,000",,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Other,Relational data,Always,100GB,Other,"Java,NoSQL,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules",Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,5,25,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,Most of the time,Most of the time,,,,,Sometimes,,,,,,Most of the time,Most of the time,,100% of projects,Entirely internal,Standalone Team,Na,Access and consolidation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,"85,000",,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +A different identity,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,,No,Professional degree,,,"Data Analyst,Engineer,Programmer,Researcher,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Female,Iran,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher",University courses,10,10,30,50,0,0,"Machine Translation,Natural Language Processing",Bayesian Techniques,A bachelor's degree,Technology,Fewer than 10 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Sometimes,100MB,"Bayesian Techniques,Neural Networks,RNNs","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Often,,,,,Often,,,,,,,,,60,30,5,0,5,0,Enough to run the code / standard library,"Dirty data,Organization 
is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,240000000,IRR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Official documentation,Online courses",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,,,,,,,3-5 years,,,,,,,,,,,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,PhD,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,University courses,25,10,0,60,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,,< 1 year,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural 
Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Other,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Unix shell / awk,Regression,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Official documentation,Online courses,Textbook",Somewhat useful,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,20,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Most of the time,1GB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Unix shell / awk",,,,Sometimes,,,,Most of the time,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,Often,,,,"Logistic Regression,Naive Bayes,Neural Networks,Simulation,SVMs",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,Most of the time,Sometimes,,,,,,10,10,10,40,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into 
organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Other,,,Other,"Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,250000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,South Africa,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,,Somewhat useful,,,Not Useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very 
Important,Not important,Somewhat important,Somewhat important +Female,India,36,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,40,0,0,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",,100TB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,SAS Base,SQL",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,10,30,30,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,"Open source tools cannot be relied upon for buisness decisions although they help in data science. Validated tools like sas are good but expensive,so investment is required. 
For companies using sas from quite longtime migrating to open source means translating code which again means cost to company. And most important is if someone understands company buisness he / she should know how we can use data science to excel in our buisness. So we need modules built as per buisness by experts which can be used for our buisness to start with.","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,0,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Social Network Analysis,Matlab,University/Non-profit research group websites,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer",University courses,10,40,20,30,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (free version),TensorFlow,Other,Other",,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,Sometimes,,,,,Often,,,,Most of the time,,Often,,Rarely,,,,,,,,,,,Often,,,Most of the time,Most of the time,,"Association Rules,CNNs,Collaborative Filtering,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language 
Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",,Sometimes,,Often,Most of the time,,,Often,,,,,,Sometimes,,Often,,Often,Most of the time,Most of the time,Rarely,,Sometimes,Most of the time,Often,,,Sometimes,,,,,,10,20,0,10,0,60,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues",,,,,Most of the time,,,,,Most of the time,,,Often,,,,Most of the time,,,,,,10-25% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,12000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Russia,65,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by college or university,R,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Personal Projects,Tutoring/mentoring",,,,,,,,,,,,Very useful,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,,Doctoral degree,Psychology,I don't write code to analyze data,Data Scientist,Work,0,0,100,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Support Vector Machines (SVM),,GitHub,"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,Very useful,,,,,Very useful,Very useful,,Very useful,,,Very useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A 
career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Java,NoSQL,SAS Base",Sometimes,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,"Bayesian Techniques,Natural Language Processing,Neural Networks",,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,15,10,70,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Rarely,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,10-25% of projects,More external than internal,IT Department,Confidential,Bugs,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,"Bitbucket,Git",Rarely,500000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,40,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,5,0,0,0,75,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,Australia,47,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",5-10 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,DataCamp,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,I haven't started working yet,University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,India,32,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,"College/University,Kaggle",,,Somewhat useful,,,,Very useful,,,,,,,,,,,,"Jack's Import AI Newsletter,Linear Digressions Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,50,20,10,20,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,500 to 999 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told 
me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,Often,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Often,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,HMMs,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,Most of the time,Most of the time,,,,Most of the time,Most of the time,,,,,Often,,,,,Most of the time,,Often,Most of the time,,,,,,,,,Often,,,,30,30,10,10,10,10,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Egypt,33,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Tableau,Regression,R,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,40,20,40,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Not at all important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,<1MB,"Bayesian Techniques,Decision Trees","NoSQL,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,,,,Most of the time,,,,,,,"Association Rules,Decision 
Trees,Logistic Regression,Naive Bayes",,Often,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,60,5,5,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,,,,Often,Sometimes,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Personal Projects",,,,,Somewhat useful,,,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Neural Networks - RNNs",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Most of the time,1GB,"Evolutionary Approaches,Neural Networks,RNNs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,Often,,,,,,,,,,Often,,,,,,Sometimes,Often,,,Often,,,,30,30,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and 
decision-making",Most of the time,,Often,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,43000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,Python,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Non-Kaggle online communities,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,,,Very useful,,,Very useful,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Business Analyst,Other",University courses,30,10,50,10,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Perl,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Text Analytics",Sometimes,,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,Often,,,,,,Often,,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,,,,Often,,,,,Often,,,,,,Often,,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,Standalone Team,"population statistics, data from organizations",Gathering correct data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Cloud services,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,60000,EUR,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,,University courses,1,0,0,99,0,0,Outlier detection (e.g. Fraud detection),,A master's degree,Financial,500 to 999 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,,,"Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,100,0,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,Do not know,IT Department,Stock Exchange data,I don't yet understand stocks,Other,I don't typically share data,,"Bitbucket,Git,Subversion",Never,50000,EUR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,39,Employed full-time,,,No,Yes,Other,Perfectly,Employed by non-profit or NGO,TensorFlow,Neural Nets,Python,Government website,"Arxiv,College/University,Kaggle,Online courses,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Linear Digressions Podcast",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Professional degree,,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,40,0,50,10,0,,"Logistic Regression,Neural Networks - CNNs",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Somewhat important +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,60,20,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",No education,Financial,I don't know,Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,Sometimes,,Often,,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality 
Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,Most of the time,Most of the time,,Often,Often,Often,,Often,,,Often,,Often,,Often,,Often,,,Often,Sometimes,Often,Often,,,,,Often,Often,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",,,,Often,Most of the time,Sometimes,,,Sometimes,,Often,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Stack Overflow Q&A",,Very useful,,,,,Very useful,Somewhat useful,,,,,,Very useful,,,,,"Data Elixir Newsletter,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,30,10,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Technology,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Naive Bayes,Natural Language Processing,Segmentation,Text Analytics",Sometimes,,Most of the time,,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,Most of the time,,,,,50,30,20,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,,,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,3100000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,GitHub,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Telecommunications,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,Regression/Logistic Regression,"NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Segmentation",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,Sometimes,,,Most of the time,Often,,Often,Most of the time,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented 
(e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,30000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Sweden,37,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,,Sort of (Explain more),Bachelor's degree,Other,,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,RapidMiner (commercial version),Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,45,50,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,R,RapidMiner (commercial version),RapidMiner (free version),SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Sometimes,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression",Most of the time,,,,,,Most of the time,Most of the time,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,35,20,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,Sometimes,,,Often,,,,,Sometimes,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,S3,Git,Sometimes,195000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Amazon Machine Learning,,,,Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",Self-taught,80,1,10,5,0,4,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis",Support Vector Machines (SVMs),,Academic,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,"Text data,Relational data",,,SVMs,"Amazon Machine Learning,Amazon Web services,R",Rarely,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,RNNs,SVMs",Rarely,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,0,50,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,,,,Most of the time,Rarely,Rarely,,Most of the time,,,Most of the time,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,,5,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Microsoft Excel Data Mining,Python,QlikView,R,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,Sometimes,Often,Often,,,,,,,,,Often,,,Most of the time,Sometimes,Sometimes,,,,,"Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation",,,,,,,Most of the time,Often,,Sometimes,,,,,,Often,,,,,,Sometimes,Sometimes,,,Sometimes,Often,,,,,,,35,10,10,35,10,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Rarely,,,,,Often,,,Often,,,76-99% of projects,Approximately half internal and 
half external,Other,NA,NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data,Share Drive/SharePoint",,Other,Sometimes,2500000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Republic of China,41,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Anomaly Detection,Python,I collect my own data (e.g. web-scraping),"Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Recommendation Engines,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,55,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by government,Python,Bayesian Methods,R,GitHub,"Company internal community,Online courses,YouTube Videos",,,,Very useful,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with 
semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Government,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,Regression/Logistic Regression,"MATLAB/Octave,R,SAS JMP",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,76-99% of projects,More internal than external,Other,standards,Understanding circumstances/methods under which survey or test data was obtained,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"115,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Greece,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Newsletters,Personal Projects,Textbook",Not Useful,,Somewhat useful,,,,Very useful,Not Useful,,,,Very useful,,,Somewhat useful,,,,"Data Elixir Newsletter,FlowingData Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Yes,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Male,Ukraine,23,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Scala,GitHub,"Arxiv,Company internal community,Kaggle,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,"FlowingData Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Computer Vision,Machine 
Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",High school,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,GANs,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Cloudera,Mathematica,NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,GANs,Naive Bayes,Neural Networks,Random Forests,Text Analytics",,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,80,0,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Often,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,76-99% of projects,More external than internal,Central Insights Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,15000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,SAS Base,Bayesian Methods,SAS,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Researcher,Statistician",University courses,30,20,20,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,10 to 19 employees,Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,,Regression/Logistic Regression,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,Sometimes,,Often,,,,Rarely,Sometimes,,,26-50% of projects,More internal than external,Other,NOAA rainfall,"Data structures and meaning are understood by a few key people, and there is no training program to share information. You basically just have to ask when/if it's relevant to a project you're on","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Rarely,85000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,,Logistic Regression,A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,Other,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering 
or a clear direction to go in with the available data",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,26-50% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email",,Git,Rarely,41500,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +A different identity,Other,28,Employed full-time,,,No,Yes,Engineer,,,,,,,Other,,,,,,,,,,,,,,,,,,,,3-5 years,,,,,,,,,,,,,,,,,,,Master's degree,,,,,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Other,Basic laptop (Macbook),Relational data,Most of the time,1MB,Regression/Logistic Regression,"Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,100% of projects,Entirely internal,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,It is shared by default,Git,Rarely,"120,000",MYR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Poland,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,0,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Academic,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud 
service","Image data,Relational data",,10GB,"CNNs,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,,Most of the time,Often,,Sometimes,,,Sometimes,,Rarely,Rarely,Rarely,,,,30,50,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Often,,Most of the time,Rarely,,Sometimes,,,,,,Often,Often,,76-99% of projects,More external than internal,IT Department,"RCSB Protein Data Bank,ChEMBL,PDBBind",Data are very dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Other",Sometimes,23000,PLN,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Israel,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,25,10,40,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,,Random Forests,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib",,Often,,,,,,Often,,,,,,,,,Often,,,,Rarely,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,Random Forests",,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,10,0,30,20,40,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,Often,,,,,Often,,,,,,,,,Often,,,,,26-50% of projects,Do not know,Standalone Team,,Noise,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Bitbucket,Git",Rarely,216000,ILS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,,Self-taught,20,20,30,20,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,"Ensemble Methods,Random Forests","Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Rarely,,,,,,Often,,Often,,,,,,,Rarely,,,,Sometimes,,Sometimes,Often,,,,30,30,5,10,25,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a 
clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,Sometimes,Often,,,,,,,Most of the time,Most of the time,,Sometimes,Most of the time,,,76-99% of projects,More internal than external,Central Insights Team,HESA,"Lack of consitency as a result of constant changes in business process, staffing and lack of understanding/general disregard that data quality is important and its collection is sometimes seen as a burden on the business rather than an asset.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,,Sometimes,47000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Colombia,43,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,SAS Base,Social Network Analysis,SAS,I collect my own data (e.g. web-scraping),"College/University,Conferences,Friends network,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,,,,,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",15+ years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher",Self-taught,75,0,25,0,0,0,"Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's 
degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United Kingdom,24,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",Python,GitHub,"College/University,Kaggle,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,0,0,0,100,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Hungary,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Friends network,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,Researcher,University courses,10,10,0,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,Hospitality/Entertainment/Sports,I don't know,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Don't know,100MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Mathematica,MATLAB/Octave",,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis",,,Rarely,,Often,,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,Most of the time,,,,30,10,0,50,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Most of the time,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,India,49,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,IBM Watson / Waton Analytics,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Business Analyst,Programmer,Other",Self-taught,70,25,5,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,10MB,"Decision Trees,Neural Networks,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees,Neural Networks,SVMs",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Limitations of tools,Organization is small and cannot afford a data science team,Other",,,,,,,,Sometimes,,,,,Often,,,Most of the time,,,,,,Most of the time,None,Do not know,Other,I teach using datasets from UC Irvine,,Other,"Email,I don't typically share data",,Other,Never,,INR,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, 
+Male,India,32,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,R,"Google Search,University/Non-profit research group websites","Blogs,College/University,Online courses,Textbook",,Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,0,20,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Other,,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,QlikView,R,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,Often,,,,Often,Most of the time,,Often,,,,,,,,,Sometimes,,,,60,20,NA,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,Often,,,Most of the time,,,26-50% of projects,Entirely internal,Standalone Team,,cleaning up the data.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Most of the time,75000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,22,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,31,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Necessary,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Software Developer/Software Engineer",Self-taught,40,0,40,20,0,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,,,Somewhat important,Somewhat important,,, +Male,South Africa,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Programmer,Statistician",University courses,20,5,15,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","C/C++,MATLAB/Octave,Python,R",,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,Sometimes,Rarely,,Most of the time,Most of the time,,,,,,,,,Often,,,,Sometimes,Sometimes,,Often,,,,Often,Sometimes,,Most of the time,,,,60,15,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,,Sometimes,,,,,Most of the 
time,Rarely,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,40000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed part-time,,,No,Yes,Programmer,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Very Important +Male,Other,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company 
that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,70,30,0,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Insurance,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Always,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Java,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Often,Sometimes,,Most of the time,Most of the time,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Neural Networks,Random Forests,Recommender Systems,RNNs,Simulation",Sometimes,Most of the time,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,,Often,,,,,Often,,,Often,Most of the time,Sometimes,,Most of the 
time,,,,,,,10,80,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,R,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Data Machina Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,,,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Internet-based,20 to 99 employees,Increased significantly,Less than one year,A tech-specific job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Naive Bayes",Sometimes,,,,,Sometimes,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,20,0,10,10,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,76-99% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,136000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Kenya,42,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,R,Survival Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,,,,,,Necessary,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,Other,Kaggle competitions,0,0,0,0,100,0,Other (please specify; separate by semi-colon),Logistic Regression,No education,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,,,,,,Somewhat important,,,,,,Very Important,,,, +Female,Russia,22,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Deep learning,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,10,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,I prefer not to answer,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",Most of the time,,,,Sometimes,,Often,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,30,40,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to 
full database,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,,,,,,,,Rarely,Often,Often,Often,,,Often,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Never,624000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,65,Employed part-time,,,No,Yes,Data Analyst,Poorly,Employed by college or university,Other,Association Rules,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Newsletters,Online courses,Personal Projects",,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,3 to 5 years,Other,University courses,10,30,0,50,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,45,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",3-5 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,,Other,40,0,20,10,0,30,,,A bachelor's degree,Other,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1GB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,RapidMiner (free version),SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,Most of the time,Sometimes,,,"Cross-Validation,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,Often,,,,,,,,,,Often,,,Most of the time,,,,Often,,,,,,Most of the time,,,,,15,15,50,5,15,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Often,,,,,,Often,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Subversion",Never,70000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Tutoring/mentoring",,,,,,,Very useful,,,,Somewhat useful,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Analytics Dispatch Newsletter",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Statistician",Kaggle competitions,15,0,10,30,45,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Other,51,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Talking Machines Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Doctoral degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,Other (please specify; separate by semi-colon),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,India,24,"Not employed, but looking for work",,,,,,,,SQL,Text Mining,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,United States,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,C/C++,Deep learning,Stata,Other,"College/University,Company internal community,Official documentation,Online courses,Personal Projects,Textbook,Trade book",,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",42,32,20,5,1,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,100 to 499 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence 
product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,IBM SPSS Statistics,Python,R,SQL,Other",,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,,Most of the time,,,Sometimes,,,,Sometimes,,,,Often,,Often,Often,,,,5,10,10,5,70,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Often,,,,,,Most of the time,,,,,,,,100% of projects,More external than internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,100000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,57,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Textbook",Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,"Data Stories Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Other,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Electrical Engineering,,"Data Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Germany,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,United Kingdom,26,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Google Search,Government website,University/Non-profit research group websites","Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,Very useful,,,Very useful,,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,Other,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Biology,1 to 2 years,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important +Male,Israel,24,Employed full-time,,,No,Yes,Programmer,Fine,Employed by government,Python,Regression,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Programmer,University courses,0,0,0,85,15,0,Unsupervised Learning,Hidden Markov Models HMMs,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Colombia,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,NoSQL,Deep learning,R,,"Blogs,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Data Miner,"Online courses (coursera, udemy, edx, etc.)",10,20,50,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","KNIME (commercial version),R,SAS Enterprise Miner,SQL,Other",,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,Most of the time,,,,,,,Often,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,Sometimes,,Most of the time,Often,Often,Often,,,,,Often,,Most of the time,,Often,Often,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Often,Often,Often,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Sometimes,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,FastML Blog,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,10,40,19,1,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Ukraine,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Conferences,Friends network,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Researcher",Self-taught,60,0,30,0,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Image data,,,"CNNs,RNNs","Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk,Other,Other",,Sometimes,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,Most of the time,Rarely,Often,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Segmentation,Time Series Analysis",Sometimes,,,Often,,Often,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,30,30,30,7,3,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,Rarely,,,Sometimes,,,Often,,,,,,,,Often,Often,,10-25% of projects,Do not know,IT Department,imagenet,,"Flat files not 
in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Git,Rarely,51000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,46,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,5,0,0,95,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"C/C++,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,5,50,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Rarely,145000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,30,0,0,20,0,"Computer Vision,Time Series","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,10GB,"Markov Logic Networks,Neural Networks,RNNs","C/C++,Hadoop/Hive/Pig,Java,NoSQL,R,TensorFlow",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Logistic Regression,Markov Logic Networks,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,,Sometimes,,,Sometimes,,,,30,20,20,30,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,,,,Often,Often,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,Mathematica,Regression,Matlab,Google Search,"Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Very useful,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,1 to 2 years,"Engineer,Researcher",University courses,20,20,10,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,Not very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,Bayesian Techniques,"Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,0,10,90,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Limitations of tools",,,Sometimes,,,Often,,,,,,,Rarely,,,,,,,,,,None,Approximately half internal and half external,IT Department,,,Other,"Commercial Data Platform,Email",,Other,Sometimes,40000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,62,"Not employed, but looking for work",,,,,,,,Other,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Friends network,Online courses,Personal Projects,Tutoring/mentoring",,,,Not Useful,,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Other (Separate different answers with semicolon),3-5 years,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,A health science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Researcher",Work,40,0,10,0,0,50,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,11-15,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,France,34,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Statistician",University courses,30,30,40,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Other",,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Minitab,Perl,Python,R,SQL,Stan,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Rarely,Rarely,Often,,,Sometimes,Most of the time,,,,,Rarely,,,,Rarely,Most of the time,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,Rarely,,Rarely,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Sometimes,,Sometimes,Most of the time,Often,,,,,,Sometimes,Sometimes,,Often,,Sometimes,,,Often,Most of the time,Often,,,Often,Most of 
the time,Often,,Often,,,,10,50,20,10,10,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"Data Machina Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,20,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Other","Image 
data,Text data",Sometimes,<1MB,"CNNs,Ensemble Methods,SVMs,Other","C/C++,Julia,Jupyter notebooks,Mathematica,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,Often,Often,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Natural Language Processing,SVMs",,,,Sometimes,,Often,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,20,50,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",Often,Often,,,,,,Sometimes,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,tweeter api; kaggle database; dataset include at programming libraries,Size an dimension,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,"300,000",MXN,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Doctoral degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Colombia,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Self-employed,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,Very useful,Very useful,,Not Useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,13,4,62,11,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Decreased significantly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,,,Often,,,,Sometimes,Sometimes,,Sometimes,,,,,,Most of the 
time,,Rarely,Rarely,Rarely,Rarely,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,Rarely,,Rarely,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",Sometimes,Sometimes,Often,Often,,Most of the time,Most of the time,Often,Most of the time,Often,,,,Most of the time,,Often,,,Often,Most of the time,Most of the time,,,,,,Most of the time,Often,,Most of the time,,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,Often,,,Most of the time,Often,Often,Often,,,Often,,,,Often,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git,Other",Most of the time,12000000,COP,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,Primary/elementary school,Other,500 to 999 employees,Decreased slightly,Don't know,A general-purpose job board,Important,,,Text data,Never,,,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,Rarely,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Sometimes,,,,,,,,,,,Often,,,,Often,,,Less than 10% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git,Mercurial",Sometimes,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Self-employed",Google Cloud Compute,,,,"College/University,Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,RNNs,SVMs","Amazon Machine Learning,Cloudera,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,SQL,Tableau,TensorFlow",Rarely,,,,Sometimes,,,,,,,,,,,,Most of the time,,Sometimes,,,Most of the time,Often,,,,Often,,,,Most of the time,Sometimes,Most of the time,,,,,,,,,Often,,,Often,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,GANs,Lift Analysis,Logistic Regression,Naive Bayes,Neural 
Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",Often,,Often,Often,,Often,Often,Often,,Often,Often,,,,Often,Often,,Often,,Often,,Often,Often,Often,,Often,,Often,,Often,,,,20,30,10,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,26-50% of projects,More internal than external,Business Department,,,,,,,,68000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,54,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Friends network,Kaggle,Online courses",,,,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,20,40,20,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,Often,Sometimes,,,,,Most of the time,Rarely,Often,,,,,,Most of the time,Sometimes,,,Often,,,Sometimes,Often,,,,40,15,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Other,,,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Brazil,40,Employed part-time,,,No,Yes,Engineer,Fine,Employed by college or university,Python,Deep learning,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects",,,Not Useful,,,,Somewhat useful,,,,Somewhat useful,Not Useful,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Professional degree,,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,55,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,55,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by college or university,Minitab,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,40,10,8,2,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United Kingdom,54,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Regression,R,Google Search,"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A social science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,Other 
(please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Company internal community,Online courses,Stack Overflow Q&A",,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,DBA/Database Engineer,University courses,30,20,0,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Decreased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender 
Systems,Segmentation,SVMs",Often,,Sometimes,,Sometimes,Often,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,Often,Often,,Most of the time,,Sometimes,,,,,,50,15,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,Sometimes,,,Sometimes,Often,,,,,Most of the time,,,,Most of the time,Often,,,,26-50% of projects,More external than internal,Standalone Team,"Big Query, feefo",Transforming the data so that it can be used for machine learning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Never,47000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,30,0,0,0,70,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Survival Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Not Useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer",University courses,10,0,20,50,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Python,QlikView,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,Sometimes,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,Sometimes,Sometimes,,Sometimes,Most of the time,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis,Other",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,Most of the time,,,,Sometimes,Sometimes,,Often,,,,,,,Often,Most of the time,,,25,10,20,25,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,Often,,Sometimes,,,Often,,Sometimes,,,Often,Often,,Often,Sometimes,Often,Often,Often,,76-99% of projects,More internal than external,Business Department,"USDA, NOAA",Availability and necessary context of data needed for 
predictions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,130000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Russia,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Tutoring/mentoring",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Other,Relational data,Never,,Ensemble Methods,"Julia,MATLAB/Octave,Python,R,Tableau",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,25,15,0,45,15,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Other,Never,-,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Indonesia,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Stack Overflow Q&A,Textbook,Trade book",,,Very useful,,,,,,,Very useful,,,,Very useful,Very useful,Very useful,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Other,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,40,0,0,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United Kingdom,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Amazon Machine Learning,Proprietary Algorithms,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Business Analyst,Data Analyst,Researcher,Other",Work,30,30,40,0,0,0,"Recommendation Engines,Reinforcement learning","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Recommender Systems,Simulation",Often,,Sometimes,,Often,,Often,Sometimes,Rarely,Often,,,,Sometimes,,Sometimes,,Rarely,,,,,,Often,,,Sometimes,,,,,,,20,40,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,Sometimes,,,,,,Often,,Sometimes,,Often,,Sometimes,,26-50% of projects,More external than internal,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Not Useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important +Male,United States,39,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,Government website,"Arxiv,Blogs,Kaggle,Personal Projects",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,20,0,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Never,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Random Forests,Segmentation,Simulation,SVMs",,,,Often,,Most of the time,Most of the time,,,,,Often,,Often,,,,,,Often,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,70,10,10,7,3,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,Often,,,,,,,Sometimes,,10-25% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Google Search,"Conferences,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Programmer,Statistician",Self-taught,25,15,40,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Most of the time,1TB,Neural Networks,"Google Cloud Compute,Python,R,SQL,TensorFlow",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation,Time Series Analysis",Most of the time,,,,,Often,Often,,,,,,,Often,,Sometimes,,,,Most of the time,,,,,,Often,,,,Sometimes,,,,40,40,10,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely external,Business Department,TensorFlow,data accuracy (i.e. 
variables are there but data is not always correct). Also data lags by 1-2 days (not real-time),Other,Company Developed Platform,,Other,Rarely,153000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,30,15,15,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Never,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Simulation",,,Sometimes,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,Most of the time,,,,,,,10,50,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization",Often,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,Census; BLS; IPUMS; DoD data,"Difficult to gain access to some datasets, long bureaucratic process when there is potentially personally identifiable information.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,"Google Search,Government website","Arxiv,Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,University courses,10,20,50,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Always,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Often,,,,,Often,,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",Often,,,,Often,Sometimes,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,Often,Often,,,,,Most of the time,,,,,,Sometimes,,,Most of the time,,,,30,15,20,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Rarely,,Sometimes,Most of the time,Often,,,Sometimes,,Sometimes,Often,Sometimes,,Sometimes,,,Most of the time,Most of the time,,,,51-75% 
of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,165000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,44,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Personal Projects,Textbook,Other",Very useful,,,,,,,,,,,Somewhat useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Machine Learning Engineer,Other",Self-taught,50,0,0,50,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Most of the time,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the 
time,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Time Series Analysis",Sometimes,,,Often,,Most of the time,,Often,Most of the time,,Rarely,Sometimes,Sometimes,Often,,,Rarely,,,Most of the time,Most of the time,Often,Sometimes,Sometimes,Often,,Sometimes,Often,,Most of the time,,,,40,40,20,0,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,Sometimes,Often,,,,,,Often,,None,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"300,000",BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Text Mining,Java,"GitHub,Google Search","Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Somewhat useful,,,Very useful,The Analytics Dispatch Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Management information systems,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Survival Analysis",Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,Online courses",Not Useful,,,,,,Very useful,,,,Very useful,,,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,60,20,0,20,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Logistic Regression,Segmentation",,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,70,0,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Often,,,,,,,Sometimes,Often,,,Most of the time,,,10-25% of projects,More external than internal,Other,FRED Economic Data,Validation,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Rarely,80000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Spain,40,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Stack Overflow Q&A",,,,,Very useful,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",Other,0,90,0,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,43,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Italy,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Other,40+,Master's degree,Yes,Master's degree,Computer Science,,,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised 
Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,,,,,,,,,,,,,,,, +Male,South Africa,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Anomaly Detection,R,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,I prefer not to answer,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler",Work,15,0,50,35,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",I prefer not to answer,Retail,"5,000 to 9,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100TB,"Bayesian Techniques,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,RapidMiner (commercial version),SAS Enterprise Miner,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,Often,Often,,,Often,Most of the time,Most of the time,Sometimes,,,,,Most of the time,,,Most of the time,,Rarely,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Most of the time,Often,Most of the time,Most of the time,Most of the time,,Sometimes,,Often,Most of the time,,,Most of the time,,Often,Often,Often,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the 
potential impact of data science projects,Privacy issues",Often,,,,Most of the time,,,,,,Most of the time,,,Often,,,Most of the time,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,,dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Kaggle,Online courses,Podcasts,YouTube Videos",,,,,Very useful,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,10,50,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Other,Traditional Workstation,Relational data,,,,"Java,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,40,0,0,,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,,Often,Most of the time,,100% of projects,More internal than external,IT Department,Twitter,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Other,Rarely,100000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",20,40,10,10,20,0,"Computer Vision,Time Series","Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Other,Workstation + Cloud service,Relational data,,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,20,40,0,40,0,0,"Enough to code it again from scratch, albeit it may run slowly",Explaining data science to others,,,,,,Sometimes,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Don't know,15600,SVC,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,60,NA,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,A tech-specific job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,SQL,Tableau",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,,,,Often,Most of the time,Sometimes,Often,Sometimes,,,,,Often,,Often,,Often,Often,,,,Often,Often,,Sometimes,,Often,Often,,,,,40,50,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,Less 
than 10% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,160000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer",Self-taught,60,10,20,0,10,0,Reinforcement learning,"Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk,Other",,Rarely,,Rarely,,,,,,,,,Rarely,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,Most of the time,,,"CNNs,Data 
Visualization,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Simulation",,,,Sometimes,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,Most of the time,,,Sometimes,,,,Sometimes,,,,,,,40,40,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,,,Often,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by college or university,,Deep learning,Python,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",3-5 years,,Necessary,Necessary,,Necessary,Necessary,Nice to have,,,Necessary,,,,,Traditional Workstation,,PhD,,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Predictive Modeler,Researcher",Self-taught,75,0,0,25,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,Very Important,,,Somewhat important,,Very Important,,Very Important,Somewhat important,,,,Somewhat important +Female,Ukraine,24,Employed full-time,,,No,Yes,Business Analyst,Poorly,,Mathematica,Neural Nets,R,Google Search,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",Other,100,0,0,0,0,0,,,"Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,Bayesian Methods,R,GitHub,"Blogs,Online courses,Podcasts,Textbook",,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,Time 
Series,,High school,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,Other,"Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,"Data Visualization,Time Series Analysis,Other",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,50,0,15,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,Sometimes,Often,Most of the time,Sometimes,,,Often,,,,,Rarely,Often,,,,Often,,,,76-99% of projects,More internal than external,Business Department,National customs; IEA; power network operators,Knowing the idiosyncrasies of each provider and how to reconcile/calibrate between sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,39500,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,C/C++/C#,GitHub,"Blogs,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Programmer,Researcher",University courses,10,0,30,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",I don't know/not sure,Academic,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,"Bayesian Techniques,Decision Trees,Neural Networks,SVMs","IBM Watson / Waton Analytics,Mathematica,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,SAS Enterprise Miner,SQL",,,,,,,,,,,,,Sometimes,,,,,,,Rarely,,Often,,Sometimes,Often,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Segmentation,SVMs",,Often,,,,,Most of the time,Often,,,,,,,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,,,10,20,5,10,55,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics 
/ Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,Sometimes,,,,,,,Sometimes,Often,,,,Sometimes,,Often,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Most of the time,55000,USD,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,No Free Hunch Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,Other,30,0,5,0,0,65,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,C/C++/C#,University/Non-profit research group websites,"Arxiv,Company internal community,Personal Projects,Textbook,Tutoring/mentoring",Somewhat useful,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,"No Free 
Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,,Self-taught,70,10,10,8,2,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,100 to 499 employees,Decreased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Rarely,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Random Forests,Simulation,Time Series Analysis",,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,Often,,,,20,20,10,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,,,,,,,Rarely,,Sometimes,,Rarely,,,,76-99% of projects,More internal than external,Other,Regulatory filings.,Subtle errors in the source data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,,GBP,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Denmark,35,Employed full-time,,,Yes,,Business Analyst,,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Not Useful,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,Other,Self-taught,30,0,30,40,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,CRM/Marketing,100 to 499 employees,Stayed the same,Don't know,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Rarely,,"Decision Trees,Regression/Logistic Regression","QlikView,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Segmentation,Text Analytics",,,,,,,Sometimes,Rarely,,,,,,,,Rarely,,Rarely,,,,,,,,Rarely,,,Rarely,,,,,80,5,10,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,Less than 10% of projects,Entirely internal,Other,,Unstructured data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,,Rarely,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Other,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites,Other","College/University,Conferences,Kaggle,Personal Projects,Trade book,Other",,,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,,,Very useful,,,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer",Work,20,0,15,60,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Government,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,,,,,,,,Most of the time,Often,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series 
Analysis",,Often,Sometimes,,,Often,Most of the time,Often,,,,,,,Often,Often,,,,,Often,,Sometimes,Sometimes,,Sometimes,,,Sometimes,Most of the time,,,,30,15,5,20,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Limitations of tools,Privacy issues,Unavailability of/difficult access to data,Other",,Often,,,,,,,,,,,Sometimes,,,,Most of the time,,,,Often,Often,26-50% of projects,Approximately half internal and half external,Central Insights Team,Census,Merging and access to distributed data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion,Other",Sometimes,126000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Newsletters,Online courses,Textbook,Other",Not Useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,"Data Elixir Newsletter,FastML Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,30,0,10,10,0,"Time Series,Unsupervised Learning",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation,Other",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,20,20,0,30,30,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,No,Yes,Statistician,Perfectly,Employed by company that makes advanced analytic software,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",Self-taught,50,5,20,10,10,5,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",No education,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,United States,NA,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,"FlowingData Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Researcher",University courses,80,10,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Internet-based,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Sometimes,,,Often,Most of the 
time,,,,,,,,,Sometimes,,Rarely,Often,,Often,,,,,,Often,,,Most of the time,,,,30,10,15,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,Often,,,,,,,,Sometimes,,,,,,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,weather,"trust data accuracy, completeness","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,175000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,32,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,Jupyter notebooks,Neural Nets,R,"Google Search,Government website","Company internal community,Kaggle,Personal Projects",,,,Very useful,,,Very useful,,,,,Very useful,,,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Biology,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",45,50,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High 
school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Brazil,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,Fewer than 10 employees,Increased slightly,1-2 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Flume,Java,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,Rarely,,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient 
Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,Often,,Most of the time,Most of the time,,,,,Most of the time,,Often,,,,,,,Often,,Most of the time,,,,,,Often,,,,,10,20,10,10,20,30,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,Often,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Git,Rarely,240000,BRL,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +A different identity,Nigeria,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,Amazon Machine Learning,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,30,5,0,20,5,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic 
Regression","C/C++,Microsoft Azure Machine Learning,Python,R,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Ensemble Methods,Logistic Regression,Naive Bayes",,,Often,Sometimes,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,40,30,10,10,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,"80,000",NGN,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,Israel,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,33,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other,,,,,,,,,,,,,,,,,,,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Master's degree,,6 to 10 years,"Business Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,Logistic Regression,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Amazon Web services,Deep learning,R,"Google Search,Government website,University/Non-profit research group websites","Conferences,Online courses,Personal Projects",,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,,,,,,,"FlowingData Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,35,50,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Other,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job 
board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Hadoop/Hive/Pig,Julia,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),R,Tableau",,,,,,,,,Rarely,,,,,,,Rarely,Sometimes,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Association Rules,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",,Rarely,,,,,Often,,Rarely,,,Sometimes,,Rarely,,Sometimes,,,Often,Often,,,Rarely,,,,,,,Sometimes,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Often,Most of the time,,,Often,Often,,Sometimes,,,Often,Often,,Often,Sometimes,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,IT Department,census; ArcGIS,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,135000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,,,,"Data Elixir Newsletter,FastML Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Other,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",20,10,0,0,70,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Australia,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,65,20,5,10,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",High school,Retail,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,,,,Often,Often,Sometimes,,,,,,,,Often,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,,,,,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Often,,,,Most of the time,,,,,Often,Sometimes,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone 
Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Most of the time,60000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",5,10,0,80,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the 
time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,Rarely,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Rarely,,Sometimes,Most of the time,,Most of the time,,,,,Most of the time,,,,,,75,15,0,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",Difficulties in deployment/scoring,,,,Often,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,Understanding the data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,95000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,,University courses,15,15,0,60,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,I prefer not to answer,Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",,,,"Amazon Web services,Python,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,,,,,,,,,4,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,C/C++,,Java,,"Conferences,Kaggle,Tutoring/mentoring",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,Less than a year,Machine Learning Engineer,University courses,0,20,10,60,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic 
Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,"HMMs,Neural Networks,SVMs","C/C++,Java,NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"HMMs,Natural Language Processing,Neural Networks,Recommender Systems,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Data Science results not used by business decision makers,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Other,SQL,Google Search,"Blogs,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Data Miner,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,90,0,10,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Other,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,1TB,Decision Trees,"Microsoft SQL Server Data Mining,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,0,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in 
with the available data",Most of the time,Most of the time,,,,,,,Most of the time,,Often,,,,Most of the time,,Most of the time,,,Often,,,10-25% of projects,Do not know,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,120000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,Self-employed,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Company internal community,Conferences,Official documentation,Personal Projects,Textbook",,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Miner,Predictive Modeler,Statistician",Work,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Regression/Logistic Regression,Other","SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,,Often,,Most of the 
time,,,,Sometimes,Often,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,60,20,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Often,Sometimes,,,,,,,,,,,Rarely,,,Less than 10% of projects,Entirely internal,Standalone Team,None,Understanding derived variables and which table to find them,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,,Other,Rarely,310000,AUD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Python,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,20,10,30,30,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,Often,,,Most of the time,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,30,10,10,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making 
process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,Often,,,,,,,,,,,,Often,Often,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Statistician,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Monte Carlo Methods,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,,University courses,45,0,14,40,1,0,"Natural Language Processing,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A professional degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Other","Text data,Relational data",Don't 
know,10GB,"Bayesian Techniques,CNNs,Decision Trees,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Unix shell / awk,Other",,,,Most of the time,,,,,Rarely,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Rarely,Rarely,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Text Analytics",,,Often,Sometimes,,Most of the time,Most of the time,,,,,,,Most of the time,,,,Often,Most of the time,,,,,,,,,,Most of the time,,,,,0,10,50,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,Sometimes,,Often,,,,Most of the time,,,,26-50% of projects,More internal than external,Other,Bloomberg;ravenpack,Set the structure ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,Git,,"80,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Web services,Bayesian Methods,Python,Government website,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,,1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very 
Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Italy,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by non-profit or NGO,Python,Time Series Analysis,R,I collect my own data (e.g. web-scraping),"Arxiv,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Non-profit,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,R",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,Sometimes,,Often,Often,,Often,Sometimes,,,,,Sometimes,Most of the time,,,,20,40,10,10,20,0,"Enough to 
code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Sometimes,Often,,,,,,,Most of the time,,,Most of the time,,,,Often,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,40000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",,,Python,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher,Other",University courses,0,0,20,80,0,0,"Outlier detection (e.g. 
Fraud detection),Other (please specify; separate by semi-colon)",,A bachelor's degree,Military/Security,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,Other,"IBM SPSS Modeler,IBM SPSS Statistics,Mathematica,Python,R",,,,,,,,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Prescriptive Modeling,Segmentation,Simulation,Other",,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,Sometimes,,,,,,Sometimes,60,30,5,0,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Other",Sometimes,,,,Most of the time,,,,Sometimes,Sometimes,,,,,,Often,,Often,,,,Sometimes,Less than 10% of projects,Entirely internal,Standalone Team,,"All 4 Vs: Veracity, Velocity, Volume and variety","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Most of the time,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Official documentation,Online courses",Very useful,,,,,,,,,Somewhat useful,Very useful,,,,,,,,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important +A different identity,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Anomaly Detection,Python,Google Search,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher",Self-taught,80,20,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,,Often,Most of the time,Rarely,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Rarely,Most of the time,,Most of the time,Rarely,Often,Sometimes,Most of the time,Often,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Most of the time,,Most of the time,Sometimes,,,Sometimes,Most of the time,,,,Often,,,Rarely,,,Rarely,Most of the time,,76-99% of projects,Entirely internal,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Indonesia,35,Employed part-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,MATLAB/Octave,Support Vector Machines (SVM),R,GitHub,Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,0,30,0,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",A doctoral degree,Government,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Important,,Basic laptop (Macbook),,Sometimes,1MB,Neural Networks,"MATLAB/Octave,Microsoft Excel Data Mining,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,20,40,0,40,0,0,Enough to run the code / standard library,Company 
politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Self-taught,30,60,10,0,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Female,Spain,34,Employed full-time,,,Yes,,Researcher,,Employed by government,Spark / MLlib,,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,Researcher,Self-taught,80,5,5,5,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,,Most of the time,,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Often,,,Often,,,,Sometimes,,,Most of the time,Often,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,52,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Predictive Modeler",University courses,15,35,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Other,"5,000 to 9,999 employees",Increased significantly,6-10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression",,,,,,Sometimes,Often,Sometimes,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Often,,Often,,,,,,,,Sometimes,Sometimes,,,Often,,Sometimes,,Often,,Less than 10% 
of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,265000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by non-profit or NGO",Amazon Machine Learning,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,,Not Useful,"KDnuggets Blog,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Operations Research Practitioner,Other",University courses,20,30,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,Tableau,Other",,Rarely,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,Rarely,,,,Sometimes,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,,,Often,Often,Often,,Sometimes,,Sometimes,Often,Often,,Sometimes,,,Often,,Sometimes,Sometimes,,,,50,20,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,,,,,,,,,,,Sometimes,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,10,5,55,30,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",,1GB,"Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series 
Analysis",Often,Sometimes,Sometimes,,Often,Often,Sometimes,Often,Often,,,,Sometimes,Often,,Often,,Often,Often,,,,Often,Often,,,,Often,Often,Sometimes,,,,65,10,5,5,15,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,26-50% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Git,Rarely,"100,000",CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,63,Retired,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Neural Nets,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Other",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Engineer,Programmer,Researcher,Statistician",Work,0,0,50,50,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,No,Yes,Other,Fine,Employed by government,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,,No,Bachelor's degree,A social science,1 to 2 years,Other,University courses,10,0,40,40,10,0,"Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important +Female,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Statistician",University courses,0,10,20,70,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Java,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Often,Often,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,Sometimes,,,,30,15,5,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Unavailability of/difficult access to data",,Sometimes,,,,Often,,,,,Rarely,,,,,,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Egypt,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,"Employed by college or university,Self-employed",,,,,"Blogs,College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Computer Scientist,University courses,45,10,10,30,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,RNNs,SVMs","C/C++,MATLAB/Octave,Other",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",,,Most of the time,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,,,15,20,25,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Limitations in 
the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,,,Most of the time,,51-75% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,Very useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner",University courses,0,30,0,60,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Insurance,"1,000 to 4,999 employees",Increased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Recommender Systems,Segmentation,Text Analytics",Often,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,80,10,0,5,5,0,"Enough to code it again from 
scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,ABS; ,database was created without forethought and has many issues,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,60000,AUD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Personal Projects",,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Statistician,Other",Self-taught,70,10,10,0,10,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Insurance,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,DataRobot,Jupyter notebooks,Python,SQL",,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Random Forests",Sometimes,,Sometimes,,,,Often,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,10,20,20,20,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Rarely,,,Most of the time,,,,,Often,,,,,Often,Most of the time,,,,,,,,76-99% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,250000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,37,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,,,1-2 years,Necessary,,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Predictive Modeler,Statistician",Self-taught,20,10,70,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Company internal community,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,,,,,,Very useful,Very useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,"Programmer,Other",Self-taught,50,0,30,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Often,,,,Often,,,,Often,,,Often,,,,Often,,,,Often,Often,Often,Often,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,48000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,MARS,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,20,10,30,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Always,1TB,"Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,Often,,,Often,,,,,,Often,Often,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,Often,,,,Most of the time,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Often,Most of the time,,Most of the time,Often,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Most of the time,,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Other",Most of the time,Often,,,Most of the time,Most of the time,,,Often,,,,,,,,,,,,,Most of the time,10-25% of projects,More internal than external,Business Department,Card Issuing Bank Data; Credit Bureau ,Dirty Data (such as name of businesses).,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Never,110000,USD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Online courses,Trade book",,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,,Very useful,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Other",Self-taught,60,40,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Military/Security,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Text data,Rarely,,Decision Trees,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,30,0,0,10,0,60,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Sometimes,Often,,,,,,,,Often,,,,,,,Most of the time,Often,,51-75% of 
projects,More external than internal,IT Department,Weather Reports; New Sites RSS Feeds; Reddit; FBI records,Unifying the collection processes and applying standardization across the business. ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,71000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Neural Nets,Python,Google Search,"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,,,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Java,NoSQL,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Natural Language Processing,Neural Networks,SVMs",,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,Rarely,,,,,,35,20,5,15,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,Rarely,,,Rarely,,,,,,Rarely,,,,,Less than 10% of projects,Do not know,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Not Useful,,,,,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Other,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Relational data,Never,,Other,"Amazon Web services,Jupyter notebooks,Python,SQL",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,100,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Business Department,None,No time allocated to explore ML algorithms. All study being done on my own after hours. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Bitbucket,Sometimes,"175,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,C/C++/C#,,Other,,,,,,,,,,,,,,,,,,,,5-10 years,Necessary,Necessary,Necessary,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,,Master's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Data Scientist,Predictive Modeler",Self-taught,0,0,0,0,0,100,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Not Useful,,Very useful,Not Useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,University courses,40,0,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Sometimes,Often,,Most of the time,,,Most of the time,"CNNs,Data Visualization,Ensemble Methods,Gradient Boosted Machines,HMMs,Lift Analysis,Logistic Regression,Neural Networks,Random Forests",,,,Sometimes,,,Often,,Sometimes,,,Often,Often,,Sometimes,Often,,,,Often,,,Often,,,,,,,,,,,60,5,5,5,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,Sometimes,Sometimes,,,,,,,,,,,,,,,,Often,,,,51-75% of projects,Entirely external,Other,Osm database. Nhtsa data,Volume,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,240000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,33,"Not employed, but looking for work",,,,,,,,Java,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos,Other",,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,0,5,10,85,0,0,,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important +Male,United States,36,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Very useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Statistician,University courses,5,5,20,65,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text Analytics",Sometimes,,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,Sometimes,Often,,,,,,,Often,Sometimes,,Often,,,Sometimes,,,,,15,5,10,5,65,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for 
a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,Often,Most of the time,Often,,,,,,,Sometimes,,Most of the time,Often,,,,Often,Often,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,150000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,Python,Random Forests,R,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,Bayesian Techniques,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Very Important,Very Important,Very Important,Somewhat important,Not important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook",,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,"Data Analyst,Data Scientist,Other",Kaggle competitions,10,10,50,10,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Cloudera,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,R",,Often,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,Often,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Text Analytics",Sometimes,,,,,,Most of the time,Rarely,,,,,,,,Sometimes,,,,,Rarely,Sometimes,,,,,,,Most of the time,,,,,30,5,5,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,120000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Company internal community,Conferences,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",10-15 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,,Mathematics or statistics,,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Israel,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Other,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,Not Useful,Not Useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",University courses,75,5,5,15,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Retail,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Lift Analysis,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,Rarely,,Most of the time,Most of the time,Most of the time,Most of the 
time,Sometimes,,Often,Rarely,,Often,Sometimes,Sometimes,,,Rarely,Rarely,Most of the time,Most of the time,Rarely,,,Most of the time,Rarely,Rarely,Most of the time,,,,40,20,0,20,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Sometimes,,,,,Rarely,,Often,,,,,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,480000,ILS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Poland,21,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Data Scientist",University courses,10,45,30,15,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,,,Often,Often,,,,45,15,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,Sometimes,,Often,,,,,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,43200,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",SAS Base,Social Network Analysis,Python,Government website,"Company internal community,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Engineer",University courses,30,10,40,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Other",Other,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft SQL Server Data Mining,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality 
Reduction,Random Forests,Simulation",Sometimes,,Most of the time,,,,Most of the time,Most of the time,,,,,,Sometimes,Most of the time,Sometimes,Most of the time,Sometimes,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,60,20,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,10-25% of projects,Entirely internal,Standalone Team,,"unbalanced population samples, missing data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,"80,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Hungary,25,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Deep learning,Python,University/Non-profit research group websites,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,5,10,85,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,10 to 19 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,100MB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,,Rarely,,,,50,30,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,Sometimes,Sometimes,,,Often,,Often,,,,,Most of the time,,Rarely,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,150000,DKK,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Ukraine,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Other,"Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Analyst,Data Scientist",Work,5,20,70,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Decreased slightly,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Segmentation",Often,,Often,,,Often,Often,Often,Sometimes,,,Often,,,,Sometimes,,Sometimes,,,,,Often,,,Often,,,,,,,,30,20,40,5,5,0,Enough to explain the algorithm to someone non-technical,Difficulties in deployment/scoring,,,,Sometimes,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than 
internal,Standalone Team,-,-,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,-,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Other,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Trade book",,Not Useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",60,10,0,0,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1TB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Python,Spark / MLlib,SQL,TensorFlow,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Sometimes,,,,Rarely,,,Most of the time,,,"CNNs,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Random Forests,Time Series Analysis",,,,Rarely,,,Often,,Rarely,,,Rarely,,Sometimes,,,,,,,,,Sometimes,,,,,,,Rarely,,,,10,10,60,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of funds to 
buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,Often,Often,,,,,,,Rarely,,,Often,Sometimes,,100% of projects,Entirely internal,Other,none,improving it,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Other,aws and s3,Other,Never,188000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Newsletters,Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,30,60,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,NoSQL,Python,R,Spark / MLlib,SQL",Sometimes,Sometimes,,,,,Sometimes,,Most of the time,,,,,Often,,,Often,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Often,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Often,,,,,Most of the time,Most of the time,Often,Most of the time,,,,,,Most of the time,Often,,,Most of the time,,Often,,Most of the time,Most of the time,,Often,,,,,,,,50,10,5,15,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,Often,Most of the time,Often,,,,,Often,,,,,,Sometimes,,,Sometimes,,,100% of projects,More internal than external,Central Insights Team,CITDB;D&B;,Cleanliness of data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,3200000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,I don't plan on learning a new ML/DS method,R,I collect my own data (e.g. web-scraping),"Conferences,Textbook",,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Engineer,Predictive Modeler",University courses,60,0,0,40,0,0,,,,Academic,I don't know,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Don't know,,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","C/C++,Java,Mathematica,MATLAB/Octave,R,SAS JMP",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Often,Often,,,,,,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Logistic Regression,Simulation",,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,20,40,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,Sometimes,,,Often,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",,,,,7,,,,,,,,,,,,,,,,,, +Male,Israel,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,60,0,0,15,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Romania,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting 
firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,60,20,2,16,2,0,Enough to explain the algorithm to someone non-technical,"Maintaining responsible 
expectations about the potential impact of data science projects,Privacy issues",,,,,,,,,,,,,,Often,,,Most of the time,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Greece,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Cloudera,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Other,Other,Other",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,28,0,2,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Often,,Rarely,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,Most of the time,,Often,,,,,,,,Often,Often,,,Often,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Often,,,Often,Rarely,Sometimes,Often,Often,,,,,Often,Often,Often,,,,50,30,5,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,Often,Most of the time,Often,,,Often,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Always,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,DataRobot,,SQL,,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,,5-10 years,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,"Coursera,edX,Udacity,Other","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",Work,20,10,60,10,0,0,Supervised Machine Learning (Tabular Data),,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,India,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Miner,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,0,15,30,50,5,0,"Recommendation Engines,Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A professional degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Azure Machine Learning,QlikView,R,SQL,Tableau",,,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,Often,Often,,,,,,,,,Sometimes,,,Often,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Text Analytics",,Sometimes,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,Rarely,,,,,30,10,0,40,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,"Advisen , SAS ","Dirty data , Data unavaiibility","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,3100000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Other,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,"Computer Vision,Natural Language Processing","Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Linear Digressions Podcast",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a 
company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Programmer",Self-taught,10,80,0,10,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,,Not important,Not important,Not important,Very Important +Male,Denmark,34,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),Other","Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Other,Self-taught,70,30,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Financial,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data 
Visualization,Ensemble Methods,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,,Often,,,,,,,,,,,,,Often,Often,,,,,,,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",Sometimes,Often,,,,,,Often,,,,,,,Most of the time,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,800000,DKK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Iran,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Stack Overflow Q&A,Textbook",,,Very useful,,Somewhat useful,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Miner,Data Scientist,Researcher",University courses,30,10,20,40,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,SVMs","Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,Often,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics",,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,Often,Most of the time,,,Sometimes,Often,Often,Most of the time,Often,,,,,,Most of the time,Sometimes,Often,,,,,40,15,15,10,15,5,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,Often,,,,,,,,,,Often,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,2000,IRR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Russia,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Programmer,Researcher,Other",Self-taught,20,0,20,15,10,35,"Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A master's degree,Other,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python",,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series 
Analysis",Sometimes,Often,Often,,Often,Often,Most of the time,Most of the time,Most of the time,Sometimes,,Sometimes,,Most of the time,,Most of the time,Often,Most of the time,Often,,Often,Sometimes,Most of the time,Often,,Often,,,Often,Most of the time,,,,20,20,10,30,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Self-employed,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"FastML Blog,FlowingData Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,25,0,40,30,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Image data,Video data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Often,Often,,Often,Sometimes,Sometimes,,,,,,Rarely,,Sometimes,,Sometimes,,Often,Often,,,,,,,Sometimes,,,,,,20,30,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Mercurial,Sometimes,,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Germany,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,45,5,10,30,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient 
Boosted Machines,Random Forests,Time Series Analysis",,,,,,Most of the time,Sometimes,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Rarely,,,,20,60,0,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Most of the time,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Less than 10% of projects,Entirely internal,IT Department,,"Information about data is not documented within the company, so it's hard to find out what certain data really represents.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Never,46000,EUR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Russia,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Bayesian Methods,Python,Google Search,"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,Programmer,Researcher,Statistician",Kaggle competitions,20,40,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Traditional Workstation",Text data,Most of the time,10MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Recommender Systems,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,UCI Machine Learning Repository,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,10000,RUB,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,Very useful,Somewhat useful,Somewhat useful,,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Excel Data Mining,QlikView,SAS Base",,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling",Sometimes,Sometimes,,,,,Often,Rarely,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,,,,,,,,10,15,5,20,50,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Often,,Sometimes,Most of the time,Often,,,,,,Often,,,,51-75% of projects,Entirely 
internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Never,825000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,,100MB,"Decision Trees,Ensemble Methods,Random Forests","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,Sometimes,,,,,30,20,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,,,Sometimes,,,,,Often,,Often,,,Often,,,,,,Often,,,10-25% of projects,Entirely internal,Standalone Team,None,Getting and understanding the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,50000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Decision Trees,R,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Not Useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A professional degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Germany,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Not Useful,Very useful,,,,,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Other,Kaggle competitions,30,10,10,10,40,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Insurance,"5,000 to 9,999 employees",Decreased slightly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","DataRobot,R",,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,Sometimes,,,,,,Sometimes,Often,Often,,,Sometimes,Often,Sometimes,Sometimes,Sometimes,,,,30,10,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making 
process,Limitations of tools,Privacy issues",Most of the time,Most of the time,Often,Often,Sometimes,,,Often,,,,,Often,,,,Sometimes,,,,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,0,AED,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,16,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,,,"GPU accelerated Workstation,Workstation + Cloud service",40+,Other,Sort of (Explain more),I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,France,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle",Very useful,Very useful,,,,,Very useful,,,,,,,,,,,,,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,10,5,10,0,"Computer Vision,Natural Language Processing,Unsupervised Learning",Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Japan,50,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,Stan,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Software Developer/Software Engineer,Statistician",Work,60,30,10,0,0,0,Unsupervised Learning,Bayesian Techniques,I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,Japan,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Deep learning,Matlab,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,Somewhat useful,,,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Researcher",Self-taught,50,30,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Other,Sometimes,10GB,Neural Networks,"C/C++,IBM SPSS Statistics,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL",,,,Often,,,,,,,,Rarely,,,,,,,,,Most of the time,Rarely,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Often,,,,Often,Most of the time,,,,,,,Often,,Most of the time,,,,60,10,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,,,,,,,,,Often,,,Most of the time,,,,Often,,100% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer",Work,35,10,55,0,0,0,,,A doctoral degree,Technology,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,,"Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,Often,,,,"Data Visualization,Natural Language Processing",,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,,,,,,,,,,,,Often,Often,,Often,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","I don't typically share data,Other",USB Stick,"Bitbucket,Git",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,32,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Non-Kaggle online communities,Tutoring/mentoring",,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Other,0,0,0,0,0,100,,,High school,Internet-based,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Never,,,"SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,kNN and Other Clustering",Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,90,0,0,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,,,Most of the time,,100% of projects,Entirely internal,Business Department,,Cleaning and munging.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,25000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Poland,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,100 to 499 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests",Sometimes,,,,,Most of the time,Often,Often,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,60,15,0,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,,,,,,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,,"missing data, imputation",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,180000,PLN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,6 to 10 years,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,NA,0,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs,Other","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Simulation,SVMs,Other",Sometimes,Sometimes,Sometimes,,,Often,Most of the time,Often,Sometimes,,,,,Often,,Often,,Often,,,,,,Often,,,Often,Rarely,,,,,Often,40,10,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Often,,,,,,,,,,Often,,26-50% of projects,Do not know,Central Insights Team,,Storage,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,C/C++,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Not Useful,,Very useful,,,,Very useful,,,Very useful,,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Statistician",University courses,0,10,0,80,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,10 to 19 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Always,10MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Python,R,SQL,Stan,Tableau,Unix shell / awk",,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests,Simulation",,,Sometimes,,,Most of the time,Most of the time,,Sometimes,,,,,,,Often,,,,,,,Sometimes,,,,Most of the time,,,,,,,10,10,70,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Hong Kong,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,39,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,TensorFlow",,Often,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Logistic Regression,Neural Networks,RNNs",,,,Often,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,Most of the time,,,,,,,,,50,20,5,10,10,5,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,,,,Most of the time,Often,Sometimes,Often,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Belgium,17,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,,,,Somewhat useful,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,Other",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,,Not important,Somewhat important,Somewhat important +Male,Portugal,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,10,10,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",Often,,,,,Often,Often,Often,Often,,,Often,,Often,,Often,,,,Often,Often,,Often,,,Often,,Often,,Often,,,,30,25,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Privacy issues",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,Git,Rarely,"25,000",EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Japan,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Work,25,0,75,0,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",High school,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Video data",Rarely,1GB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,Tableau,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,"CNNs,Cross-Validation,Neural Networks,RNNs",,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,50,25,0,10,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues",,,,,Most of the time,,,,,,,,,,,,Often,,,,,,51-75% of projects,More internal than external,IT Department,Imagenet,huge size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,"8,000,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,College/University,Personal Projects,Textbook,Other",Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,30,0,40,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,Rarely,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,Rarely,,,,Most of the time,,,,,,Most of the time,Most of the time,Sometimes,,,Often,Sometimes,,Often,,Most of the time,,,,30,20,10,10,30,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,Often,,,,,,,Often,,100% of 
projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Ireland,36,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler",Self-taught,80,0,0,10,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Other,"1,000 to 4,999 employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Orange,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Sometimes,Rarely,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Rarely,,Often,Often,Most of the time,,,,,Often,Sometimes,Sometimes,,,Often,,,,Rarely,,,,50,5,0,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Often,,,Most of the time,Often,,,,,,,,,,,Often,Sometimes,,100% of projects,More internal than external,IT Department,Census; weather,Lack of interest and input of the business owners to aid data interpretation ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,85000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,18,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,,,,Very useful,,,Somewhat useful,,Data Machina Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Netherlands,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Factor Analysis,Matlab,"Google Search,Other","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat 
useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Researcher,Software Developer/Software Engineer,Statistician",University courses,20,10,20,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),A bachelor's degree,Other,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Always,100GB,"Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Perl,Python,Spark / MLlib,SQL,Other",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression",Most of the time,,,,Sometimes,Rarely,Often,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,30,30,30,0,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,46000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",55,30,10,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Text data,Relational data",Most of the time,1GB,"CNNs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,RapidMiner (free version),SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Rarely,,,,,,,,Most of the time,,,,Sometimes,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Neural Networks,Recommender Systems,RNNs,Simulation",,,,Sometimes,,Most of the time,Most of the time,,,,,Sometimes,,,,Often,,,,Most of the time,,,,Sometimes,Sometimes,,Often,,,,,,,20,30,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Often,,,,Sometimes,Sometimes,,Often,Most of the time,Often,Most of the time,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,38000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Government website,"Blogs,Company internal community,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Work,70,0,30,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Retail,20 to 99 employees,Increased slightly,,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,Other,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,10,40,0,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Scaling data science solution up to full database",,Often,Sometimes,,,,,,,,,,,,,,,Often,,,,,100% of projects,Entirely external,IT 
Department,none,Understanding and cleaning data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,450000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Australia,37,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,"GitHub,Government website",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,25,10,0,25,0,"Machine Translation,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,39,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,,Java,GitHub,"Kaggle,Newsletters,Personal Projects",,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,,1-2 years,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,"Business Analyst,Data 
Miner,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,19,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Monte Carlo Methods,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Not Useful,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,0,0,40,0,Survival Analysis,Decision Trees - Random Forests,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important +Female,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting 
firm,R,Random Forests,SQL,GitHub,"Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist",Work,0,0,100,0,0,0,Time Series,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Pharmaceutical,"10,000 or more employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,Regression/Logistic Regression,"Amazon Web services,KNIME (free version),Microsoft Excel Data Mining,Python,SAS Base,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation,Time Series Analysis",Rarely,,,,,,Often,Rarely,,,,,,,,Sometimes,,,,Rarely,,,Rarely,,,Often,,,,Sometimes,,,,10,30,30,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Sometimes,,,,,Rarely,Sometimes,,Often,Most of the 
time,,Often,,,,Often,Often,Sometimes,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Rarely,1200000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Switzerland,45,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,DBA/Database Engineer,Other",Self-taught,50,30,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Markov Logic Networks",,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees","Amazon Web services,Python,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,Rarely,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Often,,,Most of the time,Most of the time,Often,,,,,,,,Sometimes,Sometimes,,,,,,,,,Most of the time,Most of the time,,Sometimes,Often,,,,50,20,30,0,0,0,Enough to tune the parameters 
properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,,,,,,,,,Often,Often,Most of the time,,,,,Often,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,125000,CHF,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Denmark,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Singapore,36,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,R,Genetic & Evolutionary Algorithms,R,Other,"Blogs,Kaggle,Personal Projects,Podcasts",,Very useful,,,,,Very useful,,,,,Very useful,Somewhat useful,,,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,0,5,95,0,0,0,Time Series,Logistic Regression,A professional degree,Manufacturing,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Always,1GB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,Minitab,R,Tableau",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,Often,,Sometimes,,,Often,,,Rarely,Most of the time,,,,10,20,10,10,50,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the 
art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Sometimes,Sometimes,,,,,Most of the time,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,Often,Sometimes,,Often,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Republic of China,31,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Researcher",Self-taught,40,40,10,0,10,0,"Computer Vision,Machine Translation,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data",Most of the time,10GB,"CNNs,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Random Forests",,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,Often,,,,,,,,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Often,,,,,,,,,,Sometimes,,,10-25% of projects,More external than internal,Central Insights Team,NA,Lack enough source data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,500000,YER,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Business Analyst,Fine,,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Researcher",University courses,15,25,40,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,60,10,0,10,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,,,,Bitbucket,Sometimes,90000,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Spain,33,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM SPSS Statistics,Regression,C/C++/C#,I collect my own data (e.g. 
web-scraping),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Professional degree,,,"Business Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook",,Very useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,,Very useful,,,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,DataCamp,Other,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific 
job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Malaysia,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Excel Data Mining,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,Markov Logic Networks,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Taiwan,40,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,10,10,20,0,0,"Machine Translation,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,SVMs","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL",,,,Often,,,,,Sometimes,,,,,,Sometimes,,,,,,Often,,Often,,Often,,Often,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Text 
Analytics",Often,Often,Often,Often,,,,Often,,,,,,,,Often,,Often,,Often,,,,,,,,,Often,,,,,10,20,10,10,50,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,Often,,Often,,,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial,Subversion",,,,,9,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,SQL,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,0,5,0,95,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,75,0,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Financial,500 to 999 employees,Increased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,Most of the time,,,,,,,,,,,,,,,,Rarely,Rarely,,Often,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,Rarely,Often,,Most of the time,Rarely,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,Often,,,Often,Often,Often,,Often,,,,Often,Most of the time,,Often,,Often,,Most of the time,Often,,Often,,,,65,20,5,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Need to 
coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,Sometimes,,,Most of the time,,Often,,,76-99% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Other,"Arxiv,Blogs,Conferences,Newsletters,Official documentation,Textbook,Other",Very useful,Very useful,,,Not Useful,,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,63,2,25,5,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SQL,Stan,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,Often,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis,Other,Other",,,Most of the time,,,Sometimes,Most of the time,Rarely,,Sometimes,,Most of the time,,Often,,Most of the time,,,,,Often,,Rarely,,,Often,Sometimes,,,Sometimes,Rarely,Rarely,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,Sometimes,,Often,,,Sometimes,Sometimes,Often,Sometimes,,,Sometimes,Most of the time,,Sometimes,,Most of the time,Sometimes,Often,,51-75% of projects,More internal than external,Other,Vehicle Registration Data; Vehicle Sales Forecasts; Economic 
Indicators,"Domain knowledge for understanding meaning of data: what do the columns mean, what are the descriptions of columns that contain a code, how are values calculated (e.g., cumulative values or differential?), and which filters need to be applied for a particular business question?","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Never,118000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Egypt,23,"Not employed, but looking for work",,,,,,,,Tableau,Deep learning,Python,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,< 1 year,Necessary,Unnecessary,Unnecessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,NA,20,10,0,"Machine Translation,Recommendation Engines,Speech Recognition,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,,,Very useful,Very useful,Very useful,,,,Very useful,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Business Analyst,Software Developer/Software Engineer",Work,20,0,30,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Recommender Systems,Segmentation",Often,,,,,Often,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,30,40,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,10-25% of projects,Entirely internal,Other,,Varied and dirty,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,125000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,55,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,GitHub,"Arxiv,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,40,0,20,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Video data,Always,1GB,"CNNs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,Sometimes,Often,,Most of the time,,,,,,,,Sometimes,,Often,,,,Most of the time,Often,,,,Often,,,Sometimes,,,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.",Often,,,,,,,,,,,,Sometimes,,Most of the time,,,,Often,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,100000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,Other",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,63,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Conferences,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Operations Research Practitioner,Self-taught,70,5,20,5,0,0,"Time Series,Unsupervised Learning","Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased significantly,3-5 years,Some other way,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Traditional Workstation",Other,Sometimes,1GB,"Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,C/C++,Python,R,Unix shell / awk",,Rarely,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,Prescriptive Modeling,Random Forests,SVMs",,,,Sometimes,,,,Often,,Often,,,,Often,,,,,,Often,,Often,Often,,,,,Most of the time,,,,,,20,80,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",40,20,10,0,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Google Cloud Compute,IBM SPSS Statistics,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Often,,,,Rarely,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Often,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Often,,,,,Often,,,,,Most of the time,Most of the time,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to 
go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Often,Often,,100% of projects,More internal than external,IT Department,,Cleaning data and API Integration ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Sometimes,,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Portugal,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,50,15,10,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Random Forests,Text Analytics",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Often,,,,,,Sometimes,,,,,20,15,10,25,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues",,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,improper labeling heuristics,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,24000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",0,40,0,50,0,10,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,10MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,Sometimes,Often,,,Often,,Often,,,,,,,,,Often,,,,Often,Often,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,Often,,Often,,Often,,,Often,Often,,,Often,,,,,,,Often,,,,40,5,5,10,40,0,,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into 
organization's decision-making process",Often,,,,,,,Often,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,N/A,How the company will use it for business decisions,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,106000,USD,Has stayed about the same (has not increased or decreased more than 5%),,,,,,,,,,,,,,,,,,, +Male,Nigeria,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,0,0,30,70,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Microsoft Excel Data Mining,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,,Rarely,,Often,Often,Often,,,,,,Sometimes,,Sometimes,,,,,Often,,Sometimes,,Most of the time,,,Most of the time,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Sometimes,,,,Often,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Rarely,"400,000",NGN,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Egypt,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Google Search,"Blogs,Kaggle",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,0,50,0,0,Natural Language Processing,Decision Trees - Random Forests,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,28,"Not employed, but looking for work",,,,,,,,SQL,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Biology,,Other,"Online courses (coursera, udemy, edx, etc.)",8,90,0,0,2,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Somewhat important,Somewhat important +Male,Taiwan,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Online courses",,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,30,20,0,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,10GB,Regression/Logistic Regression,"Jupyter notebooks,NoSQL,Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,15,20,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,,,Most of the time,,Sometimes,,,Most of the time,,,,,Sometimes,,Most of the time,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,70000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Deep learning,R,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,30,20,10,10,0,"Natural Language Processing,Survival Analysis","Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Relational data",Sometimes,10GB,"Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Most of the time,,,,Sometimes,Sometimes,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Often,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,Often,Sometimes,Most 
of the time,,Most of the time,,,Most of the time,Most of the time,Often,,,,50,10,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,Sometimes,,,,Sometimes,,Often,,Often,Sometimes,,,,51-75% of projects,More external than internal,Central Insights Team,"Angel,CNNIC,IDC...","clean data, data missing ","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,200000,CNY,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Japan,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Stan,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Regression,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Machine Translation,Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Female,People 's Republic of China,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,"GitHub,University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,"Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,0,5,0,95,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat 
important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Other,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,C/C++,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R,SAS Base",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Often,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,,Sometimes,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues",,Most of the time,,,,,,,Most of the time,,,Most of the time,Often,,,,Most of the time,,,,,,100% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Git,Never,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,KDnuggets Blog,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Researcher",University courses,30,10,5,50,5,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Other,R,Government website,"Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Other",University courses,5,5,40,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Most of the time,1MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation,Text Analytics,Other",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,Rarely,,,,Often,20,30,10,10,10,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,Most of the time,,,,,,Most of the time,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,Judy Diamond Large Group Self-Funded Database; AIS Database; SUSB Census,Matching it to our internal database.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,120000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Finland,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Other,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Friends network,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,Not Useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Not Useful,Not Useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Often,Sometimes,,Most of the time,Most of the time,Often,Often,,,Often,Sometimes,Sometimes,,Often,,Sometimes,Often,Often,Most of the time,,Often,,Sometimes,,Often,Sometimes,Often,Sometimes,,,,25,20,10,20,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Online courses,Podcasts",,,,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,"Linear Digressions Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,10,30,0,40,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests","Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,,Rarely,Most of the time,Often,Rarely,,,,,,,,,,,,,,Often,,,Often,,,,Often,,,,10,20,0,30,40,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,18000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Not very important,Other,Laptop or Workstation and private datacenters,"Relational data,Other",Rarely,,"Decision Trees,Random Forests,Regression/Logistic Regression","Minitab,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Often,Often,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,20,60,0,20,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,51-75% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A",,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,10,5,10,65,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,,Often,,Sometimes,Most of the time,,Most of the time,Often,,,Often,,Often,,Sometimes,,Often,Most of the time,Sometimes,Most of the time,,Often,Sometimes,Sometimes,,,Often,Most of the time,,,,,50,15,10,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,Often,,Often,,,Most of the time,,,Often,,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,130000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,5,20,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Text data,,1TB,Regression/Logistic Regression,"C/C++,Flume,Google Cloud Compute,MATLAB/Octave,Python,Unix shell / awk",,,,Sometimes,,,Often,Often,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,HMMs,PCA and Dimensionality Reduction,Time Series Analysis",Often,,,,,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,Most of the time,,,,20,10,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues",,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,26-50% of projects,More internal than external,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,350000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Friends network,Official documentation,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Operations Research Practitioner,Predictive Modeler",University courses,15,0,25,50,0,10,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Often,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Often,Most of the time,Most of the time,Often,Most of the 
time,Often,,Most of the time,,Often,,Often,,,Often,Often,Often,Often,Often,Often,,Often,Often,Often,Often,Often,,,,20,20,40,5,15,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process",,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,"Government, Disclosure, market research, financial, commodity pricing, market prices, real time financial ",Rearchitecting from legacy systems to spark/ modern infrastructure,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,245000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Government website,"College/University,Company internal community,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,25,50,5,0,"Computer Vision,Machine Translation,Natural Language Processing","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,Less than one year,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,1GB,"Neural Networks,SVMs","Java,Jupyter notebooks,KNIME (free 
version),NoSQL,Python,R,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,Sometimes,Most of the time,,,,,"A/B Testing,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",Often,,,,,,Most of the time,Sometimes,,,,,,,,,,Rarely,Most of the time,Sometimes,Most of the time,,,,,,,Sometimes,Most of the time,,,,,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,,,,Often,,,,,,Often,,,,,,Rarely,,51-75% of projects,Entirely internal,Central Insights Team,Demographic,Not on cloud,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Bitbucket,Sometimes,85000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Nigeria,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Microsoft Azure Machine Learning,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,SVMs",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,40,20,0,30,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Often,,,,,,,,,,,,,,Often,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,world bank data set,Not much for now,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,"50,000",NGN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,44,"Not employed, but looking for work",,,,,,,,KNIME (free version),Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos,Other",Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,Not Useful,,Somewhat useful,Very useful,Very useful,KDnuggets Blog,10-15 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,0,0,50,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,16-20,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Newsletters,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,Very useful,,,,Very useful,,,,,,Somewhat useful,"Data Stories Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,Researcher,Software Developer/Software Engineer",University courses,10,0,50,40,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs",A master's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Relational data,Other",,,,"Hadoop/Hive/Pig,Java,Perl,Python,SQL,Other,Other",,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,Most of the time,,,,,,,Often,Most of the time,,Association Rules,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,40,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Most of the time,,Most of the time,Sometimes,,,,,,,,Often,,,,,,Most of the time,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,,Sometimes,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer",University courses,0,0,0,0,0,100,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,100GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,,,,,,Sometimes,,Often,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Sometimes,,,,Often,,,,,,,,,Often,,,,,Often,,Often,,,,,,,,Often,Often,,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external 
sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,,Often,,,Often,,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Email",,Bitbucket,,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Chile,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,Less than a year,"Engineer,Other",University courses,0,5,5,70,20,0,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Other,39,Employed 
full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,Julia,Bayesian Methods,R,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",50,20,10,20,0,0,,Logistic Regression,A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,100MB,"Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,Often,,,,,Rarely,,,,,Rarely,,,,,,,,,Often,Rarely,Most of the time,,,,,,,,,Rarely,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,Often,Most of the time,,,Sometimes,,,,35,30,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,,,,Often,,,,Most of the time,Most of the time,,Rarely,,,Often,,,,76-99% of projects,More internal than external,Business Department,can not disclose,lack preparation proccess flexibility,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,18000,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,IBM SPSS Statistics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Non-Kaggle online communities",,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Software Developer/Software Engineer,Statistician",Work,30,0,60,0,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression,Markov Logic Networks",A doctoral degree,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Gradient Boosted Machines,Markov Logic Networks,Regression/Logistic Regression","C/C++,Mathematica,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,Sometimes,Sometimes,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,,Often,,,Often,,,,10,50,20,5,15,0,Enough to refine and innovate on the 
algorithm,"Company politics / Lack of management/financial support for a data science team,Other",Often,,,,,,,,,,,,,,,,,,,,,Most of the time,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,1500000,RUB,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Personal Projects,Podcasts",,Very useful,,Very useful,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,"Linear Digressions Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist",Other,10,10,25,50,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Text Analytics",Most of the time,,,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,,30,15,5,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,,,Sometimes,,,,,,,,Often,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,147000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,38,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,22,Employed part-time,,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Not Useful,,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Spain,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"Data Stories Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Statistician",University courses,3,2,5,90,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision Trees,Naive Bayes,Neural 
Networks,Random Forests,Segmentation",,,Often,,,,,Often,,,,,,,,,,Often,,Often,,,Sometimes,,,Often,,,,,,,,20,15,20,20,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,100000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,,,Not Useful,Very useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,Data Scientist,Self-taught,40,20,20,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Relational data,Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random 
Forests,Regression/Logistic Regression","Amazon Web services,Python,R,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Natural Language Processing,Time Series Analysis",,,,Sometimes,Sometimes,Most of the time,,Often,,,,,,Often,Often,Often,Sometimes,,Often,,,,,,,,,,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Most of the time,Most of the time,,,Often,,,Most of the time,,,,,Often,,Most of the time,,,,51-75% of projects,More internal than external,Business Department,,Consolidating all the datasets in order to make them useful.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1200000,INR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Philippines,45,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Other,Python,Google Search,"Company internal community,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Other",Work,15,0,80,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Decreased significantly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,"Bayesian Techniques,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Often,,,Sometimes,Often,,,,,,,,,Often,,,,,Sometimes,,,,,,Most of the time,,,Often,,,,10,20,0,10,60,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,1000000,PHP,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,43,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Programmer,Other",Other,5,0,25,0,0,70,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),High school,Government,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,"Video data,Text data",,,"CNNs,GANs,RNNs","DataRobot,Jupyter notebooks,Perl,Python,R,TensorFlow",,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,Often,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,GANs,Natural Language Processing,Neural Networks,RNNs,Simulation,Text Analytics",,,,Often,,Most of the time,,,,,Sometimes,,,,,,,,Often,Sometimes,,,,,Sometimes,,Often,,Sometimes,,,,,25,0,0,10,0,65,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,,,,,Most of the time,,,,,,,,Often,,,,,,,Often,,51-75% of projects,Approximately half internal and half external,Other,None,Obtaining enough of it from the data owners.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,135000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Colombia,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Government website,"Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Very useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Statistician",Work,15,5,60,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Decreased slightly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,KNIME (free version),Microsoft Azure Machine Learning,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow",,,,,,,,,Rarely,,Rarely,Rarely,,,,,,,Rarely,,,Rarely,,,,,,,,,Often,Sometimes,Most of the time,,,,,Rarely,Rarely,,,Most of the time,,,Most of the time,Often,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,,Most of the 
time,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,50,10,0,25,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,,,,,100% of projects,More internal than external,IT Department,"Government public data sources, Shape files",NDA's,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,60000000,COP,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Work,0,20,80,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,6-10 years,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Rarely,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",,,,,,,Most of the time,Sometimes,Rarely,,,Often,,Rarely,Sometimes,Often,,Often,Sometimes,,Sometimes,,Often,Rarely,,,,,Sometimes,,,,,35,25,25,10,5,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the 
potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,87000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Iran,44,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Association Rules,R,,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,Other,University courses,35,35,0,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,1GB,"Decision Trees,GANs,Neural Networks,Regression/Logistic Regression,RNNs,Other","C/C++,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,R,RapidMiner (free version),SQL",,,,Sometimes,,,,,,,,Often,,,,,,,,,Often,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,RNNs,SVMs,Text Analytics",,,,,,Most of the time,,Sometimes,,Sometimes,Sometimes,,,Often,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,10,30,30,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,Often,Sometimes,,,,,Often,,Most of the time,,,,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,another language,,Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,25000000,IRR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Podcasts,Textbook",,,,,,,Not Useful,,,,,,Very useful,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Programmer,Other",Self-taught,70,0,25,0,5,0,Time Series,,,Telecommunications,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"QlikView,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Data Visualization,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,,,60,20,0,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,,,Sometimes,,,,,,,Often,,Often,,,,,,,,Less than 10% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint,Other",,,Never,30000,AED,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Friends network,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,,,,,,,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,,Self-taught,90,0,10,0,0,0,,,A master's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Gradient Boosted Machines,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,20,20,0,40,20,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,25000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by non-profit or NGO,Other,Other,Python,I collect my own data (e.g. 
web-scraping),"Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A",,,,Very useful,,,,,,Not Useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Engineer,Work,50,0,50,0,0,0,,,A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Important,Other,Laptop or Workstation and local IT supported servers,Other,Rarely,,Other,"Amazon Web services,C/C++,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL",,Rarely,,Often,,,,,,,,,,,Rarely,,,,,Often,Sometimes,,Sometimes,,,,,,,,Often,,,,,,,,,,,Rarely,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,15,5,0,20,60,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,Often,Often,Sometimes,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Subversion,Sometimes,152000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,Predictive Modeler,Self-taught,30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression",A master's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Python,R,RapidMiner (free version)",,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,Often,,,,,Sometimes,,,Rarely,Rarely,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,Often,,,,,Often,Often,,,,,,Often,,26-50% of projects,Entirely internal,Other,none,"IT has yet to allow easy access to tables used in SAP, most data we already track has little to do with the problems my team is tasked with solving",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Rarely,"74,800",USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Ireland,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Personal Projects",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Manufacturing,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,1GB,"Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,,,,10,20,20,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Unavailability of/difficult access to data",Often,,Sometimes,,Often,,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Python,,"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Professional degree,,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,20,70,0,0,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Germany,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,30,20,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,"1,000 to 4,999 employees",Increased slightly,Don't know,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Rarely,,,,,,,,Often,,,,,Often,,Sometimes,,,,,,,,,,,80,0,0,0,20,0,Enough to tune the parameters properly,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,,Rarely,55000,EUR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,,University courses,20,10,35,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Support Vector Machines (SVMs),Primary/elementary school,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Rarely,10GB,"Regression/Logistic Regression,SVMs","Amazon Web services,NoSQL,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,40,20,0,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Often,,,,,,,,,,,,Often,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,38000,BRL,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Other,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,Google Search,"Kaggle,Official documentation,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Other",Self-taught,90,0,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,,,Often,,Often,,,,Often,,,Often,,Sometimes,,,Often,,Sometimes,,,,10,60,0,20,10,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,100% of projects,More internal than external,IT Department,Public holidays datasets,Formulating it in a suitable format for ML algorithms to have a meaningful output.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Always,4000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Python,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",15,25,0,50,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,10GB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,Rarely,Rarely,,,,,Sometimes,,Sometimes,,,,,Most of the time,Often,,,,,,Sometimes,,Often,,,,,Often,Often,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality 
Reduction,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,20,20,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,Often,Sometimes,,,,Sometimes,,Most of the time,,Most of the time,Often,Most of the time,,Sometimes,,51-75% of projects,Approximately half internal and half external,Other,"Census data, Medicare data sets, epidemiological data sets (NESARC, NHIS, etc), Labor and employment data","Poor documentation, data messiness","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"77,900",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Germany,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,TensorFlow,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Kaggle,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,Researcher,Self-taught,80,0,0,10,10,0,Time Series,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,Sometimes,100GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Sometimes,Most of the time,,,,,,,,,Often,,,,Sometimes,Sometimes,,,,,Often,,Often,,Most of the time,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools",Sometimes,,,,Often,,,,Most of the time,Sometimes,,Most of the time,Most of the time,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,Computer memory limitations,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,1000000,CZK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,22,Employed part-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,,"Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Other,2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,40,60,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Belgium,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,Very 
useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Machine Learning Engineer,Programmer,Researcher,Statistician",University courses,20,0,25,50,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Academic,I don't know,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Random Forests,SVMs","C/C++,R",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Often,,Often,,Rarely,,,,Sometimes,,,,Often,,,Sometimes,,Most of the time,,,,,,Most of the time,,,,,30,40,15,15,0,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Sometimes,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Switzerland,33,Employed part-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Other,Self-taught,30,20,0,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Decision Trees,SVMs","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,Rarely,Often,Most of the time,Sometimes,Often,,,,,,,Most of the time,,,Often,Sometimes,Often,,Sometimes,,,,,Sometimes,Often,,,,,60,5,0,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,Sometimes,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,95000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,,University courses,10,0,20,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A professional degree,Insurance,"1,000 to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft SQL Server Data Mining,R,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,Often,Often,,,,,,,Sometimes,,,,,,Sometimes,,,,,60,15,0,15,10,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Most of the time,Often,Often,,,Often,Often,,Sometimes,,Most of the time,Sometimes,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"65,000",USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,42,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by government,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,33,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Analyst,Researcher",University courses,10,0,10,80,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Non-profit,500 to 999 employees,Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,10,10,0,10,70,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,,Most of the time,,,,,,Often,Often,,,,Often,Often,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,68000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,,"Blogs,College/University,Company internal community,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Operations Research Practitioner",University courses,33,0,33,34,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Often,Most of the time,,,,Most of the time,,,,,,,Most of the time,,Sometimes,,,,Most 
of the time,Sometimes,,Sometimes,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Most of the time,,Most of the time,,,,Sometimes,,Often,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Often,Most of the time,,100% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Always,150000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,46,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Engineer,University courses,0,30,40,30,0,0,,,High school,Technology,"10,000 or more employees",Increased significantly,Don't know,A tech-specific job board,Important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,,,"Minitab,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,50,0,0,0,50,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Ukraine,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Self-employed",Jupyter notebooks,Support Vector Machines (SVM),,I collect my own data (e.g. 
web-scraping),"Friends network,Official documentation,Online courses,Stack Overflow Q&A",,,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,"Data Analyst,Researcher",University courses,20,50,15,15,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,IBM SPSS Statistics,R,SQL",Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Logistic Regression,Markov Logic Networks,Segmentation",Often,,,,,Often,,Often,,,,,,,,Often,Rarely,,,,,,,,,Sometimes,,,,,,,,30,10,0,10,50,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,Often,,,,,,,Sometimes,Sometimes,,,,,,,26-50% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Most of the time,28000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Argentina,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,10,0,30,60,0,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,Spark / MLlib",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",Most of the time,Most of the time,,,Often,Often,Often,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,,,,,,,Often,,,Most of the time,,,,,,,,60,20,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data 
science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,Sometimes,Most of the time,Often,,Most of the time,Sometimes,,,,,Most of the time,Most of the time,,Sometimes,Often,Most of the time,,Most of the time,,51-75% of projects,Entirely internal,Business Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Rarely,33000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Philippines,27,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Online courses,Textbook",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Time Series,,A bachelor's degree,Other,"5,000 to 9,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,SQL,Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,Most of the time,Often,"Data Visualization,Prescriptive Modeling",,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,30,0,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science 
team,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,,533000,PHP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Retail,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Often,,,,,,Most of the time,Sometimes,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,,Often,,,,,,,Sometimes,,,,,Rarely,,Often,,,Rarely,,Rarely,,Sometimes,,,,15,10,30,15,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Rarely,,Sometimes,,Often,,Rarely,Rarely,,Rarely,,,Often,Sometimes,,,Sometimes,Sometimes,Sometimes,,,26-50% of projects,More internal than external,Other,Weather Holidays,"Multiple sources, constantly evolving","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,46000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,SAS Base,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,10,50,0,40,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,Deep learning,R,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Researcher",University courses,20,20,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,Sometimes,,,,Most of the time,Most of the time,,Often,,,Most of the time,,Often,,Often,,,Often,,Often,,Most of the time,,,Most of the time,,Often,Most of the time,,,,,55,15,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,Often,,,,,,Often,Often,,,,,,,,,,,,,,51-75% of projects,More external than 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Most of the time,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,Very useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,Researcher,Kaggle competitions,60,0,10,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,"1,000 to 4,999 employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Rarely,1MB,Regression/Logistic Regression,"Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs",,,,,,,Most of the 
time,Often,,Sometimes,,,,Often,,Often,,,,,Often,Often,Sometimes,,,,,Sometimes,,,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,Sometimes,,,Often,,,Often,,,,,,,,Most of the time,,,,Often,,10-25% of projects,More external than internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,"Bitbucket,Git",,90000,SGD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Textbook",Very useful,Very useful,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Government,20 to 99 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Sometimes,1MB,"Bayesian Techniques,CNNs,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Segmentation,SVMs",,,Often,Sometimes,,Often,Often,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,25,0,0,25,50,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Privacy 
issues",,,,,,Sometimes,,,Often,,,,,,,,Often,,,,,,100% of projects,Entirely internal,Standalone Team,None,Segmentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Never,142000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed part-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Statistician",Kaggle competitions,60,10,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,10GB,"Decision Trees,Neural Networks","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Minitab,Python,R,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,Rarely,Sometimes,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Neural Networks,Prescriptive Modeling,Random Forests",,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,Often,Sometimes,,,,,,,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of 
projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,,USD,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Other,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,"Statistician,Other",Self-taught,70,30,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important +Male,Japan,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,,5-10 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,0,0,0,50,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",I don't know/not sure,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Female,United States,32,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Japan,26,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,15,0,20,0,5,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,India,27,"Not employed, but looking for work",,,,,,,,R,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Necessary,,Necessary,,,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,,Master's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,25,0,40,5,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,45,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,10,0,90,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning",Logistic Regression,Primary/elementary school,Insurance,"10,000 or more employees",Increased slightly,6-10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,Other,"Hadoop/Hive/Pig,SAS Base,Unix shell / awk",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,"A/B Testing,Segmentation,Other",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,70,20,0,0,10,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,Sometimes,,,Often,,,Often,Often,Often,,,Sometimes,,,,Sometimes,,Less than 10% of projects,More internal than external,Central Insights Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),"Email,Share Drive/SharePoint",,Other,Sometimes,200000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Other,"Arxiv,Blogs,Online courses",Very useful,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,University courses,10,20,70,NA,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Ensemble Methods,Gradient Boosted Machines,RNNs","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,Often,Most of the time,,,,,Often,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,Often,,,,,45,5,0,30,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,7000000,JPY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Stan,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Arxiv,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,Very useful,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Amazon Web services,R,SQL,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,SVMs",,,,Rarely,,Most of the time,Most of the time,,,Sometimes,,Often,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,92,1,5,1,1,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the 
time,,,,Often,,,,,,,,Sometimes,Sometimes,Most of the time,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,137000,AUD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other,Other",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",20,30,20,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Russia,20,"Not employed, but looking for work",,,,,,,,Python,Neural 
Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses",,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,0,30,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,Somewhat useful,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,No,Master's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Personal Projects,Trade book",,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,Very useful,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,Data Scientist,Work,35,10,40,15,0,0,Computer Vision,"Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs",,Financial,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Most of the time,100TB,"CNNs,GANs,Neural Networks","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,Cross-Validation,GANs,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,Most of the time,,,,,Often,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,5,45,25,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,Often,,Often,,,,Most of the time,Sometimes,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,IT company',Not easy to get.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,ALL,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Australia,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,SAS Base,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Python,SQL",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,,Most of the time,Often,,,,,,,,Often,,,Most of the time,,,,Often,,,,,,Often,,,,,0,20,0,60,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data",,,,Often,Often,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,Melbourne Gov Data,Find the insight,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,50000,AUD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Indonesia,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,MATLAB/Octave,"Ensemble Methods (e.g. boosting, bagging)",Java,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Researcher,Work,15,20,50,10,5,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,CRM/Marketing,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Rarely,,CNNs,"IBM SPSS Statistics,Minitab,R",,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Decision Trees",,,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,50,25,10,10,5,0,Enough to tune the parameters properly,"Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,51-75% of projects,More external than internal,Business Department,,,Key-value store (e.g. Redis/Riak),Company Developed Platform,,Git,,5000000,IDR,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,"Data Stories Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,30,15,0,50,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Decreased slightly,3-5 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,Most of the time,,,Often,,,,Often,,,,,,,,,,,,,,Sometimes,,,,66,25,4,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Often,,,Sometimes,,,Most of the time,,,,,,,,,,Often,,Often,,26-50% of projects,More external than internal,IT Department,Google analytics,Cleaning data; some attributes are not defined clearly or 
misleading,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"100,000",SGD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,South Korea,48,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Textbook",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,Other,Self-taught,70,0,20,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Mathematica,MATLAB/Octave,R,SAS Base,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,Sometimes,,,,"Cross-Validation,HMMs,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,,,,,,,Sometimes,Sometimes,,,,,,,Often,,Often,,,,,,,Most of the time,,,,10,50,20,0,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,Often,,,,,,,,,,,,,,Often,Often,,Less than 10% of projects,More internal than external,Other,thompson reuters; local financial data providers,precision; ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"200,000,000",KRW,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,60,15,8,15,2,0,Survival Analysis,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Never,1GB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation",,,,,,Often,,,,,,,,Often,,Most of the time,,,,,Most of the time,Often,,,,,Most of the time,,,,,,,20,25,25,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Hong Kong,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,TensorFlow,Rule Induction,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,YouTube Videos,Other",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,University courses,70,10,0,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",High school,Other,,,,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Rarely,10MB,Neural Networks,"Java,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,"A/B Testing,Neural Networks",Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,60,30,0,0,0,10,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Share Drive/SharePoint,Other",Database System,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,10000,USD,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Official documentation,Stack Overflow Q&A",,Very useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Engineer,University courses,20,0,20,60,0,0,"Natural Language Processing,Unsupervised Learning",Bayesian Techniques,,Technology,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing",,,Often,,,,Often,,,,,,,Sometimes,,,,Often,Most of the time,,,,,,,,,,,,,,,60,10,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to 
data,Other",Often,,,,Most of the time,,,,Often,Often,,,,,,,,,,,Sometimes,Most of the time,10-25% of projects,Approximately half internal and half external,Standalone Team,ISP datas,preprocessing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,600000,TWD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,47,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Anomaly Detection,,"Google Search,University/Non-profit research group websites","Conferences,Friends network",,,,,Somewhat useful,Very useful,,,,,,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst",Self-taught,90,10,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,Netherlands,43,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring",,Very useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,10,0,0,10,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100GB,"Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,Orange,Python,R,RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",Rarely,Most of the time,,,Rarely,,Rarely,,Often,,,,Rarely,,Most of the time,,,,Often,,Rarely,Sometimes,,Often,,,,Rarely,Rarely,,Most of the time,,Most of the time,,Rarely,,,,,,Often,,,,Often,Rarely,Sometimes,Most of the time,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,Rarely,,,Sometimes,Sometimes,Most of the time,,,,,,Rarely,Often,,,,Sometimes,Most of the time,Often,Sometimes,,Often,Sometimes,,Often,,,Most of the time,Sometimes,,,,55,20,10,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,Most of the time,Often,Often,,10-25% of projects,More internal than external,IT Department,,preparing data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,55000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Online courses,Stack Overflow Q&A",,,,,,Very useful,,,,,Very useful,,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Taiwan,40,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Textbook",,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing 
page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,20,0,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL",Often,Often,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Often,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",,Often,Often,,,,Most of the time,Most of the time,,,,,,,,Often,,Often,Most of the time,Often,,,Sometimes,Often,Sometimes,,,,Most of the time,Often,,,,10,40,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,Often,,,Most of the time,Most of the time,Sometimes,,10-25% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Rarely,800000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,43,Employed full-time,,,No,Yes,Programmer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Self-employed",Microsoft Excel Data Mining,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Stories Podcast,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Engineer,Operations Research Practitioner,Programmer",Work,30,20,50,0,0,0,Machine Translation,"Logistic Regression,Markov Logic Networks",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,55,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced 
analytics,TensorFlow,Time Series Analysis,R,GitHub,"Blogs,College/University,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,,,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer,Statistician",University courses,20,0,50,30,0,0,Time Series,"Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Relational data,Other",Rarely,1TB,"Neural Networks,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Python,R,RapidMiner (commercial version),Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,Often,,,,,,,Often,Most of the time,,,Rarely,Sometimes,,Most of the time,,,,"kNN and Other Clustering,Logistic Regression,Neural Networks,Time Series Analysis",,,,,,,,,,,,,,Rarely,,Most of the time,,,,Rarely,,,,,,,,,,Often,,,,85,10,5,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data",Most of the time,Sometimes,Most of the time,Often,,Most of the time,,,Most of the time,Rarely,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,Less than 10% of projects,More external than internal,IT Department,"IT - Infrastruktur Data, Incident, Problem Change, Events, Performance values",To get the data into Systems There are Beta of Data we schould handle. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Most of the time,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Vietnam,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,50,0,25,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business 
decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,,Rarely,Rarely,,,Sometimes,,,Most of the time,,Sometimes,Sometimes,Most of the time,,,,25,25,0,25,25,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Rarely,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,census; focus groups; surveys,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Other",Rarely,218000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,New Zealand,NA,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",15,60,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests",,Sometimes,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,,100% of projects,Entirely internal,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,DataRobot,Link Analysis,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Survival Analysis",Decision Trees - Gradient Boosted Machines,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Most of the time,100MB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,Julia,NoSQL,Python,R,TensorFlow",Often,Sometimes,,,,,,Often,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Decision Trees,Text Analytics",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,10,40,30,20,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,Often,Often,,,,Most of the time,,,,,Most of the time,,,,,,,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,Accuracy,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Sometimes,140000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,,"Blogs,Company internal community,Personal Projects,Textbook,YouTube Videos",,Very useful,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,,1 to 2 years,Researcher,Self-taught,90,10,0,0,0,0,,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,South Korea,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,10,0,10,20,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,Often,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Segmentation",Most of the time,Rarely,,,,,,Most of the time,Sometimes,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,,Often,,,,,,,,0,20,20,20,40,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to 
go in with the available data",,Often,,,Sometimes,Sometimes,,Often,,,,,,Sometimes,,,,,Often,Sometimes,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,400000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Russia,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Self-taught,50,0,50,0,0,0,Time Series,Decision Trees - Random Forests,High school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Relational data,Rarely,100MB,Decision Trees,"C/C++,KNIME (free version),Microsoft SQL Server Data Mining,Python,R",,,,Often,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Time Series Analysis",,Often,,,,,Rarely,Often,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,10,70,0,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Mercurial",Sometimes,80000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Pakistan,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Monte Carlo Methods,Matlab,GitHub,"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),,Master's degree,Yes,Professional degree,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,100,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,India,30,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Other,Time Series Analysis,R,"GitHub,Government website","College/University,Friends network,Online courses,Personal Projects,Stack Overflow 
Q&A,YouTube Videos",,,Very useful,,,Very useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,Business Analyst,University courses,10,10,40,25,5,10,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,48,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Text Mining,R,"Government website,University/Non-profit research group websites","Friends network,Newsletters,Online courses,Personal Projects,Textbook",,,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Other,30,10,0,25,10,25,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no 
bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Scientist",Self-taught,60,10,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural 
Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Often,,,,,,,,,Most of the time,,,,,,,Rarely,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Rarely,,,Most of the time,,Rarely,Often,Most of the time,Most of the time,,Most of the time,,Sometimes,Rarely,Most of the time,Most of the time,Sometimes,Rarely,,,,40,40,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,140000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,South Korea,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher",University courses,60,0,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,100MB,"CNNs,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","C/C++,Jupyter notebooks,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",,,,Often,,Most of the 
time,Often,Sometimes,Often,Sometimes,,Often,,,,Sometimes,,,,Most of the time,Often,Sometimes,Sometimes,,Often,,,,,Sometimes,,,,30,20,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Most of the time,,,,,Often,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,IT Department,web data; public domain data,data having inherent poor information; lack of data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,"60,000,000",KRW,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Germany,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Proprietary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Not Useful,,Very useful,,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",Self-taught,60,20,20,0,0,0,"Recommendation Engines,Speech Recognition,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs",,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Text Analytics",Sometimes,Sometimes,Often,,,,Most of the time,Most of the time,,,,,,,,Often,,Most of the time,,Often,,,Most of the time,,,,,,Most of the time,,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Privacy issues",Often,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,50000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Poland,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Unix shell / awk,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,Computer Vision,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data",Rarely,1GB,"CNNs,Neural Networks,RNNs","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,Often,Sometimes,,,,,Often,,,,,,,,40,40,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,LFW,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,"65,000",,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,48,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,R,Factor Analysis,,,Arxiv,Not Useful,,,,,,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,100,0,0,0,0,0,Unsupervised Learning,Logistic Regression,No education,Academic,500 to 999 employees,Decreased significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,10MB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,SAS Base",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Time Series Analysis",,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Often,,,,,,,,,Sometimes,,,,100,0,0,0,0,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Rarely,,,,,,,Less than 10% of projects,,Standalone Team,,smal,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,8000,BAM,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,SQL,Genetic & Evolutionary Algorithms,Python,Government website,"Arxiv,Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Very useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,,,Nice to have,Nice to have,Nice to have,Nice to have,,,Nice to have,,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,17,"Not employed, but looking for work",,,,,,,,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",55,40,0,0,0,5,"Computer Vision,Machine Translation,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Female,Iran,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Researcher,Other",Self-taught,40,0,0,0,0,60,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","C/C++,Java,Microsoft Excel Data Mining,R,RapidMiner (free version),SQL",,,,Often,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,Sometimes,Sometimes,,Sometimes,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,Often,,Often,Often,,Often,,,,,Often,,,,,,25,25,15,15,20,0,Enough 
to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Often,,,,,,Often,,,Often,,Most of the time,,26-50% of projects,Approximately half internal and half external,Other,"teacher, PhD student",dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Other",Most of the time,1800000,IRR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Singapore,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,Jupyter notebooks,Deep learning,C/C++/C#,University/Non-profit research group websites,"College/University,Textbook",,,Very useful,,,,,,,,,,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,0,0,30,60,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important 
+Female,Malaysia,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Enterprise Miner,Text Mining,SAS,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Germany,26,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by government,TensorFlow,Deep learning,Python,GitHub,"Conferences,Stack Overflow Q&A",,,,,Not Useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",Self-taught,60,0,40,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",,Military/Security,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, 
analyzing, and operationalizing data",Traditional Workstation,"Image data,Video data,Text data,Relational data",Most of the time,,Neural Networks,"Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (commercial version),NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,Often,,,,,,Sometimes,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,"Data Visualization,Decision Trees,Random Forests",,,,,,,Most of the time,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,75,10,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Sometimes,,,Often,Most of the time,,,Sometimes,,,,,,,,Often,,Sometimes,,Often,,51-75% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,44000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Machine Learning,"Ensemble Methods (e.g. 
boosting, bagging)",Java,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,50,10,0,40,NA,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Other (please specify; separate by semi-colon)",A master's degree,Academic,20 to 99 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Always,10MB,"Ensemble Methods,Other",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Cross-Validation,Ensemble Methods,Other",,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Rarely,,Rarely,Sometimes,,,Most of the time,Often,Often,Often,Rarely,Often,,,Rarely,Often,,Often,Rarely,Often,,Less than 10% of projects,Do not know,IT Department,,,Other,Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,1700,TND,Has 
increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,35,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,R,GitHub,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",Self-taught,60,10,20,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,KNIME (commercial version),KNIME (free version),MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,R,RapidMiner (commercial version),RapidMiner (free version),SQL",,,,,,,,,,,Sometimes,,,,,,,Often,Sometimes,,Rarely,,,Often,,,Rarely,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and 
Dimensionality Reduction,Random Forests,SVMs",,,,,,Sometimes,Most of the time,Sometimes,Often,,,Often,,,,Often,,,,Most of the time,Sometimes,,Often,,,,,Often,,,,,,10,20,10,10,20,30,"Enough to code it again from scratch, albeit it may run slowly",Inability to integrate findings into organization's decision-making process,,,,,,,,Sometimes,,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,420000000,IRR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Spain,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Programmer,Researcher",Self-taught,60,40,0,0,0,0,,,Primary/elementary school,Internet-based,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,"Amazon Web services,NoSQL,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees",Sometimes,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,40,15,10,15,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear 
direction to go in with the available data",,,,,,,,,Often,,,,,,,Often,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Git,Mercurial",Most of the time,50000,EUR,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,65,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,Google Search,"Arxiv,Kaggle,Online courses,Personal Projects",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,10,10,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,Financial,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Most of the time,1GB,Decision Trees,"Jupyter notebooks,R,Spark / MLlib,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,Not Useful,Not Useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,30,10,0,60,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Image data,Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Neural Networks","C/C++,Google Cloud Compute,Java,Python,TensorFlow",,,,Sometimes,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",Rarely,,,Sometimes,,Often,Most of the time,,,,,,Sometimes,,,Sometimes,,Sometimes,Often,Often,Most of the time,,,,,,,Sometimes,,,,,,10,30,5,15,40,0,"Enough to code it again from scratch, albeit it may 
run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,Often,,,Often,Sometimes,,,Often,,,Sometimes,Sometimes,,Most of the time,Most of the time,,,,,,,51-75% of projects,Entirely external,IT Department,Ski-learn; tensorflow; matplotlib;,Get enough data from others department.,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,,Python,University/Non-profit research group websites,"Arxiv,Blogs,Friends network,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher",Self-taught,85,0,15,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine 
learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,,Most of the time,Often,,,Often,,Sometimes,,Most of the time,,,,Sometimes,Often,,Often,,,,,Often,,Most of the time,,,,40,30,0,0,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,Often,,,,,,,,,,Sometimes,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"450,000",CNY,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,Python,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,,Very useful,"Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Reinforcement learning,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Mix of fields,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,,Most of the time,Most of the time,Often,,Most of the time,Most of the time,,,Most of the time,,,,Often,Often,,Often,,,Most of the time,,,,Often,,,,45,45,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,Less than 10% of projects,More external than internal,Other,prefer not to say,network latency,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Most of the time,1800000,ZAR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Microsoft Excel Data Mining,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality 
Reduction,Segmentation,Time Series Analysis",Sometimes,,Sometimes,Rarely,,Most of the time,Most of the time,Often,,,,Sometimes,,Often,,Often,,Sometimes,Sometimes,,Most of the time,,,,,Often,,,,Often,,,,10,15,30,20,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Most of the time,,,,,,,,Most of the time,,Often,,,Sometimes,Most of the time,,,26-50% of projects,Entirely internal,Business Department,none,Data cleaning and mining,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Germany,30,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,3-5 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,Master's degree,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,70,0,0,30,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Germany,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Not Useful,Somewhat useful,,,Not Useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,50,0,15,35,0,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",CRM/Marketing,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Java,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Text Analytics",Sometimes,,Sometimes,Often,,Most of the time,Often,,Sometimes,,Often,,,,Often,Sometimes,,,Most of the time,Most of the time,,,Often,Sometimes,Sometimes,,,,Most of the time,,,,,28,20,50,1,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,,,,Most of the time,Most of the time,,,Often,,Often,Sometimes,Sometimes,,,Most of the time,,Most of the time,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Rarely,60000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed part-time,,,Yes,,Researcher,Poorly,Employed by college or university,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,5,0,40,5,0,Outlier detection (e.g. Fraud detection),,A master's degree,Financial,20 to 99 employees,Stayed the same,6-10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Never,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Often,,Often,,Often,Sometimes,,Often,,,,Sometimes,Often,,,,,,20,65,0,5,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,,,Most of the time,Often,,,,,Most of the time,,,Often,Most of the time,,Less than 10% of projects,More external than internal,Other,"kdd99, nsl-kdd, icsx ids 2012, berka",,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,shared server,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Random Forests",,,,,,,,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert 
input,Organization is small and cannot afford a data science team",Often,,,,,,Rarely,Sometimes,,,Often,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,Business Department,Online product interests,Using the unstructured data correctly in the analysis step.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Subversion,Sometimes,2600000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Computer Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,10GB,"Ensemble Methods,Neural Networks","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic 
Regression,Neural Networks,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,,,Often,,,,Often,,,Sometimes,,,,,,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues",Sometimes,Often,,,Most of the time,,,Most of the time,,,,,,Often,Often,,Most of the time,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,106000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Stan,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,15,15,0,35,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Financial,,,,,Not very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,Sometimes,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Most of the time,,,Often,Often,Often,,Most of the time,,,,,,Most of the time,,,,,40,20,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful 
datasets from external sources,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Often,,Sometimes,,,Often,Sometimes,,,,,Often,,,,Sometimes,,,,26-50% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,40000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,6 to 10 years,"Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Never,,CNNs,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Simulation,SVMs",,,,Sometimes,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,20,20,0,30,30,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,76-99% of projects,Entirely internal,Standalone 
Team,imagenet,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Researcher",University courses,30,0,40,30,0,0,"Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,Telecommunications,20 to 99 employees,Increased slightly,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"Bayesian Techniques,Regression/Logistic Regression,RNNs,SVMs,Other","Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Often,,,,,Rarely,,Most of the time,Most of the time,,,"kNN and Other Clustering,Logistic Regression,Naive Bayes,Prescriptive Modeling,Recommender Systems,RNNs,Simulation,SVMs",,,,,,,,,,,,,,Often,,Sometimes,,Often,,,,Often,,Sometimes,Sometimes,,Often,Rarely,,,,,,20,20,20,0,10,30,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate 
with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,Sometimes,,,,,,,,,Sometimes,Sometimes,,Most of the time,,Often,,,,10-25% of projects,Entirely internal,Standalone Team,"open street maps, AppTweak",Parallelization,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Git,Rarely,1500000,TWD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Mexico,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,Other,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,RNNs","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution 
Analytics),NoSQL,Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,Most of the time,,,,Rarely,,Often,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Simulation,Time Series Analysis",,,,,,,Often,Sometimes,,,,Often,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,40,10,40,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,,,,,,,,,,,Often,,,,76-99% of projects,More internal than external,Standalone Team,Weather Underground,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,640000,DKK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,70,25,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Don't know,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SAS Enterprise Miner,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,Rarely,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,Sometimes,,,,Often,,,,Often,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,5,60,20,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,10000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Republic of China,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"Data Machina Newsletter,FlowingData Blog,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs",,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,Spain,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,Google Search,"Blogs,Personal Projects,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Other",University courses,15,10,15,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Less than one year,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,10GB,"Neural Networks,Random 
Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R",,Often,,,,,,,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,Recommender Systems,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,Sometimes,,,,Often,,Sometimes,,,,Sometimes,,,,,,Often,,,,15,15,10,15,15,30,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,Sometimes,,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,Census or official demographics; Geographic areas; ,To combine it appropriately when data is pulled from different data sources and having different granularities.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Statistician",University courses,10,0,0,90,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Sometimes,,,Sometimes,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Segmentation,Simulation,SVMs,Time Series 
Analysis",Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Rarely,Sometimes,,Sometimes,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,Often,,,,,,,Often,,,Often,Often,,Often,Often,,Often,,,,,26-50% of projects,More internal than external,IT Department,Not Applicable. I am pursuing a Data Science course from University.,"The data collected is not in the desired format. Sometimes, the number of parameters is too large which are not necessary.",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,-,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Ukraine,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,31,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Very useful,,,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Miner,Other","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Other",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Often,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Sometimes,Most of the time,,,,,Most of the time,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,Most of the time,Most of the time,Often,,,,,,Often,,Often,,,,,Sometimes,,,,,Sometimes,,,,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,Often,,,Most of the time,,Often,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone 
Team,,Legislative,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,67500,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",High school,Manufacturing,20 to 99 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Image data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision 
Trees,kNN and Other Clustering,Naive Bayes,Segmentation",,,,Sometimes,,Most of the time,Most of the time,Sometimes,,,,,,Often,,,,Sometimes,,,,,,,,Often,,,,,,,,40,25,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,Often,,76-99% of projects,Entirely internal,Other,None,Ground truthing data is challenging,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Never,780723,ZAR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Egypt,21,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Proprietary Algorithms,Python,Google Search,"Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,Very useful,,,Very useful,Very useful,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,I prefer not to answer,,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +Female,Sweden,30,Employed 
full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses",,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,40,0,0,0,Natural Language Processing,Logistic Regression,A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,,,"Google Cloud Compute,Jupyter notebooks,Python",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,75,0,0,10,15,0,Enough to tune the parameters properly,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,Often,,,,Often,,100% of projects,Entirely internal,Central Insights Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Rarely,,,Other,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Google Search,Government website","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician,Other",University courses,70,10,10,5,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","IBM Cognos,IBM SPSS Statistics,IBM Watson / Waton Analytics,R,SQL",,,,,,,,,,Sometimes,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",,,,,Sometimes,Sometimes,Most of the time,Often,,,,,,Rarely,,Often,,,,,Often,,Sometimes,Sometimes,,Most of the time,,,,Sometimes,,,,65,5,0,15,15,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Often,Most of the time,Often,,Often,Often,Sometimes,Often,,,Often,Often,,,,,Often,Sometimes,,76-99% of projects,More internal than external,Business Department,"mint, dnb, forrester, forbes, statistical bureau","merge, clean","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,75000,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Australia,64,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",15+ years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher",University courses,40,10,0,0,50,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +A different identity,Other,30,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Java,Other,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Mathematics or statistics,,"Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Female,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,Other,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,University courses,10,0,0,70,20,0,Supervised Machine Learning (Tabular Data),Hidden Markov Models HMMs,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,India,23,"Not employed, but looking for 
work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Data Elixir Newsletter,< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,50,20,0,5,0,Survival Analysis,Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ireland,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Official documentation,Online courses,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Other,Self-taught,70,30,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,Often,,,,Often,,,Often,,,,,,,Often,,,,40,30,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Often,,,,Often,,,,Most of the time,,Most of the time,,,,,,Most of the time,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,83000,EUR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Other,Self-taught,20,0,80,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Other,10 to 19 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Julia,R,SQL,Stan,Tableau",,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,Sometimes,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,Often,,,Often,Often,Sometimes,,,,Sometimes,,,,Often,,Sometimes,,,Rarely,,,,,,Often,,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,,,,,,,,,Sometimes,,26-50% of 
projects,Entirely internal,Standalone Team,,processing speed,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,60000,GBP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Hungary,25,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,Google Search,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,0,0,0,80,20,0,,,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,Turkey,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Julia,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",35,50,5,10,0,0,"Survival Analysis,Time Series",,"Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,,"Cloudera,Hadoop/Hive/Pig,Impala,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Often,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Rarely,Most of the time,,,Most of the time,,,Often,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,10,10,50,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data,Other",Most of the time,Most of the time,,Most of the time,Most of the time,Often,,Most of the time,Sometimes,,,,Most of the time,Often,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,100% of projects,Entirely internal,Other,,understanding what the data represents since it's coming from various production sites and different names are used to represent the same variable for example,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,65000,RON,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,,"Blogs,College/University,Online courses",,Very useful,Very useful,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer",Self-taught,60,0,20,20,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very 
Important +Male,Australia,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,DataTau News Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,,Computer Science,1 to 2 years,Business Analyst,University courses,30,10,0,60,0,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Singapore,40,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,R,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Online courses,Textbook",,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,0,60,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Rarely,1GB,"Bayesian Techniques,Neural Networks","IBM SPSS Modeler,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,Sometimes,Sometimes,,,Often,,,,,,,,Often,,,,,,,,Rarely,,Sometimes,,Rarely,,,,,,,,,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Text Analytics,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Often,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,150000,SGD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Australia,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,Very useful,Very useful,"DataTau News Aggregator,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,0,20,0,50,20,10,"Natural Language Processing,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Financial,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Excel Data Mining,QlikView,R,SAP BusinessObjects Predictive Analytics,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,Often,,,,Often,,,,,,,,Often,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,70000,AUD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Factor Analysis,C/C++/C#,I collect my own data (e.g. web-scraping),"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher,Statistician",Self-taught,70,29,1,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Military/Security,I prefer not to answer,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,100MB,"Decision Trees,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Random Forests,Segmentation",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,Rarely,,,Often,,,,,,,,10,30,0,60,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Do not know,Central Insights Team,no,"creating the data by my model, that shown an adequate to real data result.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Rarely,1100000,RUB,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Traditional Workstation",Relational data,Sometimes,<1MB,Gradient Boosted Machines,"Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Often,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality 
Reduction,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,,,,Often,,,,,,Often,,,,Often,,,Often,Often,Often,,,,,Often,,Often,Often,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,,Often,,,,Often,Often,Often,Often,Often,Often,,Often,,Often,,Often,,,Less than 10% of projects,Entirely internal,IT Department,,data cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,975000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Online courses,Textbook,YouTube Videos",,,,,Very useful,,,,,,Very useful,,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,20,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Other,Most of the time,1GB,"Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Most of the time,,Often,,,Often,Often,,,,,,,,Often,,Most of the time,,,,40,20,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data",,Often,Often,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,spark,remove data noise,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,130000,CNY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Friends network,Kaggle,Personal Projects",,,Somewhat useful,,,Very useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,Other,University courses,40,5,40,15,0,0,,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,100MB,"Decision Trees,Regression/Logistic Regression","IBM Watson / Waton Analytics,R,SAS JMP,Tableau",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression",,Often,,,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,20,0,35,25,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,51-75% of projects,Do not know,,,,,Email,,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,I collect my 
own data (e.g. web-scraping),"Blogs,Official documentation,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,Udacity,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Fine arts or performing arts,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,Other,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,No Free Hunch Blog,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,10,0,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Canada,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,1 to 2 years,,University courses,25,0,0,75,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Internet-based,10 to 19 employees,Increased significantly,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1MB,"Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Natural Language Processing,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,50,13,25,2,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,Normalizing,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Rarely,94000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,,Master's degree,Other,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,India,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,1-2 years,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,0,40,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Sweden,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",NoSQL,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,"Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",Text data,Sometimes,,,"C/C++,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,10,10,10,70,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,Most of the time,Most of the time,,,Sometimes,Sometimes,,,,,,,,,Most of the time,Often,,,,76-99% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,50,15,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",High school,Government,,,,,Important,,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,Sometimes,Most of the time,Most of the 
time,Often,Often,,,Sometimes,,Sometimes,,Often,,Often,Often,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,Often,Sometimes,,,,35,25,20,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Privacy issues",,,,,Often,Sometimes,,,,,,,,,,,Most of the time,,,,,,51-75% of projects,More external than internal,IT Department,Public datasets because of larger volume,quality,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Colombia,25,Employed part-time,,,Yes,,Data Analyst,,Employed by college or university,Microsoft Excel Data Mining,Factor Analysis,R,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Operations Research Practitioner,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",0,0,100,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Rarely,10MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,0,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,,,,Sometimes,,Sometimes,,Often,,,,,,,,Most of the time,,100% of projects,Do not know,Business Department,goverment,prepare data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,24000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,25,10,50,10,5,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Military/Security,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,R,RapidMiner (commercial version),SQL,Tableau",,Sometimes,,,Sometimes,,,,Rarely,,,,Rarely,,,,Sometimes,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,Most of the time,Sometimes,,,,,,,,Often,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Often,Most of the time,Sometimes,,,,,,,Most of the time,,,,,,,Often,,,,Most of the time,,,Most of the time,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in 
deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Sometimes,Often,Often,,,Sometimes,,,Sometimes,,,Often,,Most of the time,Most of the time,Often,Often,,,76-99% of projects,More internal than external,Standalone Team,Bloomberg,Bad formats,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"77,500",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Online courses,Podcasts",,,,Somewhat useful,Somewhat useful,,,,,,Very useful,,Not Useful,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,25,75,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Impala,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Data Visualization,Random Forests,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,,,,,,,,,,,,,Often,Most of the time,Often,Often,Most of the time,,51-75% of projects,More internal than external,Business Department,Census,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"153,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Italy,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",1,99,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Retail,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,Most of the time,,,,,Often,,,,,,40,28,2,10,20,0,Enough to explain the algorithm to someone non-technical,Inability to integrate findings into organization's decision-making process,,,,,,,,Sometimes,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,Machine learning database ,Prepari and Cleaning data ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,30000,,Other,9,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Predictive Modeler,Work,95,5,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Insurance,500 to 999 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Often,,Sometimes,,Sometimes,,,,,,Sometimes,Often,,,,,,,,,,,25,25,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others",Often,,,,Most of the 
time,Often,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Other,census bureau; bureau of labor statistics,not in enough detail,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Always,175000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,GitHub,"Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Military/Security,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,Rarely,Often,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,Sometimes,Most of the time,,,Often,Rarely,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Prescriptive Modeling,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,,,,,Often,Often,,,,Often,,Most of the time,,,,,,,,Sometimes,,Often,,,Rarely,,Sometimes,Most of the time,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Sometimes,Often,Most of the time,Most of the time,,,Most of the time,,,,,,Most of the time,Often,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,110000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed part-time,,,Yes,,Programmer,Fine,Employed by college or university,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Very useful,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,University courses,25,10,5,60,0,0,,,A doctoral degree,Academic,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Sometimes,<1MB,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,50,0,40,5,5,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,Sometimes,Most of the time,,,,,,,,,,,Most of the time,Often,,,Sometimes,,,Less than 10% of projects,Approximately half internal and half external,IT 
Department,,sanitization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Brazil,45,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,30,30,30,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Always,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Flume,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Often,,Most of the time,Sometimes,Most of the time,,,,Sometimes,Most of the time,Most of the time,Sometimes,,,,30,30,10,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Sometimes,,Often,,Sometimes,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,,180000,BRL,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,SQL,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Trade book",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,,Self-taught,30,20,20,30,0,0,,Logistic Regression,A master's degree,Academic,I don't know,Increased slightly,,Some other way,Somewhat important,Other,Laptop or Workstation and local IT supported servers,"Image data,Other",,10MB,Regression/Logistic Regression,"MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,Other",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,50000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,South Africa,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,15,25,55,5,0,0,"Time Series,Unsupervised Learning",Logistic Regression,"Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Stayed the same,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,R,Spark / MLlib,SQL",,Sometimes,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation",,,,,,,Often,,,,,,,,,Sometimes,,,,,,Often,,,,,Often,,,,,,,65,10,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,Rarely,,76-99% of projects,More external than internal,Standalone Team,Various clients' data,Communication between myself and the client with regard to understanding what the data actually represents.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,228000,ZAR,Other,8,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed part-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,R,Support Vector Machines (SVM),Python,,"Blogs,College/University,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,"FastML Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,10MB,,"Amazon Web services,Python",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70,5,5,20,0,0,,Lack of significant domain expert input,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,Less than a year,"Data Analyst,Other",University courses,10,50,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,30,0,30,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,Often,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,360000,CZK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,33,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Engineer,Other",Self-taught,85,10,5,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Java,Mathematica,Microsoft Excel Data Mining,Minitab,Python,QlikView,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,Most of the time,,,Rarely,,,,,Sometimes,Rarely,Most of the time,,Often,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Most of the 
time,Often,Often,Sometimes,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,Often,Sometimes,Sometimes,Often,Sometimes,,Sometimes,Often,Often,,Sometimes,,,,50,15,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Sometimes,,,,Sometimes,,Most of the time,Most of the time,,,,,,Most of the time,,Often,Sometimes,,,Sometimes,,76-99% of projects,Entirely internal,Other,None,Measurement noise - measured directly from instruments in petrochemical environment,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,850000,ZAR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Monte Carlo Methods,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Not Useful,Not Useful,Very useful,,Somewhat useful,,,Not Useful,,"Data Machina Newsletter,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",65,10,10,15,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Always,1PB,"Decision Trees,HMMs,Markov Logic Networks,RNNs","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,Often,,Most of the time,,,,,Often,Sometimes,Often,,,,,,,,,,,,"Data Visualization,HMMs,Lift Analysis,Logistic Regression,Simulation,SVMs,Time Series Analysis",,,,,,,Sometimes,,,,,,Most of the time,,Rarely,Often,,,,,,,,,,,Rarely,Often,,Sometimes,,,,25,40,25,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,Rarely,,,,,,Most of the time,Rarely,,,,76-99% of projects,More internal than external,Standalone Team,"NHTSA databases (NASS CDS, GES, FARS, NMVCSS)",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Germany,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,"DataTau News Aggregator,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,12.5,0,2.5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Often,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,Often,,,,,,Often,Often,,,Often,,,Often,Often,,,,60,20,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,I prefer not to say,Lack of significant domain expert input,Limitations of 
tools",,,,,Often,,Often,,,,Sometimes,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Never,50000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,"Data Machina Newsletter,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,20,50,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural 
Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",Often,,,,,Often,Most of the time,,,,,,,Sometimes,,Most of the time,,,Often,,Sometimes,,Sometimes,,,,Rarely,,Often,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,Sometimes,,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,None,Our data comes from our users (real people) and thus can be hard to interpret. Real life messy data!,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,135000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"FlowingData Blog,The Analytics Dispatch Newsletter",1-2 years,,,Nice to have,,Necessary,,,Nice to have,,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,,,Work,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,No,Yes,Data Analyst,,Employed by government,IBM SPSS Statistics,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,Not Useful,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",3-5 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,A humanities discipline,I don't write code to analyze data,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,20,20,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very 
Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Canada,39,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,Other,Other,Python,University/Non-profit research group websites,"Newsletters,Personal Projects,Podcasts,YouTube Videos,Other",,,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Researcher",University courses,0,0,0,100,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,20 to 99 employees,Decreased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Other,Most of the time,100GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,C/C++,Python,SQL",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Prescriptive Modeling,Segmentation,Other",,,,,,Most of the time,Sometimes,Most of the time,Most of the time,,,,,,Sometimes,,,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,0,0,0,0,10,90,Enough to refine and innovate on the 
algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Other",Often,,,,,,,,,Sometimes,,Often,Often,,,,,,,,,Most of the time,Less than 10% of projects,Entirely external,Standalone Team,DMP (data management platform),Managing huge quantities of data; reaching the third party stakeholder to fix bugs and improve the overall user experience.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Other,Amazon Web Services,Git,Rarely,"100,000",CAD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Switzerland,51,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,"Blogs,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler",Work,30,10,60,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the 
time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Rarely,,,Sometimes,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,Often,Often,,,,Sometimes,,,Most of the time,Sometimes,,Sometimes,,Sometimes,Sometimes,,,,,35,15,10,15,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Rarely,Sometimes,,,,,,,Sometimes,,Sometimes,,Often,,,,Sometimes,,100% of projects,Do not know,Business Department,",",",","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,170000,CHF,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Russia,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,40,10,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Most of the time,,Sometimes,,,Often,Most of the time,Often,,,,,,Sometimes,,,,Rarely,,,,,Most of the time,Sometimes,,,,,Most of the time,Sometimes,,,,30,5,30,20,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,Most of the time,,26-50% of projects,Entirely internal,IT Department,ruscorpora,dirty datasets,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,messenger,Bitbucket,,1300000,RUB,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belgium,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,7,10,0,80,3,0,Supervised Machine Learning (Tabular Data),,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Russia,33,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,Very useful,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,15,10,25,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Internet-based,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,,"Regression/Logistic Regression,RNNs","Google Cloud Compute,Python,SQL",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,RNNs,Text Analytics",,,,,,Often,,Often,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,,5,5,5,0,85,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Sometimes,"2,000,000",RUB,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,,,,,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,20,10,20,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,10 to 19 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Text Analytics",Sometimes,,Often,,,Often,Most of the time,,Sometimes,Sometimes,,,,Sometimes,Sometimes,Most of the time,,,Most of the time,,,,,,,,,,Most 
of the time,,,,,10,40,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Bitbucket,Most of the time,"45,000",BRL,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Belarus,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,Neural Nets,Python,Google Search,"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Adversarial Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Python,Support Vector Machines (SVM),R,Google Search,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",10,10,70,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's 
degree,Technology,20 to 99 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Angoss,MATLAB/Octave,Python,R,SQL,Tableau,Unix shell / awk",,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,Sometimes,Most of the time,,Sometimes,,,,,,Most of the time,Most of the time,,,,Sometimes,,Often,Sometimes,,,Most of the time,,,,Rarely,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,Sometimes,Often,,,Rarely,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,80000,CAD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Data Analyst,Work,30,30,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,GANs,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),SQL,Tableau",Sometimes,Often,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,Rarely,Most of the time,Sometimes,Sometimes,,,,,,Often,Often,,,Most of the time,,,Rarely,Sometimes,,,,,,Most of the time,,,,,60,20,5,10,5,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Often,,,,,,,,,,,Often,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,85000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,No,I prefer not to answer,"Information technology, networking, or system administration",,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Data Analyst,University courses,0,30,20,50,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A professional degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,10GB,Regression/Logistic Regression,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Lift Analysis,Logistic Regression,Prescriptive Modeling",Sometimes,,,,,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,,,,,,,,,30,30,10,15,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Often,Most of the time,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,None,reliability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,"90,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,27,Employed part-time,,,Yes,,Other,Poorly,Self-employed,Other,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,Not Useful,Very useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,3 to 5 years,"Data Miner,Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",90,5,0,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",I prefer not to answer,Pharmaceutical,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Java,Microsoft Excel Data Mining,Python,R",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Natural Language Processing,Random Forests",,,Sometimes,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,30,0,5,5,30,30,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Often,Most of the time,,,,,,,,Rarely,,,Sometimes,Sometimes,,,Often,,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,,LKR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,"Researcher,I haven't started working yet",University courses,30,10,10,40,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Google Search,"College/University,Online courses,Other,Other",,,Not Useful,,,,,,,,Very useful,,,,,,,,"FastML Blog,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Rarely,,,,,,,,,,,Sometimes,Sometimes,,,,,Often,,,10-25% of projects,Approximately half internal and half external,IT Department,weather,data dropouts and accuracy,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,"138,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Other,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Monte Carlo Methods,R,Google Search,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Other",Other,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,10MB,Regression/Logistic Regression,"IBM Cognos,R,TIBCO Spotfire",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,50,10,10,25,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Often,,,,,,,,,,76-99% of projects,More internal than external,Business Department,Banks,"Knowledge, computers","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Sometimes,"32,000.00",,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,Canada,33,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Male,United States,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Python,Text Mining,R,Government website,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,Very useful,Very 
useful,,Very useful,Very useful,,,,"FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Predictive Modeler",University courses,35,40,0,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"10,000 or more employees",,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes",,,Sometimes,,,Often,Most of the time,Often,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,55,17,0,23,5,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Most of the time,35000000,COP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst",University courses,10,0,20,50,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks",,,,Often,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,0,20,0,80,0,0,"Outlier 
detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important +Female,Other,42,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Somewhat useful,,,,KDnuggets Blog,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Computer Science,1 to 2 years,Computer Scientist,University courses,20,10,20,50,0,0,Natural Language Processing,Bayesian Techniques,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important +Female,United States,53,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a 
company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Trade book",,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician,Other",University courses,25,0,25,50,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Decision Trees,HMMs,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,Rarely,Rarely,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis,Other",Sometimes,,Sometimes,,,Most of the time,Most of the time,Often,Sometimes,,,,Sometimes,Sometimes,Often,Often,,,Sometimes,,Sometimes,,Often,,,Often,,Rarely,Often,Sometimes,Most 
of the time,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,Sometimes,Often,,,,,,,,,Often,,,Often,,,Often,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Italy,50,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Textbook",,Somewhat useful,,,Somewhat useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Operations Research Practitioner",Work,50,20,30,0,0,0,,,High school,Financial,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,Decision Trees,"IBM Cognos,MATLAB/Octave,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,SAS Base,SQL",,,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,"Decision Trees,Simulation",,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,30,30,15,10,15,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of 
management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,47,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,Data Analyst,Work,10,0,90,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,Neural Networks,"Oracle Data Mining/ Oracle R Enterprise,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,Association Rules,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,10,0,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Often,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Time Series Analysis,R,"University/Non-profit research group websites,Other","College/University,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Other",University courses,20,0,0,80,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Neural Networks - RNNs",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,,Python,I collect my own data (e.g. 
web-scraping),"Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Psychology,1 to 2 years,"Researcher,Other,I haven't started working yet",Self-taught,40,30,15,15,0,0,,,A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Somewhat important,Other,Basic laptop (Macbook),Other,Never,<1MB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression",Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,70,25,0,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,10-25% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by college or university,Employed by non-profit or NGO",Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Researcher,Software Developer/Software Engineer",University courses,40,20,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Hidden Markov Models HMMs,A bachelor's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,"HMMs,Regression/Logistic Regression","Jupyter notebooks,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,Often,,,,"Association Rules,Cross-Validation,Data Visualization,HMMs,Neural Networks,PCA and Dimensionality Reduction",,Often,,,,Often,Most of the time,,,,,,Often,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,40,10,4,18,28,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,Often,,,,Sometimes,,,,Often,,Often,,Sometimes,,,,,,76-99% of projects,More external than internal,Standalone Team,Gene Expression Omnibus,Lack of well-defined patterns,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,94000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Data Scientist,Researcher,Statistician",Work,20,0,80,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,Most of the time,,,,,,Often,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series 
Analysis",Often,,,,Sometimes,Most of the time,Most of the time,,Sometimes,,,,,Most of the time,,Most of the time,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,,Most of the time,Sometimes,,,,20,50,15,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,Most of the time,,,,Often,,,,Often,Sometimes,,,,Often,Often,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,201000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Programmer,Poorly,Employed by government,KNIME (free version),Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),Other","Blogs,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Work,60,10,20,10,0,0,Time Series,Logistic Regression,A master's degree,Government,100 to 499 employees,Stayed the same,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,Evolutionary Approaches,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,10,5,30,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,Often,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Often,,Most of the time,,,,51-75% of projects,More external than internal,IT Department,recorder of Deeds,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,96,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Canada,30,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by non-profit or NGO,Python,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,"Arxiv,College/University,Stack Overflow Q&A",Very useful,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher",University courses,0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Text data,Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Java,NoSQL,Python,R,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Rarely,,Often,,,,,,,,,,,,,,,,Rarely,,,"A/B Testing,Association Rules,Bayesian 
Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,55,15,10,10,10,0,Enough to run the code / standard library,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,100% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data,Other",internal cloud,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,60000,CAD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",4,95,0,0,1,0,"Supervised Machine Learning (Tabular Data),Time Series",,A professional degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,,"Jupyter notebooks,Microsoft Azure Machine Learning,R,Spark / MLlib",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,30,0,25,5,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,100% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,135000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Amazon Machine Learning,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Other,University courses,0,100,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Internet-based,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,"Amazon Web services,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Data Visualization",Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,,,,Bitbucket,Always,180000,USD,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,50,0,20,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important +Male,United States,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,75,5,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,27,Employed full-time,,,No,Yes,Other,Fine,Employed by non-profit or NGO,R,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's 
degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Friends network,Online courses,Personal Projects",,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,10,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,CRM/Marketing,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Cloudera,NoSQL,Python,QlikView,R,SAS Base,SAS Enterprise Miner",,Sometimes,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,Sometimes,Most of the time,,,,,Often,Often,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",Often,Sometimes,,,,Often,Most of the time,Rarely,,,,,,,Sometimes,Often,,,,,Sometimes,Sometimes,,,,Sometimes,Rarely,,,Sometimes,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,Most of the time,Sometimes,,,,,Sometimes,Often,,,Sometimes,Sometimes,Most of the time,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"73,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,Python,Cluster Analysis,R,"Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Not Useful,Somewhat useful,Somewhat useful,Very useful,Not Useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A humanities discipline,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Other",Self-taught,45,5,25,25,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Government,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Other",Relational data,,1MB,,"Python,R,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Often,,,,Often,,,,,,Often,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,10,0,10,5,10,65,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction 
to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Less than 10% of projects,Entirely internal,Standalone Team,All open government data,"Cleaning it, line by line","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,48000,GBP,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Online courses,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,University courses,30,30,30,10,0,0,Time Series,Other (please specify; separate by semi-colon),High school,Mix of fields,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,,"Minitab,Python",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Time Series Analysis",Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,10,10,30,10,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Never,"120,000",,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Belarus,20,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,Self-taught,65,20,5,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,Often,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,Sometimes,,Often,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Often,Often,Often,,Sometimes,,Rarely,,Sometimes,,,Most of the time,,Sometimes,,Often,,,Often,,Sometimes,Most of the time,Sometimes,,,,10,35,15,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Sometimes,,Sometimes,Most of the time,Sometimes,,,Most of the time,,Often,,,,,Sometimes,Most of the time,,100% of projects,Entirely internal,Other,,Poor data infrastructure ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Other,Rarely,,,,4,,,,,,,,,,,,,,,,,, +Male,Brazil,40,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,Somewhat useful,Very useful,Data Stories Podcast,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +Male,Russia,34,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Other",Kaggle competitions,20,30,10,0,30,10,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Financial,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",,1GB,,"Oracle Data Mining/ Oracle R Enterprise,QlikView,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,Often,Sometimes,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,,10,0,0,20,70,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,Sometimes,,Sometimes,Often,,Often,,,,,,,,,,,Often,,,,100% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,480000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Argentina,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,15,40,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",High school,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Always,100GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Often,,,Rarely,Most of the time,,Most of the time,,,,"A/B Testing,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,,,Often,Often,,,,,,,Often,,,,Most of the time,Often,,,,,Sometimes,Often,,,Often,,,,20,30,40,0,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,Often,,Sometimes,,,Sometimes,,,,,Most of the time,,,,,,,,,None,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,504000,ARS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,44,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,IBM SPSS Modeler,Regression,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A humanities discipline,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Mexico,48,Employed full-time,,,No,Yes,Other,Fine,Employed by company that makes advanced analytic software,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst",University courses,20,0,20,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Stayed the same,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100TB,"Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Gradient Boosted Machines,Logistic Regression,SVMs",Often,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Mexico,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Retail,Fewer than 10 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Segmentation,SVMs,Time Series Analysis",Often,Often,,,,Often,,Often,Sometimes,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,Often,,,,60,20,0,0,20,0,Enough to refine and innovate on the algorithm,Difficulties in deployment/scoring,,,,Most of the time,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,risk management,innovation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,35000,MXN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,IBM SPSS Modeler,Text Mining,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Company internal community,Kaggle,Online courses,YouTube Videos",,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Researcher",Work,10,20,60,10,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,,1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,R,Tableau",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,Rarely,,,,,Sometimes,,Often,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Text Analytics",Often,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,20,5,25,25,25,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",,,,,,,,Often,,,,,,,Most of the time,,,,,,,,10-25% of projects,More internal than external,IT Department,,Access and permission from stakeholders ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,73500,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Textbook",Very useful,Very useful,,,Very useful,,Very useful,,,,,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Programmer,Researcher,Statistician",University courses,30,40,0,30,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",Sometimes,,Often,,,Most of the time,Most of the time,,,,,,,Sometimes,,Most of the 
time,,Often,,,Sometimes,,Sometimes,Often,,Often,Often,,,Sometimes,,,,50,30,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of significant domain expert input",,,,,Often,Sometimes,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,80000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,Udacity,Traditional Workstation,11 - 39 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,47,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Doctoral degree,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,R,Genetic & Evolutionary Algorithms,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Government,500 to 999 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,100MB,Regression/Logistic Regression,"IBM SPSS Statistics,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,Often,,,Most of the time,Most of the time,Sometimes,Most of the time,Most of the time,,Most of the time,Most of the time,Often,Most of the time,,Most of the time,Most of the time,,51-75% of projects,More internal than 
external,Business Department,,"The data is not collected properly or reported well. Many variables are missing most of the time. Consequently every study is fraught with lots of limitations. The bureaucrats and policymakers do not care less about any data driven decision making. There is no incentive to do data science. I am stuck in my present predicament since during my time of looking for employment, this employer was the first to offer me work and also filed for my work visa. ",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,57000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,54,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,,15+ years,,,,,,,,,,,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Professional degree,,,"Data Scientist,Engineer,Predictive Modeler,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,15,0,60,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Female,France,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Mathematica,Bayesian Methods,Python,"Government website,Other","Tutoring/mentoring,Other",,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,"Other,I haven't started working yet",Work,40,0,45,15,0,0,,"Bayesian Techniques,Logistic Regression",A doctoral degree,Academic,100 to 499 employees,Stayed the 
same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,,,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,PCA and Dimensionality Reduction",,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,5,10,50,25,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Often,,,,,,,,,,Most of the time,,,,,,,,,,,,100% of projects,Entirely internal,Other,Public astronomical database,,,Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,16000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,SQL,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,0,50,0,0,50,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Sometimes,,"Bayesian Techniques,HMMs","C/C++,MATLAB/Octave,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"HMMs,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,Sometimes,,,,,Often,,Often,Sometimes,,,,,,,,,Often,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Need to coordinate with IT",,,,,,,,,,,Sometimes,,,,Often,,,,,,,,26-50% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,20,0,70,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,SAS Base,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Segmentation",,,,,,,Often,,,,,,,,Rarely,Often,,,,,,,,,,Sometimes,,,,,,,,0,40,0,20,40,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,70000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,Not Useful,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,10,30,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation","Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,SVMs","C/C++,Python,Spark / MLlib",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation,Simulation,SVMs",Most of the time,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,,,,,Rarely,Sometimes,,,,,,30,20,40,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,45000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,5,20,50,5,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Unix shell / awk",,Most of the time,,,,,Sometimes,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,,,Most of the time,Often,Often,,,,Often,Sometimes,Often,,Often,,Often,Most of the time,,Most of the time,,Often,Often,,,,Most of the 
time,,,,,,30,30,20,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,Sometimes,Most of the time,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,120000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Julia,Deep learning,R,,"Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,80,10,0,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic 
Regression,PCA and Dimensionality Reduction,Other",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,Most of the time,90,0,5,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,,,over dispersion; under dispersion,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +A different identity,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Link Analysis,R,"Google Search,University/Non-profit research group websites","Blogs,Company internal community,Kaggle,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests","C/C++,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Evolutionary Approaches,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems",,Sometimes,,,,,Most of the time,Often,,Sometimes,,,,,,,,,Most of the time,Often,,Sometimes,Most of the time,Often,,,,,,,,,,75,10,10,3,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Most of the time,,,,,Rarely,Most of the time,,,,,Sometimes,Sometimes,,26-50% of projects,More internal than external,Central Insights Team,,Getting access in bulk,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,80000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,Google Search,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Programmer,Statistician",University courses,15,0,60,25,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Flume,Hadoop/Hive/Pig,NoSQL,Python,QlikView,R,Spark / MLlib,SQL,Tableau",,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,Often,Often,Sometimes,,,Often,Most of the time,,,,Sometimes,Most of the time,Most of the time,,,,10,20,40,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,Often,Most of the time,,,,,,Sometimes,,,,,Sometimes,,,Less than 10% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,180000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Poland,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Self-employed",Julia,Deep learning,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Other",University courses,10,30,50,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Other",Most of the time,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Other,Other",,Most of the time,,,,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,Most of the time,Most of the time,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests",Sometimes,,,,,Often,Often,Often,Often,Sometimes,,Most of the time,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,,,,,20,20,10,10,20,20,"Enough to code it 
again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Most of the time,,,,,,,,,Often,,Sometimes,,,,Often,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)",I don't typically share data,,"Bitbucket,Git",Rarely,25000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Biology,I don't write code to analyze data,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Text Mining,R,I collect my own data (e.g. web-scraping),"College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,1 to 2 years,I haven't started working yet,University courses,10,0,80,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,"10,000 or more employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10MB,Regression/Logistic Regression,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"kNN and Other Clustering,Natural Language Processing,Text Analytics",,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,50,15,0,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Often,,,,,,,,,,,,,Most of the time,,,,,100% of projects,Entirely internal,Standalone Team,,Aggregating all the different data sets together to get what the decision makers want. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,65000,,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,Researcher",Self-taught,40,10,0,20,30,0,Unsupervised Learning,,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,,,"Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,50,15,5,20,10,0,Enough to run the code / standard library,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,100% of projects,More internal than external,Other,,,,,,,,12000,CNY,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software 
Developer/Software Engineer",Self-taught,90,10,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Other,Relational data,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,Google Search,"Arxiv,Blogs,Conferences,Friends network,Kaggle,Podcasts",Very useful,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Researcher,Self-taught,50,30,0,0,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1TB,"Bayesian Techniques,Gradient Boosted Machines","Google Cloud Compute,Jupyter notebooks,Python,R,SQL",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Recommender Systems",Most of the time,,Sometimes,,,Most of the time,Most of the time,,Sometimes,,,,,,,Often,,,,,,,,Often,,,,,,,,,,50,30,0,20,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations of tools",Most of the time,Often,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Company Developed Platform",,Git,Always,8000000,JPY,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Republic of China,20,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,"Not employed, but looking for work",,,,,,,,Python,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects",,Somewhat useful,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Other,Sort of (Explain more),Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,29,Employed part-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Physics,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,No education,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,31,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,23,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,"Data Stories Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,,,,,Necessary,Necessary,,,,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,Researcher,Self-taught,100,0,0,0,0,0,"Survival Analysis,Time Series","Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,,,,,,,Very Important,,,,,Very Important,,,Very Important +Male,Iran,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by government,Microsoft Excel Data Mining,Regression,Python,University/Non-profit research group websites,"College/University,Textbook,Other",,,Very useful,,,,,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,Other,I don't write code to analyze data,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,15,0,5,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression",No 
education,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,I don't write code to analyze data,Other,Other,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,I prefer not to answer,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,R,Spark / MLlib,SQL,Tableau,TIBCO Spotfire",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,Most of the time,,,Most of the time,,Most of the time,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression",,,,,,Sometimes,Most of the time,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Often,,,Sometimes,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,Jupyter notebooks,Other,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,5,0,5,0,"Computer Vision,Reinforcement learning,Speech Recognition,Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Government,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,10GB,"Neural Networks,Regression/Logistic Regression,RNNs","IBM SPSS Statistics,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Often,,,,,,"Cross-Validation,Evolutionary Approaches,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,,,Often,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,30,30,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,Often,,,,,Often,,,,Sometimes,,,Less than 10% of projects,Do not know,IT Department,,missing data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,16000,TWD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Factor Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,Not Useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Work,20,10,60,5,5,0,Natural Language Processing,Logistic Regression,A master's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1PB,"Decision Trees,Random Forests,Other","Amazon Web services,Microsoft Excel Data Mining,Perl,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Rarely,,,Rarely,,,,,Most of the time,Sometimes,,,Most of the time,,,Most of the time,,,,,,,"Collaborative Filtering,Data Visualization,Text Analytics",,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,5,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",Often,Sometimes,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,Somewhat useful,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Rarely,,,,"A/B 
Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,Often,,,,Often,,,,Most of the time,Most of the time,,,,Sometimes,,Often,,,Most of the time,,,Sometimes,,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,Most of the time,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,55000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Mexico,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Podcasts",,,,,,,Somewhat useful,,,,,,Very useful,,,,,,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Computer Scientist,University courses,30,0,20,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100MB,"Decision Trees,Random Forests,Other","MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Text Analytics",,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,50,30,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT",Most of the time,,,,Most of the time,Often,,,,,,,,,Sometimes,,,,,,,,76-99% of projects,Entirely internal,Business Department,none,missing and corrupt data,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,1000000,MXN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Social Network Analysis,Python,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer",Work,20,0,80,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Other,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100GB,,"Cloudera,Hadoop/Hive/Pig,NoSQL,Tableau",,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,0,80,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Most of the time,,,Other,5,,,,,,,,,,,,,,,,,, +Male,Australia,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,,Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,100MB,"Decision Trees,Random Forests","Amazon Web services,C/C++,Google Cloud Compute,Mathematica,MATLAB/Octave,Perl,Python,SQL,Unix shell / awk",,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests",Sometimes,,,,,Most of the time,Sometimes,Often,,,,,,,,Often,,Sometimes,Most of the time,,,,Often,,,,,,,,,,,20,20,20,0,20,20,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations in the state of 
the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,Often,,,,Rarely,,Sometimes,,,,Often,,,,Most of the time,Most of the time,,10-25% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,160000,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,New Zealand,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,I don't plan on learning a new ML/DS method,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Non-Kaggle online communities,Official documentation,Online courses",,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician",University courses,0,0,70,20,0,10,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,SAS Base,SAS Enterprise Miner,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,Most of the time,,,Sometimes,,,Rarely,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Lift Analysis,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,,Often,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,,,,,,,Often,,,,,,,,,,,,,Most of the time,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,120000,NZD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,39,"Not employed, but looking for work",,,,,,,,SAS Base,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Conferences,Friends network,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Very useful,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer",Self-taught,40,20,30,10,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,India,22,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Kaggle Competitions,Yes,Bachelor's degree,Other,Less than a year,I haven't started working yet,Other,50,10,10,0,10,20,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Denmark,57,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,Other,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Not Useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"DBA/Database Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,70,5,20,5,0,0,"Computer Vision,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Pharmaceutical,"10,000 or more employees",Stayed the same,Don't know,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,,"Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Mathematica,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SAS Base,SQL,Unix shell / awk",,,,Sometimes,,,,,Rarely,,,,,,,,,,,Rarely,Sometimes,,,,Often,,,,,,Sometimes,,Most of the time,,,,,Often,,,,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Rarely,,,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,Rarely,Often,,Often,,,,Sometimes,Often,,Often,,,Sometimes,,Rarely,,Often,,,,5,40,10,30,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Sometimes,,Often,,,Rarely,Most of the time,,,,,,Sometimes,Sometimes,,Sometimes,Most of the time,Often,,,51-75% of projects,More internal than external,Standalone Team,weather data; public height contour maps; medical data; patient data,Restrictions on data usage,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Most of the time,"700,000",DKK,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,GitHub,"Arxiv,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,Very useful,Very useful,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,50,0,20,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - 
CNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,100GB,"Decision Trees,Ensemble Methods,Random Forests","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,Often,,,,Often,,Often,,Sometimes,,Rarely,,,,,Most of the time,,,Most of the time,,,Sometimes,Most of the time,,,,60,20,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,Often,Often,,,,Sometimes,,,,,,,,Often,,,Often,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,,5,,,,,,,,,,,,,,,,,, +Male,Hungary,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Company internal community,Kaggle",,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Researcher,Software Developer/Software Engineer",University courses,15,0,50,30,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Japan,47,Employed part-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Very useful,,,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,,Other (Separate different answers with semicolon),3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Physics,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,0,20,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Germany,34,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Not Useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,,Somewhat useful,Not Useful,Somewhat useful,Very useful,Somewhat useful,,Not Useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Researcher",Self-taught,30,5,20,35,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Other","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Often,,,Most of the time,Often,Often,,,,,,Sometimes,,Often,,Often,,,Most of the time,Sometimes,Rarely,,,Sometimes,,,,Most of the time,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial 
support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,Most of the time,,,Sometimes,Most of the time,,Most of the time,Most of the time,,Often,Most of the time,Most of the time,,Rarely,Sometimes,Most of the time,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,55000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Newsletters,Online courses,Textbook",,,,,,Very useful,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Data Scientist,Researcher",University courses,60,5,15,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft Azure Machine Learning,Perl,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,Often,,,,,,Often,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",,Sometimes,,,Sometimes,Often,Often,Often,Sometimes,,,Often,,Often,,Often,,,Often,,Often,,Often,Sometimes,,Often,,,Often,,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,Data linking; missing 
values,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,60000,,Other,7,,,,,,,,,,,,,,,,,, +Male,Italy,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,R,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,5-10 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,,11 - 39 hours,PhD,Yes,Master's degree,Mathematics or statistics,,Business Analyst,University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,College/University,Kaggle,Official documentation,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,10,0,70,10,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Don't know,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,R,Stan,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Rarely,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,Sometimes,,,Often,,,,,,Sometimes,,Often,,Sometimes,,,,Often,,,Often,,Often,,,,,Often,,,,10,30,0,10,50,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,10-25% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Company Developed Platform,Email",,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,R,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Researcher",University courses,30,0,10,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",I don't know/not sure,Academic,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,Other,Text data,Never,100MB,"Bayesian Techniques,Decision Trees,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,40,30,5,15,10,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,Never,"200,000",UGX,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Switzerland,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Regression,Python,,"Non-Kaggle online communities,Online courses,Textbook",,,,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Electrical Engineering,I don't write code to analyze data,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Netherlands,49,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Random Forests,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Data Analyst,Predictive Modeler,Statistician",University courses,30,5,60,5,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,QlikView,R,SAS Base,SQL,Other",,,,,,,,,,,Most of the time,Sometimes,Rarely,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Often,Sometimes,,,,,Rarely,,,,Most of the time,,,,,,,Rarely,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Prescriptive Modeling,Segmentation",,,,,,,Often,Often,Sometimes,,,,,,,Often,,,,Sometimes,,,,,,Often,,,,,,,,50,15,5,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the 
time,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,,Often,Sometimes,Often,Often,Often,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,140000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,23,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Google Search,I collect my own data (e.g. web-scraping)","Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Management information systems,,,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very 
useful,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,25,25,0,0,50,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,,"Markov Logic Networks,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks",,,,,,Often,Most of the time,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,40,10,0,30,20,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Other",,,,,Most of the time,Rarely,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,Most of the time,624000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,40,10,0,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Relational data,Other",Most of the time,10GB,Regression/Logistic Regression,"C/C++,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Often,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"A/B Testing,Cross-Validation,Logistic Regression",Most of the time,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,10,40,0,20,10,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,More internal than external,Other,,noise,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",Company Developed Platform,,Other,Sometimes,400000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,15,0,15,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Sometimes,,,,,,Sometimes,Sometimes,,,,Often,,Rarely,,Most of the time,,,,Rarely,Sometimes,,Most of the time,Most of the time,,Often,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,Often,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,Most of the time,Often,,,Most of the time,Often,Most of the time,Often,Often,Most of the time,,Most of the time,,Often,Most of the time,Often,,,,30,35,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such 
as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,Often,,,Often,,,,,Often,,,Sometimes,,Often,Often,,,Less than 10% of projects,More internal than external,Central Insights Team,Cannot disclose,dirty data; data from different sources,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Never,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,,,,,Very useful,Very useful,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,,"Programmer,Other",University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not 
important,Somewhat important,Not important,Not important,Somewhat important +Male,Italy,48,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,Less than a year,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,93,5,NA,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,"10,000 or more employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics",,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,,,,,,,,,,,Often,,,,Often,Sometimes,,,,,Often,,,,,45,20,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be 
answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,"60,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Machine Learning Engineer,Self-taught,30,0,40,0,30,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Not Useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,20,0,10,10,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,Less than one year,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,,,,,Often,Often,Sometimes,,,,Often,,,,Often,,,,,,,Often,,,Sometimes,,Sometimes,,Sometimes,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Sometimes,,,Sometimes,,,,,,Most of the time,,,51-75% of projects,Entirely 
internal,Standalone Team,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"32,000",,Other,8,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,"Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Engineer,Researcher",Self-taught,10,5,75,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,100 to 499 employees,Decreased significantly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Sometimes,,,,,,,Sometimes,,,,,,Often,,,,,,,Rarely,,Rarely,,,,,,Rarely,Most of the 
time,,Sometimes,,,,,,,,,Often,,,,Rarely,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Sometimes,,,,Sometimes,Often,Most of the time,Often,Often,,,Often,,,,,,,,,Often,,Often,,,,,,,Often,,,,50,30,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,Most of the time,,,,,Often,,,Sometimes,,Sometimes,,Sometimes,,100% of projects,Approximately half internal and half external,Business Department,Map data,Spatio-temporal variability and data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,70000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,49,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by company that makes advanced analytic software,R,Text Mining,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Biology,I don't write code to analyze data,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Work,20,10,70,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Vietnam,22,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),Python,I collect my own data (e.g. 
web-scraping),"Blogs,Newsletters,Official documentation,Personal Projects",,Very useful,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,"FlowingData Blog,KDnuggets Blog",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,40,30,0,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Julia,Bayesian Methods,Python,Google Search,"Arxiv,Blogs,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",20,30,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,Sometimes,Often,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,Often,Often,,,,,,,,,Often,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Often,Most of the time,Most of the time,,,,Often,,,,Often,Most of the time,,,,,Most of the time,Most of the time,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,60000,GBP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Greece,34,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,30,10,10,40,10,0,Time Series,"Bayesian Techniques,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,India,21,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Trade book,Tutoring/mentoring",,,Very useful,,,,Very useful,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,,Github Portfolio,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,0,20,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Other,40,0,0,0,0,60,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),"N/A, I did not receive any formal education",Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Rarely,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,Other",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,,,,,Often,,,"A/B Testing,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,Rarely,,,,,,,Rarely,,,,10,20,10,40,20,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,More external than internal,Other,localization info; dog breed info;,overlapping mappings,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,"160,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Philippines,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,R,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,Very useful,,,,Somewhat useful,Very useful,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Engineer,Self-taught,30,10,20,20,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,500 to 999 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Image data,,10GB,"Bayesian Techniques,SVMs","IBM SPSS Statistics,MATLAB/Octave,Python",,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Segmentation",,,Sometimes,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,Limitations in the state of the art in machine learning,,,,,,,,,,,,Sometimes,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Other,Sometimes,1000000,PHP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Neural Nets,Matlab,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,50,30,0,20,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed part-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Support Vector Machines (SVM),Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Egypt,29,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by college or university,MATLAB/Octave,Neural Nets,Java,University/Non-profit research group websites,"College/University,Friends network,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,,5-10 years,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Management information systems,1 to 2 years,Researcher,University courses,50,5,15,30,0,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,A bachelor's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Sweden,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Python,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,60,20,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,R,Spark / MLlib,Other",,,,,,,,,,,Often,,Sometimes,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic 
Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Sometimes,,,,Sometimes,,,,Sometimes,Often,,Often,,,,,Sometimes,,Often,,,,40,10,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,Sometimes,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,103000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed part-time,,,No,Yes,Other,Fine,,Python,Cluster Analysis,Python,Other,"Blogs,Kaggle,Newsletters,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Psychology,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not 
important,Somewhat important,Very Important,Not important +Male,Egypt,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Company internal community,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,Very useful,Very useful,,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,5,20,55,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,Fewer than 10 employees,Decreased slightly,6-10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,,"Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests",,,,,,Often,Most of the time,Often,,,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,20,10,30,20,20,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",git,"Bitbucket,Git",Sometimes,156000,EGP,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Official documentation,Personal Projects,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Very useful,,,Very useful,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,80,0,0,0,0,20,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",University courses,20,15,10,50,5,0,,,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,Germany,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Textbook,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Bachelor's degree,Management information systems,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,27,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,,,,,,Very useful,,< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important +Male,Hungary,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,3-5 years,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,20,15,15,25,25,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Neural Nets,Python,GitHub,"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,35,30,10,15,0,Computer Vision,Neural Networks - CNNs,A master's degree,Other,20 to 99 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Don't know,10GB,Neural Networks,"Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow",,Most of the 
time,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Segmentation,SVMs",,,,Most of the time,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,35,35,10,10,10,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,none,labelling,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Sometimes,38000,GBP,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Finland,45,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,Not Useful,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,University courses,20,30,30,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Video data",Never,100MB,"CNNs,Neural 
Networks,SVMs","C/C++,Mathematica,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Naive Bayes,Neural Networks,Segmentation,SVMs",,,,Most of the time,,Often,Most of the time,,,,,,,,,,,Rarely,,Often,,,,,,Most of the time,,Sometimes,,,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,getting access to annotated data / annotate data myself,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,400000,DKK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Norway,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,30,30,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,Sometimes,Often,,,,,,,,Sometimes,,,,,,,Often,,,,,Sometimes,,,,,,10,10,50,20,10,0,Enough to explain the algorithm to someone non-technical,"Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Rarely,Often,Sometimes,Sometimes,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,75000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,Government website,"Official documentation,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,"FastML Blog,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Researcher",Self-taught,50,0,0,0,0,50,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Often,,Often,,,,"CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Time Series Analysis",,,,Sometimes,,Most of the 
time,,Sometimes,,,,Sometimes,,,,Often,,,Rarely,Often,,,Sometimes,,Sometimes,Rarely,,,,Often,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,,Sometimes,Most of the time,Sometimes,,,Most of the time,,,,,Sometimes,Often,,,,Sometimes,Sometimes,Sometimes,,100% of projects,More internal than external,Business Department,NOAA; us census,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",S3,Git,Rarely,108000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Kenya,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Researcher",Work,0,20,50,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,Sometimes,,Often,,Most of the time,,Sometimes,Often,,Often,,Sometimes,,,,,,Most of the time,Sometimes,,,,75,10,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering 
or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Most of the time,,,,Most of the time,Sometimes,Sometimes,,,,Most of the time,Most of the time,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,3000000,KES,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Conferences,Official documentation,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Engineer,Researcher,Statistician",University courses,40,10,5,40,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Most of the time,10GB,Other,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,TIBCO Spotfire",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Often,,,,,,,,Rarely,Rarely,,,,Rarely,Sometimes,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,Most of the time,Often,,,,,,,,,Sometimes,,,,,Rarely,,,,,Often,,,,Sometimes,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,Whether,Domain knowledge about the data,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"2,000,000",TWD,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects",,Very useful,Very useful,,Very useful,,Very useful,,,,,Very useful,,,,,,,"Data Stories Podcast,Linear Digressions Podcast,R Bloggers Blog Aggregator",3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",,"Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Excel Data Mining,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,30,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"5,000 to 9,999 employees",Increased slightly,6-10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,Often,,,,,Most of the time,,Sometimes,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Often,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Often,Often,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,Git,Always,1000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,QlikView,"Ensemble Methods (e.g. boosting, bagging)",Python,Google Search,"Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Predictive Modeler",University courses,30,0,20,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Tableau",,Often,,,Often,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Most of the time,,,Often,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Segmentation",Most of the time,Sometimes,,,,Often,Often,Sometimes,Often,,,,,,Most of the 
time,Often,,Sometimes,,,,,Most of the time,,,Often,,,,,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,Most of the time,,Sometimes,Most of the time,Often,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Always,"54,000",SGD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,28,18,20,32,2,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10GB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,Tableau",Often,Often,,,,,,,Most of the time,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,,Often,Often,,,,93,1,4,1,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,Often,,Often,,,,,,,,Often,,Often,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Subversion",Most of the time,74000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,48,"Not employed, but looking for work",,,,,,,,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Professional degree,,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",35,40,0,0,0,25,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Germany,52,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Official documentation,Online courses",Very useful,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Computer Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",15,30,30,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Non-profit,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,Often,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Often,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Often,Often,Often,,,,Often,,Sometimes,,Sometimes,,Often,Sometimes,,Sometimes,Often,Often,,,,,Often,Sometimes,Often,,,,30,40,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues",,,,,Often,,,,,,,,,,,,Often,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,,55000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Friends network,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,,,,,,Very useful,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,South Africa,36,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects",Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher,Other",University courses,40,0,30,25,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Often,Most of the time,Often,Sometimes,Sometimes,,,Often,,,,Often,,,Often,Sometimes,,,Often,Often,,,,Sometimes,Often,,,,,60,20,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,,,,,,,,,Often,,,,26-50% of projects,Do not know,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,250000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Not Useful,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,20,10,0,0,60,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,"5,000 to 9,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Don't know,,Other,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,Most of the time,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,30,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,,,,,Often,,,Less than 10% of projects,Approximately half internal and half 
external,Other,google API; web scraping; twitter; Salesforce; Omniture,Cleaning and combining into a workable structure,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Never,62500,USD,Other,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Anomaly Detection,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Non-profit,20 to 99 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Sometimes,,Most of the time,,,,,Often,Most of the time,Often,,,,Sometimes,,Sometimes,Sometimes,,,,30,20,0,0,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,Sometimes,,Sometimes,Most of the time,Often,,,Sometimes,,,,,Sometimes,,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,120000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Greece,26,"Not employed, but looking for work",,,,,,,,Other,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,40,0,35,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +A different identity,United States,35,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,80,5,5,10,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important +Male,Brazil,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,33,0,50,17,0,Recommendation Engines,"Ensemble Methods,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression",Amazon Web services,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,PCA and Dimensionality Reduction",Often,,Often,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,10,40,40,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",Need to coordinate with IT,,,,,,,,,,,,,,,Rarely,,,,,,,,26-50% of projects,Entirely internal,IT Department,none,none,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer",University courses,70,10,0,20,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Java,NoSQL,Python",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Logistic Regression,Neural Networks",Often,,,,Often,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,20,30,15,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,,,Most of the time,,,,Often,,,,,Sometimes,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,NONE,Clean the data.,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,Git,Rarely,"80,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,R,Deep learning,R,GitHub,Blogs,,Very useful,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,1 to 2 years,"Data Analyst,Data Scientist,Researcher",Self-taught,90,5,0,5,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,10,60,10,20,0,0,Enough to run the code / standard library,"Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,Often,,,,,,,Often,,,Often,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,Graph (e.g. 
GraphBase/Neo4j),"Commercial Data Platform,Email",,Git,Never,15,,Has decreased 20% or more,2,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,,Other,50,50,0,0,0,0,,,A bachelor's degree,Academic,"10,000 or more employees",,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,,,,,,"C/C++,Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,Rarely,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,70,0,20,10,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,R,Other,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Not Useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Mix of fields,100 to 499 
employees,Stayed the same,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","IBM Watson / Waton Analytics,NoSQL,Python,R,Stan,Unix shell / awk",,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",Most of the time,,Often,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,Often,Sometimes,,Most of the time,,,,Often,Sometimes,,Often,,,,45,10,5,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Rarely,Sometimes,Most of the time,,,Most of the time,,,,,,Most of the time,Sometimes,,Rarely,,Sometimes,Sometimes,,,100% of projects,Approximately half internal and half external,Standalone Team,,data quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,53000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Germany,38,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Stan,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,,,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Other,University courses,65,15,0,20,0,0,Natural Language Processing,Logistic Regression,Primary/elementary school,Technology,10 to 19 employees,,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,Other,"C/C++,Java,MATLAB/Octave,Perl,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,Rarely,Rarely,,Most of the time,,,,,,,,,Sometimes,,,,,,Rarely,,,,"Data Visualization,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,30,30,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,Most of the time,,,Rarely,Sometimes,,,51-75% of projects,More internal than external,IT Department,,,Other,Share Drive/SharePoint,,"Git,Subversion",Rarely,24000,EUR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,5,25,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Julia,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Rarely,Often,,,,,,,Often,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"Neural Networks,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,60,20,20,0,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",I don't typically share data,,Git,Rarely,1500000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,Yes,Engineer,Fine,Employed by company that makes advanced analytic software,SQL,Anomaly Detection,R,Google Search,"Blogs,Company internal community,Conferences,Online courses",,Somewhat useful,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Traditional Workstation,2 - 10 hours,Other,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,I don't write code to analyze data,Other,Self-taught,30,25,40,5,0,0,,,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,Data Analyst,Work,80,0,20,0,0,0,Reinforcement learning,"Logistic Regression,Neural Networks - RNNs",High school,Financial,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Rarely,10GB,Neural Networks,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,Often,Often,,,,,Often,Most of the time,,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,Often,,,Most of the time,,Sometimes,,,Often,,Often,,,,Sometimes,,,100% of projects,More external than internal,Standalone Team,Financial market data,Timestamps and missing data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,,400000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,6 to 10 years,"Data Miner,Data Scientist,Predictive Modeler,Other",University courses,20,20,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important +Male,France,26,"Not employed, but looking for work",,,,,,,,Python,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,Master's degree,Yes,Master's degree,Computer Science,Less than a year,Other,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,43,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Spark / MLlib,Deep 
learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Textbook,Trade book",,,,,,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches",A doctoral degree,Academic,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,Text data,Rarely,1GB,Other,Spark / MLlib,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,30,50,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Often,,76-99% of projects,Approximately half internal and half external,Other,Census Data; Numerical Simulation Data; Historical Weather Data,N/A,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Shared file system; Google Drive,Git,Sometimes,"150,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,India,33,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Other,10,0,0,0,0,90,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Other,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Very useful,,Very useful,,,,Very useful,,,Very useful,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Rarely,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",Rarely,,,,Most of the time,Most of the time,Most of the time,Often,,,,Often,,,,Often,,,,,Often,,Often,Most of the time,,,,,,Sometimes,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues",,Sometimes,,,,,,,Most of the time,,,,,,,,Often,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Git",Rarely,90000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,"Arxiv,Blogs,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,,Very useful,,,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Other,40,10,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Relational data",,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Rule Induction,Python,Google Search,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software 
Engineer,Statistician",Self-taught,45,15,30,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Base,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,Sometimes,,,,,,,Often,Sometimes,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,,Often,Often,,,,Sometimes,,,,Often,,,Often,,,Sometimes,Often,,,Often,,Often,Often,,,,,65,10,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,Often,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,10,70,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,"Cloudera,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,,,,Sometimes,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Neural Networks,Prescriptive Modeling,Random Forests,Time Series Analysis",Often,,,,,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,Often,,,,,,,Most of the time,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Often,,,Often,Often,Often,,Often,,Often,,Often,,,,Often,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Most of the time,1500000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Other,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,42,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Orange,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"DBA/Database Engineer,Programmer",Self-taught,40,20,30,0,0,10,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Video data,Text data",Don't know,100GB,"Bayesian Techniques,SVMs,Other","C/C++,MATLAB/Octave,Perl,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,SVMs",,,,,,Most of the time,Most of the time,,,,,,,Often,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,10,30,10,10,40,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,improve some results published,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,250000,MXN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Other",Kaggle competitions,5,20,35,15,25,0,,,A professional degree,Financial,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,Sometimes,,,Sometimes,,,,Often,,,,,,,Often,,,,,Often,,Often,,,,40,20,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,Often,,Often,,26-50% of projects,More internal than external,Other,Rating Agencies,Data quality,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,I don't typically share data",,Bitbucket,Rarely,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,Python,Google Search,"Arxiv,Blogs,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,Very useful,Very useful,Very useful,,,,,,Somewhat useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Neural Networks,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / 
awk",,Often,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,"Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,RNNs",,,,,,Sometimes,,,,,,,,,,Rarely,,Sometimes,Often,Most of the time,,,,,Most of the time,,,,,,,,,10,20,50,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,Sometimes,,Often,Sometimes,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,,lack of labeled data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Always,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,20,10,60,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient 
Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Nigeria,43,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Social Network Analysis,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring",Very useful,,,,Very useful,,Very useful,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,Machine Learning Engineer",Self-taught,60,20,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Other",Rarely,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","Python,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,,Most of the time,Most of the time,,,,,Most of the time,,,,Often,,Most of the time,Often,,Most of the time,,,,,Most of the time,,Often,,,,20,30,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a 
data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,Often,,,,,,Most of the time,,,,Most of the time,,10-25% of projects,Entirely external,Standalone Team,,Application,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"150,000",NGN,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,SAS,,"College/University,Company internal community,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Not Useful,Somewhat useful,,Very useful,,,Somewhat 
useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler",University courses,20,0,60,20,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A professional degree,Financial,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Angoss,QlikView,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,Sometimes,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,,,,,,,,,Often,,,,Often,,,,Rarely,,,,5,20,30,5,40,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,Sometimes,,,Most of the time,,,Often,Often,Often,Most of the time,,Most of the time,Sometimes,,,Most of the time,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,100MB,"Decision Trees,Markov Logic Networks","Amazon Web services,Jupyter notebooks,R",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,HMMs,Recommender Systems,Time Series Analysis",,,,,,,Often,,,,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational 
data,Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Segmentation",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,30,20,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,25,0,0,75,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Neural Nets,Python,GitHub,"Online courses,YouTube Videos,Other",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,Manufacturing,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,GPU accelerated Workstation,Text data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Text Analytics,Other",,,,,,Often,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,Often,,,5,15,5,50,25,0,"Enough to code it again from scratch, albeit it may run 
slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Often,,,Often,,,,,,,Often,,,,,,,Most of the time,,,10-25% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Most of the time,90000,CHF,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Other,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,Often,,,Most of the time,Most of the time,Often,Often,,,,Often,Often,,Often,,Often,,Often,Often,,Often,,Often,,,,,,,,,25,20,45,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Most of the time,Most of the time,Most of the time,,,Most of the time,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,21,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Linear Digressions Podcast,1-2 years,,,,,Necessary,Necessary,,,,,,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,Other,University courses,20,0,60,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Miner,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,Researcher,University courses,20,0,10,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,Sometimes,Most of the 
time,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,Sometimes,Sometimes,,,Often,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,Often,,Often,Most of the time,Most of the time,Often,Sometimes,,,,,Often,,Sometimes,,Sometimes,Sometimes,,Often,,Sometimes,Often,,Often,Most of the time,,Most of the time,Most of the time,,,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Often,Most of the time,,,Often,,,,,,Most of the time,,Often,,,Often,,,51-75% of projects,More external than internal,Business Department,www.data.gov.co,Preparing and cleaning dataset,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,50000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Survival Analysis,R,I collect my own data (e.g. 
web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Data Analyst,Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,38,Employed full-time,,,Yes,,Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,Programmer,Work,50,20,20,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Other,500 to 999 employees,Decreased significantly,6-10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,,,,,"C/C++,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,"Logistic Regression,Neural Networks,RNNs",,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,Sometimes,,,,,,,,,0,0,0,0,0,0,,Inability to integrate findings into organization's decision-making process,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Argentina,33,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,10,15,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Video data,Text data,Relational data",Always,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Mathematica,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests",,,,,,,Often,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,Sometimes,Rarely,,,,,,,,,,,20,35,25,10,10,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data 
science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,Sometimes,,,,Sometimes,Rarely,,,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,40000,ARS,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,30,20,20,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,TensorFlow",,Sometimes,,Often,,,,,,,,,,,,,Most of the time,,,Rarely,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic 
Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Often,Often,,Often,Often,Often,,,,,,Often,,Often,,Often,,Often,Often,,Often,,,,,Often,Often,,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Engineer,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Statistician",University courses,30,5,30,30,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,Pharmaceutical,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Most of the time,100MB,"CNNs,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,NoSQL,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,Sometimes,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,RNNs",Sometimes,,Sometimes,Sometimes,,Often,Most of the time,,,,,,Most of the time,Often,,Often,Sometimes,,,Often,Sometimes,,,,Sometimes,,,,,,,,,40,20,5,25,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",Often,,,,Sometimes,,,,Often,,,,,,Most of the time,,Most of the time,,,,,,76-99% of projects,More internal than external,Standalone Team,International Immunogenetics Database,Getting access to data spread across the organization in different formats/ different storage methods,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,150000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Programmer,University courses,10,40,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,Data Scientist,Work,0,0,100,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Perl,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Not important +Male,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",25,45,10,10,0,10,"Recommendation Engines,Supervised 
Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",,,"Ensemble Methods,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,38,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer",Self-taught,35,60,5,0,0,0,"Machine Translation,Recommendation Engines,Speech Recognition",,"Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Decreased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,,,"IBM Watson / Waton Analytics,Python,SQL,Unix shell / awk",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,None,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,,5,,,,,,,,,,,,,,,,,, +Male,Ireland,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,,,,,,,Traditional Workstation,2 - 10 hours,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,United States,67,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Computer Scientist,Self-taught,30,30,20,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series",,,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,,,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Most of the 
time,,,,,,,,,,,,,,,,,,,"PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Other,Perfectly,Employed by non-profit or NGO,TensorFlow,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Not Useful,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A humanities discipline,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,,,A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Female,Australia,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,30,0,0,60,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,3-5 years,I visited the company's Web 
site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Image data,Don't know,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests",,,,Sometimes,,,,,Often,,,Often,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,30,60,0,10,0,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Often,10-25% of projects,More internal than external,IT Department,kaggle,lack of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,"Not employed, but looking for work",,,,,,,,C/C++,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Necessary,,Necessary,,,Necessary,,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Analyst,Engineer",Self-taught,60,35,5,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,,,,Somewhat important,,Very Important,,,,,,Somewhat important,,,,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Stan,Survival Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",50,30,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,Regression/Logistic Regression,"Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling",,,,,,,Sometimes,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,60,20,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,Most of the time,Sometimes,,Often,Sometimes,,Sometimes,,Sometimes,,Often,,Sometimes,,Sometimes,Most of the time,Often,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Brazil,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Engineering (non-computer focused),More than 10 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,60,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,Somewhat important,,"GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most 
of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,"Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Python,Text Mining,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,20,0,50,20,0,Natural Language Processing,Evolutionary Approaches,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Taiwan,19,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed part-time,,,No,Yes,Programmer,Fine,Employed by non-profit or 
NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,RapidMiner (commercial version),Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Statistician",Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,28,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Researcher",University courses,50,0,0,50,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and 
operationalizing data",Workstation + Cloud service,"Text data,Relational data",,,,"Google Cloud Compute,Python,R,SQL",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,90,0,10,0,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,1GB,"Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Natural Language Processing,Neural Networks,RNNs",,,,Sometimes,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,5,10,50,25,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,23,I prefer not to 
say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,Other,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,Other (Separate different answers with semicolon),3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,,Other,40+,Online Courses and Certifications,No,Bachelor's degree,A humanities discipline,,Other,Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,20+,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Engineer,University courses,40,20,0,40,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Academic,,,,,Important,Research that advances the state of the art 
of machine learning,Laptop or Workstation and local IT supported servers,Text data,Never,1GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,,Often,,Often,Often,,,,,Sometimes,,,,Sometimes,Often,,Sometimes,,Sometimes,Rarely,,,,Most of the time,Most of the time,,,,,15,25,30,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Other",,,,,,,,,,Sometimes,,Sometimes,Rarely,,,,,,,,,Most of the time,Less than 10% of projects,More internal than external,Standalone Team,Twitter data; Citation related data;,trusting the labels,,I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,400000,INR,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,R,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,70,0,0,0,10,20,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Mix of fields,500 to 999 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and private datacenters,Text data,,,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,20,20,20,20,20,NA,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Do not know,IT Department,Not Applicable,Not Applicable,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Don't know,0,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,48,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,"FlowingData Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Never,10MB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,TensorFlow",,,,,,,,Often,,,,,Often,,,,Most of the time,,,,Often,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,,Often,,Most of the time,Often,Often,,,,Often,Sometimes,Most of the time,,Most of the time,,,Most of the 
time,Often,Often,,,,Sometimes,Sometimes,,Most of the time,,,,,,40,20,0,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,36,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,15,5,10,50,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Sometimes,,Sometimes,,,,,,,Rarely,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,"CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,Rarely,,Most of the time,,Sometimes,,,,Often,,,,Often,,,Often,,,,Often,,,,,,Sometimes,,,,,50,10,20,10,0,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,Sometimes,,Often,Often,,,,,,Most of the time,,,Sometimes,Sometimes,Often,Most of the time,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,123500,CHF,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,5,10,5,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language 
Processing,Text Analytics",,,Often,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,,,0,20,0,50,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,NA,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,Logistic Regression,,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,25,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other",GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I prefer not to answer,I never declared a major,Less than a year,I haven't started working yet,Other,15,75,0,0,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Singapore,43,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,Very useful,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Data Miner,Data Scientist",University courses,40,0,0,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Technology,20 to 99 employees,Increased slightly,6-10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,Other","Amazon Web services,IBM SPSS Modeler,Microsoft Excel Data Mining,Python",,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",Often,Often,,,,Often,Most of the time,Often,,,,,,,Often,Often,,,,Sometimes,Often,Often,,,,Often,Sometimes,,,Sometimes,,,,50,5,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,Often,,,Most of the time,Often,,,Most of the time,Rarely,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,300000,SGD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Monte Carlo Methods,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Online courses",Very useful,Very useful,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Researcher,University courses,20,0,0,80,0,0,"Computer Vision,Reinforcement learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Ireland,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Other,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer 
Science,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A professional degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Never,1GB,"Regression/Logistic Regression,Other","Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Recommender Systems",,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,10,40,0,0,50,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Often,,,Most of the time,,,Often,,Often,,,,,,,Most of the time,Most of the time,,10-25% of projects,Approximately half internal and half external,Other,It is a research institute focusing on many topics.,Having access to useful data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Git,Sometimes,"18,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,0,0,0,100,NA,0,"Adversarial Learning,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,10GB,"Neural Networks,Random Forests","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Often,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,,Most of the time,,,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,Often,Sometimes,,Most of the time,,,,,Sometimes,Most of the time,,,,,50,20,20,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,Often,,Often,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Don't know,45000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Poland,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Haskell,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Other,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Cloudera,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Other",,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,Most of the time,,,"Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,,,Sometimes,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,,,Often,,Most of the time,Sometimes,,,,,,,,,,50,25,20,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full 
database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,Sometimes,,Often,,Rarely,,,,,Often,Sometimes,Most of the time,Often,,Less than 10% of projects,Entirely internal,Other,,Getting appropriate schema definitions; getting information on all possible corner-cases,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Always,120000,PLN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,43,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,Very useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,Work,25,20,30,20,5,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Telecommunications,"10,000 or more employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Sometimes,,,,,,,,Often,,,,Rarely,,,,,,Sometimes,,,Sometimes,Often,,Rarely,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,Sometimes,Most of the time,,Often,Often,,,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,,Often,Often,,,Most of the time,Sometimes,Sometimes,Often,,,,30,20,35,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,,Often,,,Most of the time,,,,,,,,,,Often,Sometimes,Often,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,40000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,MATLAB/Octave,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Arxiv,College/University,Kaggle,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,University courses,10,0,10,65,15,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,,1GB,"Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,R,RapidMiner (commercial version),SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Segmentation,SVMs,Text Analytics",,,,,,,Most of the time,,,,,,,Often,,Often,,,Most of the time,,,,,,,Rarely,,Sometimes,Most of the time,,,,,70,10,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,Sometimes,,,,,,,Often,Often,,,,,,76-99% of projects,More internal than external,Standalone Team,Scientific Publications,Parse PDFs into clean XML,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Other,Python,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,"Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,SQL,Regression,R,"Google Search,Government website","College/University,Friends network,Online courses,Textbook,YouTube Videos",,,Very useful,,,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Survival Analysis,Time Series",Logistic Regression,A professional degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Segmentation,Time Series 
Analysis",,,,,,,Often,,,,,,,Often,,Often,,,,,,,,,,Often,,,,Often,,,,20,30,5,20,10,15,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools",Most of the time,Often,,,,,,,Often,,,,Sometimes,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Most of the time,"84,000",UAH,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Female,People 's Republic of China,27,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,60,0,20,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Weka,Social Network Analysis,Python,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,Very useful,Very useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,50,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100GB,"Bayesian Techniques,Decision Trees,Random Forests","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Recommender Systems,Text Analytics",Often,Often,Sometimes,,,,,,,,,,,Often,Rarely,Sometimes,,Often,Often,,,,,Often,,,,,Often,,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,,,,,Often,,Sometimes,,,,Most of the time,,Most of the time,Sometimes,,,Most of the time,,Less than 10% of projects,More internal than external,Business Department,,collecting data,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Rarely,480000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Bayesian Methods,Scala,GitHub,Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,Self-taught,85,NA,NA,15,0,0,Recommendation Engines,Decision Trees - Random Forests,A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10TB,Bayesian Techniques,"Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Unix shell / awk",,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,,Often,,,,,Often,,,,,Most of the time,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Naive Bayes,Random Forests,Recommender Systems",,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,50,30,0,10,10,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input,Limitations of tools",,,,,Most of the time,,,,,,Often,,Often,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,,600000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Other,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,20,25,20,25,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,,,Sometimes,Sometimes,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Most of the time,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs",,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,Most of the time,,,,,,40,40,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Difficulties in deployment/scoring,,,,Sometimes,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely 
internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Not Useful,,Not Useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Computer Scientist,Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,IBM Watson / Waton Analytics,MATLAB/Octave,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,Often,,,,,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the 
time,,,,,,,,,Sometimes,,,,Often,,,,Often,,,,Most of the time,Often,Most of the time,,,,45,20,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,51-75% of projects,More external than internal,IT Department,data.world; kaggle dataset,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,0,INR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Spark / MLlib,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","College/University,Online courses,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,10,20,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","Cloudera,Jupyter notebooks,NoSQL,Python,SQL",,,,,Sometimes,,,,,,,,,,,,Most of the 
time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Gradient Boosted Machines,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,Often,,,,,,,,,Often,,,,,,,,Most of the time,Often,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Often,,Often,Often,,,,,,Often,Often,Often,,,Often,,,51-75% of projects,More internal than external,Central Insights Team,governmental datasets,burocracy,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,CKAN,Git,Rarely,57600,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Ireland,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Python,Deep learning,,Google Search,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Programmer",University courses,5,20,25,50,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods",High school,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,1MB,"Ensemble Methods,Random Forests","Java,Jupyter notebooks,Microsoft Azure Machine Learning,R,SQL",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,35,20,30,5,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,Sometimes,,Often,,Less than 10% of projects,More internal than external,Standalone Team,,Getting access to clean data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Always,48000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,edX,Workstation + Cloud service,11 - 39 hours,PhD,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Male,Pakistan,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Employed by government",TensorFlow,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,FlowingData Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,"CNNs,Neural Networks,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Neural Networks",Most of the time,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,30,30,20,0,20,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Often,,Most of the time,,,,,,,Most of the time,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Git,Subversion",Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Other,Other,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,"Emergent/Future Newsletter (Algorithmia),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,0,30,30,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A master's degree,Internet-based,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Most of the time,1GB,"Bayesian Techniques,Neural Networks,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk,Other,Other",,,,Sometimes,,,,,,,,,,,,,Rarely,,,,Rarely,Rarely,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Most of the time,,Often,Often,Most of the time,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",Often,,Often,,Sometimes,Most of the time,Most of the time,,,,,,,,,,,Sometimes,,Most of the time,Often,,,,Most of the time,,,,,Most of the time,,,,10,40,20,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,,,,Sometimes,,,Often,,Sometimes,,,Sometimes,Most of the time,,,Less than 10% of projects,More internal than external,IT Department,physionet; many health related datasets from partnerships,noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,36000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,34,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Data Scientist",University courses,20,10,20,20,30,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important +Male,Brazil,22,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",SQL,Link Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Self-taught,60,10,20,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Technology,20 to 99 employees,Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Web services,Java,Microsoft Excel Data Mining,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Time Series Analysis",Often,,Sometimes,,,,Most of the time,Most of the time,,,,,,,,,,Rarely,,,,,Often,,,,,,,Often,,,,10,10,20,35,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,,,Often,,Often,Most of the time,,,,,,,76-99% of projects,Entirely external,Business Department,,Board support,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,39000,BRL,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Java,Text Mining,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Not Useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,Self-taught,100,0,0,0,0,0,Natural Language Processing,Bayesian Techniques,High school,Academic,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and local IT supported servers,Text data,Never,<1MB,Bayesian Techniques,"C/C++,Java",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,30,50,0,0,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations of tools",Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,None,Do not know,Standalone Team,amazon,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Never,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,27,Employed part-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Engineer,Work,10,20,30,0,20,20,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Never,100MB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Segmentation",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,30,20,30,20,0,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,Often,Often,Often,,,,Often,,,,,,,10-25% of projects,More internal than external,IT Department,,,,Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,300000,INR,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,Primary/elementary school,Technology,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told 
me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,Regression/Logistic Regression,"Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,90,5,1,2,2,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,100% of projects,More external than internal,IT Department,Meteo data;,Cleaning Data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Other,Sometimes,38000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Business Analyst,University courses,20,10,20,50,0,0,Outlier detection (e.g. Fraud detection),,A master's degree,Internet-based,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100GB,Regression/Logistic Regression,"Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,Sometimes,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,Often,,,,,,Often,,,,51-75% of projects,Entirely internal,IT Department,,data sourcing and cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Other,Sometimes,3000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Scientist,University courses,0,0,15,85,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,I don't know,Increased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Don't know,10GB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,Rarely,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Rarely,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Other",,,,,Rarely,Often,Most of the time,Rarely,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,Most of the time,,,,Sometimes,,,Most of the time,,,70,10,0,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Rarely,,,Sometimes,,,,,,,Sometimes,,,,,Often,Most of the time,,Often,Most of the time,,51-75% of projects,Entirely internal,Other,,Unavailability of certain data which would be interesting,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Germany,38,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,Bayesian Methods,Python,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Hospitality/Entertainment/Sports,"5,000 to 9,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,100MB,"Decision Trees,Random Forests","Hadoop/Hive/Pig,Java,Python,QlikView,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,Rarely,,,,,,,,,Sometimes,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Recommender Systems,Segmentation",,,,,,,Often,,,,,,,,,Often,,,,,,,,Sometimes,,Often,,,,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Never,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,SQL,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,18,50,0,2,0,Time Series,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,KNIME (free version),Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,Rarely,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Prescriptive Modeling,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,50,5,5,25,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,,Often,,,,Sometimes,,Often,,,Sometimes,Rarely,Most of the time,Often,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,"140,000.00",BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Turkey,27,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Self-employed,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection)",Bayesian Techniques,A master's degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes",Often,,Most of the time,,,Often,Most of the time,Often,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,55,10,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,Often,,,,,,,,,,,Most of the time,,Most of the time,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,54000,TRY,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,Statistician",Self-taught,80,5,10,5,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Python,R,Stan,Other",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,,,Often,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,,Rarely,,,Often,Most of the time,Often,Often,,,Often,,,,Often,,,,,,Rarely,Often,,,Often,Often,,,Most of the time,,,,60,10,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Most of the time,Often,,Sometimes,Often,,,,Often,Often,Often,,Most of the time,Sometimes,Most of the time,Sometimes,Most of the time,,100% of projects,Entirely internal,Standalone Team,Census; Experian;,Access,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,"180,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,DataTau News Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,Other,University courses,60,0,0,30,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Other,GPU accelerated Workstation,Relational data,Sometimes,100MB,"Gradient Boosted Machines,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Often,Often,,,,,Often,,,,Often,,,,,,,Often,,,,,,,,,,,60,20,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,26-50% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,200000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,99,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,Kaggle,,,,,,,Not Useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",15+ years,,,,,,,,,,,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service,Other",40+,,No,I prefer not to answer,Fine arts or performing arts,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning",,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,C/C++,Genetic & Evolutionary Algorithms,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,3 to 5 years,I haven't started working yet,Work,50,0,30,20,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods",A doctoral degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,R,SQL,Stan,Unix shell / awk",,Rarely,,Often,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,Sometimes,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Evolutionary Approaches,PCA and Dimensionality Reduction",,,Most of the time,,,,Most of the time,,Often,Often,,,,,,,,,,,Often,,,,,,,,,,,,,25,10,5,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,,Often,,,100% of projects,Entirely internal,Other,ThePlantList; USDA Plants; USFS FIA; TRY plant traits database,"Hard to do meta-analysis on data collected by different teams for different reasons. Hard to deal with different formats, but more importantly, hard to test extent to which data are representative.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,25,5,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Always,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Privacy issues",,Sometimes,,,Often,,,,,,,,,,,,Sometimes,,,,,,10-25% of projects,Entirely internal,Central Insights Team,"Twitter data, Data.gov, Quantl","Cleaning and encoding problems. Some social media data is encoded in UTF-8, while some in Latin. We cannot club these together blindly.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Git,,315000,INR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Podcasts",,Very useful,,,,,Very useful,,,,,Very useful,Somewhat useful,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,10,10,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Decreased slightly,3-5 years,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Often,Often,,Often,,,,"Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,Often,Sometimes,,,,50,30,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,,,,,,Often,,,,,,,Often,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,80000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,"Business Analyst,Researcher",University courses,0,0,0,100,0,0,Time Series,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,India,19,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,,No,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,KNIME (commercial version),Deep learning,Python,University/Non-profit research group websites,Non-Kaggle online communities,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Not at all important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Never,10TB,Other,"Java,KNIME (free version),NoSQL,Python,R,Spark / MLlib",,,,,,,,,,,,,,,Most of the 
time,,,,Often,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,"Data Visualization,kNN and Other Clustering",,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,90,0,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the time,,,,,,Most of the time,,,,,,,,,,,Most of the time,Most of the time,,100% of projects,Entirely internal,IT Department,,"persuade our scientific chief to let us use the appropriate algorithm, instead of forcing us to use his useless solutions",Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Git,Never,50000,EUR,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,,University courses,30,20,0,40,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Other,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,1GB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,35,0,0,35,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",Often,Often,,Often,,,,,Often,,,Often,Often,,,Often,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Never,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,Data Analyst,University courses,10,20,25,40,5,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,10MB,RNNs,"Java,Python,R",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,RNNs,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,10,30,15,15,30,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,O'Reilly Data Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Necessary,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,,"Business Analyst,Data Analyst,Other",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,Russia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Self-employed",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,10,45,5,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,Often,Sometimes,Rarely,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,Often,Most of the time,,Rarely,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender 
Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Rarely,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,Often,Often,,Sometimes,,,Sometimes,Often,Often,Rarely,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,55,20,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,Often,Most of the time,Most of the time,,,Often,,,,,Sometimes,Most of the time,,,,Often,,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,120000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,NA,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,,"Blogs,Kaggle,Official documentation,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,,,,,Not Useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",25,25,20,5,25,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","QlikView,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Sometimes,,,,Sometimes,,,,,,Often,Often,,,Often,Rarely,,,Sometimes,,,,40,40,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty 
data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,Often,Most of the time,,,,,,,,,,Often,,,,,Sometimes,Sometimes,,100% of projects,Entirely internal,IT Department,,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Colombia,40,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Statistician",Self-taught,60,0,30,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,R,SAS Enterprise Miner,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,Sometimes,,,,,,,Often,,,,,Often,,,,,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone 
non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,goverment,uae hive and develop models in spark,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,16000,,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Hong Kong,24,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,30,10,10,0,0,,,"Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,10MB,,"C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,Sometimes,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,10,10,10,50,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science 
team",,,,,Often,,,Rarely,Most of the time,,,,,,Often,Often,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,Sometimes,144000,HKD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Sweden,23,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,0,0,15,35,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,18,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities",,,Very useful,,,,Very useful,,Somewhat useful,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,I prefer not to answer,I never declared a major,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,56,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Rarely,100MB,Decision Trees,"Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL",,,,,Often,,,,Most of the time,,,,,Often,Most of the time,,Often,,,,Sometimes,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,Most of the time,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Recommender Systems",Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,65,5,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,Sometimes,Often,,Often,,,,,Often,Sometimes,,,Often,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,IP address to 
geographic database,"Availability and privacy, as data belongs to our direct customers. Also, difficulty to make this raw data adjusted to that it contains really insightful information.",Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Git,Sometimes,95000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,,"Kaggle,Newsletters,Personal Projects",,,,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Julia,Microsoft Azure Machine Learning,NoSQL,Spark / MLlib,SQL,Unix shell / awk",Rarely,Most of the time,,,,,,,Rarely,,,,,,Most of the time,Sometimes,,,,,,Rarely,,,,,Often,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,Often,,,,"Association Rules,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,Often,,,,,Often,,,,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,,Often,Sometimes,,,,30,5,10,40,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack 
of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Decision Trees,R,University/Non-profit research group websites,"Arxiv,Online courses,Stack Overflow Q&A",Very useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +A different identity,Finland,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,Google Search,"Blogs,College/University,Friends network,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,20,30,25,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Most of the time,10GB,"Bayesian Techniques,Neural 
Networks","Amazon Web services,C/C++,Java,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Other",,Often,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,Rarely,,,Sometimes,,,"A/B Testing,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Often,,,,,,,,,,,,,,,,,Sometimes,,Often,Often,,,,,,,,Often,Sometimes,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Sometimes,Often,,Often,Sometimes,,Often,Sometimes,Rarely,,Often,Often,,,,Often,Sometimes,Most of the time,,Often,,100% of projects,More external than internal,Other,different face image sets,getting enough of good quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",GDrive,Git,Sometimes,48000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,37,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,10 to 19 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Rarely,,,,,Often,,Sometimes,,Rarely,,,,,Most of the time,,,,,Rarely,Often,Rarely,,,,40,20,0,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need 
to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,Sometimes,,Most of the time,Most of the time,,Often,,,Often,Sometimes,,Often,Sometimes,,Most of the time,Rarely,Sometimes,Most of the time,Often,,76-99% of projects,More internal than external,Other,CBS (population data); KNMI (weather data),Dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,ProjectSend,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Rarely,35000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,17,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Engineer,Fine,Self-employed,IBM Watson / Waton Analytics,Support Vector Machines (SVM),R,Google Search,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,,,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's 
degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,0,0,50,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,22,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,R,"GitHub,Google Search",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",1-2 years,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",University courses,10,10,10,70,0,0,"Computer Vision,Time Series",Gradient Boosting,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,38,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Regression,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",15,35,50,0,0,0,,,A bachelor's degree,Internet-based,20 to 99 employees,Decreased slightly,3-5 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,10MB,,"Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Segmentation",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,50,15,15,10,10,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,"Git,Subversion",Rarely,43200,PLN,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,"Employed by a company that performs advanced analytics,Self-employed",R,Decision Trees,R,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Other,University courses,15,30,10,30,15,NA,Unsupervised Learning,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",,,Regression/Logistic Regression,"Amazon Web services,C/C++,Hadoop/Hive/Pig,NoSQL,Python,R,Spark / MLlib,SQL",,Rarely,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Rarely,,,,,,,,Rarely,Rarely,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,Do not know,Other,,,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Argentina,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,DBA/Database Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,,,A master's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,,"Java,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",25,45,0,30,0,0,Time Series,"Bayesian Techniques,Logistic Regression",,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data 
Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Natural Language Processing,Bayesian Techniques,High school,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Programmer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Researcher",Self-taught,60,20,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Never,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Java,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,RNNs,Text Analytics",,,Sometimes,,,,Most of the time,Often,,,,,,,,Sometimes,,,,Rarely,,,,,Rarely,,,,Most of the time,,,,,30,5,20,35,10,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data",,,,,,,,,Often,,Most of the time,Sometimes,,,Most of the time,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,Yes,,Statistician,Fine,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,30,10,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Ensemble Methods,Logistic Regression",A doctoral degree,Academic,20 to 99 employees,Stayed the same,Don't know,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100MB,"Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Most of the time,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,Often,,,,,Most of the time,,,Sometimes,,,,20,30,10,30,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,Often,,,Often,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Data 
Analyst,Researcher,Other",University courses,20,65,2,6,7,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,Decision Trees,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics 
or statistics,1 to 2 years,I haven't started working yet,University courses,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,40,20,10,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher",University courses,30,25,20,25,0,0,"Reinforcement learning,Speech Recognition,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Important,Build prototypes to explore applying machine learning to new 
areas,Basic laptop (Macbook),Other,Sometimes,,,"C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,1 to 2 years,,Self-taught,40,20,0,0,40,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,30,0,5,15,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Relational data,Never,,,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,Often,,,Data 
Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,50,0,20,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,500 to 999 employees,,,,,,,,,,,"Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,University courses,30,40,0,30,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,0,30,0,70,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,70,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,No,Yes,Engineer,Fine,Employed by 
government,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Engineer,Self-taught,20,80,0,0,0,0,"Computer Vision,Time Series","Logistic Regression,Neural Networks - RNNs",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Operations Research Practitioner,Researcher,Statistician",Kaggle competitions,0,0,25,25,50,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,<1MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,R",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,,Often,,Rarely,,,,,,,Often,,,Often,Sometimes,Often,,Sometimes,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,Sometimes,,,,,,Often,,,Sometimes,Sometimes,Most of the time,,,,,,,Less than 10% of projects,More internal than external,Other,,,Other,I don't typically share data,,Other,Most 
of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +A different identity,United States,32,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,"Business Analyst,Other",University courses,10,10,20,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,NoSQL,Neural Nets,Stata,,Conferences,,,,,Not Useful,,,,,,,,,,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,Business Analyst,Self-taught,70,30,0,0,0,0,Adversarial Learning,Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,37,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,More than 10 years,Other,Self-taught,70,10,20,0,0,0,Supervised Machine Learning (Tabular Data),,High school,Financial,,,,,Not at all important,Analyze and understand 
data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,"Computer Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,20,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,NA,I prefer not to say,,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,22,"Not employed, but looking for work",,,,,,,,SAS Base,Time Series Analysis,R,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,Statistician,University courses,20,0,50,30,0,0,Survival Analysis,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 
years,Other,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,20,10,0,70,0,0,Supervised Machine Learning (Tabular Data),Gradient Boosting,High school,Technology,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,,10GB,Other,"Flume,Hadoop/Hive/Pig,Java,KNIME (free version),NoSQL,Spark / MLlib,SQL,Unix shell / awk,Other",,,,,,,Rarely,,Most of the time,,,,,,Sometimes,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,Often,,,,,,Most of the time,Most of the time,,,"kNN and Other Clustering,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,,,,,,,,Rarely,,,,,Most of the time,Rarely,,,,,,,,,Most of the time,,,,,90,5,0,5,0,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer 
Scientist,Data Analyst,Data Miner,Researcher","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Decision Trees,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,,Very useful,,,Very useful,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,,"Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Argentina,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer,Statistician",University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient 
Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Government,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Flume,Hadoop/Hive/Pig,Java,NoSQL,R,RapidMiner (free version),Spark / MLlib,TensorFlow",,Often,,,,,Often,,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,Often,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,16,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without 
earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection)","Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Java,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Logistic Regression,Neural Networks,RNNs",,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Belgium,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,1 to 2 years,,Self-taught,50,20,30,0,0,0,,Neural Networks - CNNs,"Some college/university study, no bachelor's 
degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Personal Projects",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,SVMs,Text Analytics",Often,,,,,,Most of the time,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,,,,Often,Sometimes,,,,,50,15,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis 
and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization",,Sometimes,Often,,Sometimes,Often,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,"IPEDS, BLS, Us census ",Accurately recording events of human operators in the system ,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other,Redshift,Git,Most of the time,275000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,Engineering (non-computer focused),,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,Less than a year,I haven't started working yet,University courses,30,5,40,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A professional degree,Academic,,,,,,,,,,,,"Hadoop/Hive/Pig,Java,KNIME (free 
version),MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Cross-Validation,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Limitations in the state of the art in machine learning,,,,,,,,,,,,Most of the time,,,,,,,,,,,100% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Argentina,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A professional degree,Technology,Fewer than 10 employees,Increased significantly,Less than one year,Some other way,Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Other",Relational data,Sometimes,10MB,Other,QlikView,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,50,0,50,0,0,,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's 
degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Data Miner,Machine Learning Engineer",University courses,0,10,0,80,10,0,Natural Language Processing,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Always,1GB,Other,"Java,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Natural Language Processing,Neural Networks,RNNs",,,,,,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,6 to 10 years,Business Analyst,Self-taught,80,0,0,20,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,,"Hadoop/Hive/Pig,R,SAS Base,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone 
non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,18,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Speech Recognition,Logistic Regression,High school,Other,"10,000 or more employees",Increased significantly,Less than one year,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Never,100MB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python",,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst",University courses,0,0,20,80,0,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United 
States,20,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher",Self-taught,25,0,50,25,0,0,Time Series,Logistic Regression,A doctoral degree,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer",University 
courses,30,0,0,70,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Technology,500 to 999 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,1TB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,Sometimes,Sometimes,,Often,,,,Most of the time,Rarely,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"CNNs,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation",,,,Often,,,Most of the time,Often,,,,,,,,Sometimes,,,,Often,,Often,Often,Most of the time,,,Most of the time,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Official documentation,Textbook",Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Work,20,0,60,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,100GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,,Often,,Most of the time,,,,,Most of the time,Often,,,,,40,25,15,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Sometimes,"195,000",USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,United States,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,A health science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,55,0,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,Primary/elementary school,Internet-based,10 to 19 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Other,Laptop 
or Workstation and private datacenters,Relational data,Never,,,Mathematica,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Electrical Engineering,More than 10 years,"Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",University courses,100,0,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,41,Employed 
full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,0,20,0,80,0,0,,,I don't know/not sure,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Spain,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,Less than a year,Business Analyst,Kaggle competitions,0,0,20,80,0,0,Time Series,Ensemble Methods,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,7,33,25,0,15,,,A doctoral degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by a company that doesn't 
perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,18,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,6 to 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,,A doctoral degree,Military/Security,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,41,Employed part-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other",Self-taught,50,30,20,0,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a 
year,Other,Self-taught,20,10,0,30,10,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,,,,,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,Rarely,Rarely,,,,,,,,Most of the time,,Sometimes,,Rarely,,,,,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,Sometimes,,Sometimes,,,Often,,Sometimes,,,,,,,Sometimes,,,,50,15,10,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others",,Sometimes,,Often,Most of the time,Often,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,100,0,0,0,0,0,Time Series,,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),Other,Never,,Regression/Logistic Regression,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,0,0,60,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,27,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,Python,Regression,Python,Government website,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,28,"Not employed, but looking for work",,,,,,,,Java,I don't plan on learning a new ML/DS method,C/C++/C#,I collect my own data (e.g. web-scraping),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,20,0,30,30,0,20,Speech Recognition,Logistic Regression,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Not Useful,,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,"Information technology, networking, or system administration",,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,27,Employed 
full-time,,,No,Yes,Statistician,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",SAP BusinessObjects Predictive Analytics,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,DataTau News Aggregator,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,University courses,20,10,10,60,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Researcher,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - 
Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Python,R,SQL",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,,University courses,0,10,0,85,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Fine arts or performing arts,Less than a year,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Mexico,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that 
doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,60,20,20,0,0,Time Series,Logistic Regression,A master's degree,Telecommunications,I prefer not to answer,Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,34,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,40,15,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,17,Employed full-time,,,No,Yes,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Statistician",University courses,30,15,25,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,18,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,Logistic Regression,A professional degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 
years,Other,University courses,50,0,0,25,0,25,Supervised Machine Learning (Tabular Data),Bayesian Techniques,"Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",,,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,50,10,0,30,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,University courses,30,50,0,20,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,30,30,30,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov 
Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Other,Most of the time,100MB,"Neural Networks,Regression/Logistic Regression,Other","C/C++,IBM Watson / Waton Analytics,Java,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,29,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,University courses,38,30,25,5,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural 
Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Linear Digressions Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,Kaggle competitions,20,15,30,17,18,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,South Korea,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,PhD,Yes,Master's degree,Mathematics or statistics,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,23,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist,Statistician",University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Chile,27,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Researcher,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Engineer,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,1 to 2 years,Researcher,Self-taught,60,5,30,0,0,5,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,25,25,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian 
Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A doctoral degree,Academic,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,Sometimes,Most of the time,,,Often,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Operations Research Practitioner,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,26,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,,,A master's degree,Technology,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),NoSQL,Python,Spark / MLlib,SQL",,Often,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Often,,Rarely,,Sometimes,Often,Most of the time,,Rarely,,,,,,,Sometimes,,Rarely,,,Rarely,,Sometimes,,,,,,,Sometimes,,,,10,20,60,10,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed 
part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,60,NA,40,NA,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A master's degree,Telecommunications,500 to 999 employees,,,I was contacted directly by someone at the company (e.g. internal recruiter),Important,,,,,,,"Jupyter notebooks,MATLAB/Octave,Python,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Engineer,Self-taught,70,10,5,10,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Insurance,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a 
bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,"Computer Scientist,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,0,30,50,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Hidden Markov Models HMMs",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Most of the time,10TB,Other,"Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,,,,,,,"Data 
Visualization,kNN and Other Clustering,Recommender Systems",,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",SQL,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,Very useful,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,0,10,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Not very important,Other,Laptop or Workstation and private datacenters,Text data,,,,"Jupyter notebooks,Perl,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,Most of the time,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,5,15,0,Enough to tune the parameters properly,"Dirty data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,,Often,10-25% of projects,Do not 
know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Email,Share Drive/SharePoint,Other",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,"72,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,25,40,30,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Often,Most of the time,,Rarely,,,,,,,Sometimes,Most of the time,Most of the 
time,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Trade book",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,50,20,10,0,20,0,"Machine Translation,Outlier detection (e.g. Fraud detection)",Bayesian Techniques,Primary/elementary school,Telecommunications,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1TB,Bayesian Techniques,"NoSQL,Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Often,,Often,,,,,,,,,Often,,,,,,Often,,,,"Bayesian Techniques,Decision Trees,Text Analytics",,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,30,40,0,10,20,0,Enough to run the code / standard library,"Explaining data science to others,Organization is small and cannot afford a data science team",,,,,,Often,,,,,,,,,,Often,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,Often,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United 
States,32,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,,1-2 years,,,,,,,,,,,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",35,50,15,0,0,0,,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Ireland,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,University courses,1,0,0,99,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","C/C++,IBM Watson / Waton Analytics,Microsoft SQL Server Data Mining,NoSQL,Python",,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Simulation",,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,40,30,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,Unavailability of/difficult access to data",,Sometimes,,,Often,Sometimes,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Matlab,GitHub,"Arxiv,Blogs,Conferences",Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,"Data Stories Podcast,Talking Machines Podcast,Other (Separate different answers with semicolon)",< 1 year,,,,,Necessary,,,,,,,,,,"Gaming 
Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Master's degree,No,Master's degree,Computer Science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software 
Engineer,Other",Self-taught,30,50,10,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Programmer,Statistician",Kaggle competitions,25,50,0,0,25,0,"Survival Analysis,Time Series","Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,Often,,,Sometimes,,,,20,20,10,25,25,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,15,Employed part-time,,,No,Yes,Programmer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,29,"Not employed, but 
looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,Less than a year,I haven't started working yet,Self-taught,40,40,0,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Data Miner,Engineer,Programmer,Researcher",University courses,35,15,25,25,0,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Biology,3 to 5 years,"Data Scientist,Researcher",University courses,10,0,10,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",50,20,20,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,Often,,,,Often,Most of the time,,Sometimes,,,,,Sometimes,,Sometimes,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,10,40,20,10,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,35000,CAD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed part-time,,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,I 
collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"DataCamp,Other",GPU accelerated Workstation,40+,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Not important +Male,United States,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,I prefer not to 
say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Proprietary Algorithms,Python,Google Search,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Time Series,"Ensemble Methods,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,44,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Workstation + Cloud service,2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,6 to 10 years,Programmer,Self-taught,30,30,0,0,40,0,,,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Female,United States,49,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Other",Self-taught,60,10,5,5,0,20,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,22,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Adversarial Learning,Computer Vision",Bayesian Techniques,A bachelor's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,45,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",5,75,10,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,26,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),1 to 2 years,I 
haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,28,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Java,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,1 to 2 years,Researcher,Self-taught,90,5,0,5,0,0,Unsupervised Learning,Neural Networks - CNNs,A master's degree,Academic,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Most of the time,1GB,Neural Networks,"C/C++,Perl,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Often,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,30,30,10,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,Do not know,IT Department,None,Image adjustment,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,"13,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Belgium,26,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,,Self-taught,30,30,40,0,0,0,,,A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,,"Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),Relational data,Never,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Other",Other,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,A social science,1 to 2 years,Data Analyst,Work,10,20,70,0,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,10,25,10,35,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning,Other (please specify; separate by semi-colon)","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Rarely,1TB,"Ensemble Methods,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Other",Self-taught,20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,Often,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Sometimes,,,Most of the time,,Often,Sometimes,,,Often,,Sometimes,Most of the time,Often,,Sometimes,,,Sometimes,,Often,,,,,Rarely,,,,,,25,20,15,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full 
database",,,,,Often,Sometimes,,,,,,Rarely,,Sometimes,Rarely,,,Sometimes,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,52000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,29,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,"Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",Work,20,20,50,0,10,0,Natural Language Processing,"Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,1-2 years,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Python,TensorFlow",,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics",,,,,,Often,Sometimes,,,,,,,Often,,,,,Most of the time,Most of the time,,,,,Most of the time,,,Sometimes,Most of the time,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,Raw Text Data,"Difference in templates by different sources, missing labels, etc.",Other,"Commercial Data Platform,Other",shared directories in Unix,Bitbucket,Never,150000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's 
degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Programmer",Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Cloudera,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,TensorFlow",,Sometimes,,,Sometimes,,,,,,,,,,,,Often,,,,Often,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,RNNs,Time Series Analysis",Sometimes,,Often,Often,,Most of the time,Most of the time,,,,,,,Often,,,,Often,,Most of the time,,,,,Often,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Iran,27,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",3-5 years,,,,,Necessary,Necessary,Nice to have,,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +A different identity,United States,22,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting 
firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Matlab,University/Non-profit research group websites,"Arxiv,Blogs,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,,,,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Machine Learning Engineer,Statistician",University courses,30,10,10,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic 
Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,Sometimes,,Often,Often,,Often,,,Sometimes,,,,Sometimes,,,,Often,Often,,,,Often,,Sometimes,,,Often,,,,10,30,10,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,Sometimes,,,,Sometimes,,,,,,,Often,,Most of the time,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Brazil,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,Very useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Researcher",Self-taught,15,40,20,10,15,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data,Other",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,Often,,,,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,Often,,Often,Most of the time,Most of the 
time,Most of the time,Sometimes,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Often,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,,65,15,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Rarely,,,Often,Sometimes,,,Often,,,,,Sometimes,,Sometimes,,,,Sometimes,,,10-25% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Rarely,100000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Statistica (Quest/Dell-formerly Statsoft),,,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Researcher,Statistician",University courses,30,10,0,30,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",,Academic,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,10TB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Perl,Python,R,SAS Base,SQL,Stan",,,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,Sometimes,Often,,Most of the time,,,,,Sometimes,,,,Sometimes,Rarely,,,,,,,,,"A/B Testing,Bayesian Techniques,Logistic Regression,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",Rarely,,Sometimes,,,,,,,,,,,,,Often,,,,Often,,,Often,,,,Often,,Often,Rarely,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,,4,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed 
part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,37,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Physics,1 to 2 years,"Data Analyst,Other",University courses,0,10,0,80,10,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Brazil,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Nigeria,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,Computer Vision,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Rarely,10MB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,20,0,0,80,0,0,Enough to run the code / standard library,I prefer not to 
say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,3 to 5 years,"DBA/Database Engineer,Predictive Modeler,Software Developer/Software Engineer",Self-taught,80,10,0,0,0,10,"Computer Vision,Recommendation Engines",,"Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,,,Rarely,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Operations Research 
Practitioner,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,100MB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Data Visualization,Logistic Regression,Neural Networks,Time Series Analysis",,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,,75,10,5,5,5,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,85,0,10,5,0,0,"Computer Vision,Speech Recognition,Survival Analysis","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,39,0,0,1,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,A social science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher",University courses,10,25,15,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Julia,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,,,,,30,45,1,19,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,18,Employed part-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher",Self-taught,50,5,20,15,0,10,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Academic,Fewer than 10 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation","Image data,Relational data",Rarely,<1MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,C/C++,Java,MATLAB/Octave,Python,R,Unix shell / awk",,Rarely,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,42,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Military/Security,100 to 499 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,SVMs,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,Often,,Most of the time,,Often,,Rarely,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,PCA and Dimensionality Reduction,SVMs",Sometimes,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,Most of the time,,26-50% of projects,Approximately half internal and half external,Standalone Team,Blacklist snd malware Data feeds ,Too many categorical features that cannot be used for analytical insights.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,100000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Matlab,"Google Search,University/Non-profit research group websites","College/University,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,"Computer Vision,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important +Male,United States,48,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Operations Research Practitioner,Researcher",Self-taught,60,10,30,0,0,0,"Computer Vision,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data",Never,10GB,"CNNs,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,Sometimes,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,80,10,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,Often,,,Often,,,,,,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Engineer,University courses,25,10,20,40,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Engineer,Work,25,10,40,25,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing 
there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Image data,Sometimes,10GB,"CNNs,Neural Networks","Amazon Machine Learning,C/C++,Julia,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",Most of the time,,,Most of the time,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Rarely,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,40,20,30,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,,"Hadoop/Hive/Pig,Java,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,20,50,20,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,,Bitbucket,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed part-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TIBCO Spotfire,Social Network Analysis,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,10,0,60,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Retail,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,Sometimes,Most of the time,,,Sometimes,,Sometimes,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",Often,,,Often,,Often,,,,,,Sometimes,,,,,,,,Often,Sometimes,,Rarely,,Often,,,Sometimes,,,,,,40,30,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,Often,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Most of the time,50000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Japan,53,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,Somewhat useful,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,20+,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Time Series,Logistic Regression,A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,,,Regression/Logistic Regression,"Jupyter notebooks,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,Often,,,Often,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,Sometimes,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,10,0,50,10,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,Often,,,,,,Sometimes,,,,Most of the time,Most of the time,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,95000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Machine Learning Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,Somewhat useful,Not Useful,,,Somewhat useful,Very useful,,Very useful,,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,50,10,40,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,,,,,Very important,Build and/or run a machine learning service that 
operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Video data,Text data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Mathematica,MATLAB/Octave,Python,SQL,Statistica (Quest/Dell-formerly Statsoft),TensorFlow",,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,SVMs,Text Analytics",,,Often,Most of the time,,,Most of the time,Sometimes,,,,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,Often,,,,,20,30,40,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,Often,,,Often,Most of the time,,,Often,Most of the time,,,Most of the time,,Most of the time,,Most of the time,Often,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,20000,CNY,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Female,Taiwan,20,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Java,"Google Search,University/Non-profit research group websites",College/University,,,Very useful,,,,,,,,,,,,,,,,"Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Management information systems,Less than a year,"Data Analyst,Data Miner,Software Developer/Software Engineer",University courses,15,0,5,80,0,0,Computer Vision,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,35,0,0,25,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",No education,Insurance,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,,,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,,,Most of the time,Most of the time,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,65,10,5,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Never,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Engineer,Programmer",Self-taught,60,20,20,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Recommender Systems",Often,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,,,,,,,Sometimes,,,Sometimes,Rarely,,,,,,,,,,50,10,30,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Other,15,60,0,0,15,10,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,Ireland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Work,20,10,50,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,33,Employed full-time,,,Yes,,Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,Work,0,0,100,0,0,0,Computer Vision,Neural Networks - CNNs,No education,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Never,10GB,CNNs,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,GANs,Neural Networks,Segmentation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,60,20,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Monte Carlo Methods,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,20,20,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,16,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,Very useful,,,Somewhat useful,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,Chile,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Predictive Modeler,Researcher",Self-taught,20,20,10,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Always,10GB,"Bayesian Techniques,Regression/Logistic 
Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,Python,R,SQL",,,,,Sometimes,,,,Rarely,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Markov Logic Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Trade book",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,15,15,30,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","C/C++,Hadoop/Hive/Pig,MATLAB/Octave,Minitab,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Often,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,,"CNNs,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,,Sometimes,,,,Often,Often,,,,,Often,,Often,,Often,,Sometimes,Often,,Often,,Sometimes,,Often,Often,,Often,,,,20,35,15,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,"Data labeling, model evaluation","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",35,10,0,15,40,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,"GitHub,Google Search","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,0,40,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",5,50,25,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Very useful,Not Useful,,,,Somewhat useful,,,,Very useful,Not Useful,,Somewhat useful,Somewhat useful,,,,,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,31,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,,,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,,,Basic laptop (Macbook),,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,United States,65,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,High school,Academic,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,Indonesia,21,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Neural Networks - CNNs,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Canada,64,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,30,10,30,30,0,0,"Natural Language Processing,Speech Recognition,Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A professional degree,Academic,10 to 19 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Video data,Text data",Most of the time,1PB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Markov Logic Networks,RNNs","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,HMMs,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,Often,,Sometimes,,,,,,,Often,,,Often,Often,Often,Often,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,,,35,30,15,10,10,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,Often,,,,,,,Sometimes,,Often,,51-75% of 
projects,More internal than external,Standalone Team," Collaborative Research in Computational Neuroscience; Allen Brain Atlas; BIRN fMRI and MRI data; Database for Reaching Experiments And Models (DREAM);The fMRI Data Center;nternational Neuroimaging Data-sharing Initiative (INDI)",Disk size,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,Git,Always,"90,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Data Analyst,Other",University courses,10,0,40,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A doctoral degree,Retail,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Canada,30,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,34,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,25,"Not employed, but looking for work",,,,,,,,SQL,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,,Less than a year,I haven't started working yet,Self-taught,30,10,0,30,30,0,"Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,10,0,10,80,0,0,Time Series,Bayesian Techniques,Primary/elementary school,Telecommunications,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,24,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Amazon Machine Learning,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites,Other","Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Computer Vision,Decision Trees - Gradient Boosted Machines,A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Most of the time,,,,,,Sometimes,Rarely,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series 
Analysis",Often,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Sometimes,Often,Sometimes,,,,65,0,15,20,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Rarely,,,,Most of the time,,,Sometimes,Often,,,Sometimes,,Most of the time,,,Sometimes,Most of the time,,,Most of the time,,51-75% of projects,More internal than external,Business Department,Google; Twitter; Facebook; Gov,Cleaning and building a sustainable model,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,52000,MYR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Brazil,52,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,,5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"GPU accelerated Workstation,Workstation + Cloud service",,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",0,30,45,0,25,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle 
Competitions,No,Master's degree,Biology,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important +"Non-binary, genderqueer, or gender non-conforming",Australia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Textbook",Very useful,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,Not Useful,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,40,5,25,25,5,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Internet-based,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10TB,"CNNs,GANs,Neural Networks,RNNs","Amazon Web services,Python",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,GANs,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Text Analytics",,,,Most of the time,,Most of the time,Most of the time,,,Sometimes,Rarely,Sometimes,,,,Often,,,Often,Most of the time,,Sometimes,Rarely,,Often,Sometimes,,,Often,,,,,15,35,35,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,,,,,,,,,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,No,Yes,Machine Learning 
Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,16,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Biology,3 to 5 years,"Researcher,Statistician",University courses,50,20,0,30,0,0,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important 
+Female,United States,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,Other,Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,C/C++/C#,,"College/University,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 
hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,Russia,23,"Not employed, but looking for work",,,,,,,,SQL,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,20,40,0,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Predictive Modeler,Fine,Employed by college or university,Microsoft Azure Machine Learning,Time Series Analysis,SQL,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Master's degree,Management information systems,1 to 2 years,Other,University courses,25,25,0,50,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,39,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Python,Text Mining,Python,"Google Search,Government website","College/University,Online courses,Textbook,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,,1 to 2 years,Researcher,Self-taught,60,40,0,0,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)",Logistic Regression,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural 
Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,Self-taught,60,0,0,40,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Operations Research Practitioner,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,25,0,35,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Not important,Not important,Very Important +,,NA,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Machine 
Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook",,,,,,,,,,,,Somewhat useful,,,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst",Self-taught,50,20,10,5,5,10,Survival Analysis,"Decision Trees - Random Forests,Markov Logic Networks",Primary/elementary school,Academic,10 to 19 employees,Decreased slightly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Most of the time,100GB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,50,10,5,5,10,20,"Enough to code it again from scratch, albeit it may run slowly",Privacy issues,,,,,,,,,,,,,,,,,Rarely,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Subversion,Sometimes,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Malaysia,39,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",0,30,70,0,0,0,Time Series,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Self-taught,50,20,15,5,0,10,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Impala,Jupyter notebooks,Python,R,SAS Base,SQL",,,,,,,,,Sometimes,,Sometimes,Often,,Sometimes,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text 
Analytics",,Sometimes,Sometimes,,Sometimes,,Often,,,,,,,Sometimes,,Sometimes,,Often,Sometimes,Sometimes,Sometimes,,Often,,,,,,Often,,,,,50,20,0,0,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,Most of the time,,,,Often,,,,Most of the time,,,Most of the time,Most of the time,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Tutoring/mentoring",,,Very useful,,,,,,,,,,,,,,Very useful,,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,,,,Necessary,,,Necessary,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,0,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,Less than a year,I 
haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,NA,30,0,0,,Evolutionary Approaches,A master's degree,Academic,I don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Ensemble Methods",High school,Academic,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100GB,"Ensemble Methods,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Academic,500 to 999 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Orange,Python",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,Sometimes,,Often,Sometimes,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Often,,,,70,10,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations of tools",,,,,Most of the time,Often,,,,,,,Often,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,decoding binary data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"22,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,0,20,40,30,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Master's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,10,0,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Neural Networks - GANs,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,,Python,Other,"Arxiv,Kaggle,Personal Projects,Podcasts",,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",< 1 year,Unnecessary,,Nice to have,,,Necessary,Nice to have,Necessary,,,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",,DBA/Database Engineer,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Other,12,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,25,5,0,20,50,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Malaysia,22,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Factor Analysis,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),Primary/elementary school,Telecommunications,100 to 499 employees,,,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Don't know,,Other,"Cloudera,Hadoop/Hive/Pig,NoSQL,Python,SQL",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Cross-Validation,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,20,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,,36000,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,0,80,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Never,100MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,30,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,30,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A doctoral degree,Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Often,,Often,,Rarely,,,,,,Rarely,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Often,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,,,,50,9,1,20,20,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,6 to 10 years,"Engineer,Software Developer/Software Engineer,Other",University courses,40,30,0,30,0,0,,,A master's degree,Military/Security,100 to 499 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,39,"Not employed, but looking for work",,,,,,,,SQL,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,,1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Other,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Natural Language Processing,Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,Brazil,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,1 to 2 years,Other,Self-taught,70,5,0,20,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,,<1MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,SVMs","C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,Most of the time,,,Sometimes,Most of the 
time,,,,,,,Often,,Sometimes,,,,Sometimes,Sometimes,,,,,Often,,,,Sometimes,,,,40,50,0,0,0,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Text Mining,Python,,"College/University,Conferences,Friends network,Kaggle,Tutoring/mentoring",,,Very useful,,Very useful,Somewhat useful,Very useful,,,,,,,,,,Very useful,,"Data Elixir Newsletter,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,50,10,20,10,10,0,Recommendation Engines,Bayesian Techniques,,Academic,100 to 499 employees,Decreased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Evolutionary Approaches,Random Forests","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Orange,SQL",,,,Often,,,,,Sometimes,,,,,,Often,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Collaborative Filtering,Naive Bayes",,,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,5,20,50,15,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,,0,,Has increased 20% or more,,,,,,,,,,,,,,,,,,, +Male,Australia,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",,,,"Amazon Web services,Java,Python,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed part-time,,,No,Yes,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Engineering (non-computer focused),1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,,,Coursera,"Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",35,30,0,0,35,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important +Male,Other,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Support Vector Machines (SVM),Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,,,Very useful,,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important +Male,United States,31,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Other,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,,,A bachelor's 
degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Portugal,54,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Oracle Data Mining/ Oracle R Enterprise,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Personal Projects",,,,,Somewhat useful,,,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Other,University courses,70,0,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Neural Networks - RNNs",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,RNNs,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,Often,,,,,Often,,Often,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Most of the 
time,Sometimes,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,,Most of the time,,,,,,Sometimes,,Often,Sometimes,Sometimes,,,,20,20,15,40,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Most of the time,,Often,,Sometimes,,,,,,,,,,,,Often,,,76-99% of projects,Do not know,Standalone Team,academic,find useful data sets,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,3,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"GitHub,Google Search,Other","Blogs,Kaggle,Online courses,Personal Projects,Other",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Necessary,Necessary,,Necessary,,,,,Necessary,,,,Coursera,Other,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,I never declared a major,3 to 5 years,"Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",50,30,0,0,0,20,"Adversarial Learning,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,,,Very Important,,,,,,,,, +Male,Other,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",45,45,0,10,0,0,Reinforcement learning,Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Somewhat important,Not important,Somewhat important,,,,,,,,,,,, +Male,United States,NA,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,Very useful,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,30,20,20,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,United States,26,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,People 's Republic of China,29,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Scala,Other,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A bachelor's 
degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,,,,,,,,,,,,,, +Male,Canada,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Social Network Analysis,SQL,Google Search,"College/University,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,"Data Analyst,Other",University courses,0,0,10,70,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted 
directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Programmer,Fine,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Machine Learning Engineer,Programmer,Other",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Analyst,Other",Work,18,2,30,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Researcher,Self-taught,50,20,10,0,0,20,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Academic,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees","C/C++,Python,R,SAS Base,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Rarely,Sometimes,,Often,Often,,,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,10,30,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,Often,Sometimes,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,US census,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,30000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Japan,38,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by non-profit or NGO,Spark / MLlib,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician",University courses,10,10,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Other,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,,Sometimes,,,Often,,,,Often,,,,,Sometimes,Often,Most of the time,,,Sometimes,,,,Sometimes,,,,40,15,30,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",Sometimes,,,,Sometimes,,,,,,,Most of the time,Most of the time,,,,,,,,,,26-50% of projects,Entirely internal,Business Department,Census; Singh adi;,Limited computational resources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,98000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed part-time,,,Yes,,Statistician,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",25,50,10,0,5,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,High school,Academic,100 to 499 employees,Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,48,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,I prefer not to say,Yes,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,20,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A social science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Employed by government,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,Self-taught,90,5,5,0,0,0,Computer Vision,Decision Trees - Random Forests,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Very important,Other,Workstation + Cloud service,"Image data,Text data,Relational data",Never,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,SVMs",,,,,,Often,Often,Most of the time,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",,Most of the time,,,,Often,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,reading one-off file formats and cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Online courses,Podcasts,YouTube Videos",,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,More than 10 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,,,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,10MB,,"Amazon Machine Learning,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",Rarely,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,Rarely,Rarely,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Random Forests",,,,,,,Often,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,90,0,5,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,Rarely,,100% of projects,Entirely internal,IT Department,,management support,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,112000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,"Basic laptop (Macbook),Workstation + Cloud service",0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,,"Blogs,Online courses,Podcasts,Textbook",,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Researcher,University courses,0,0,0,100,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Always,1GB,"HMMs,Random Forests,Other","Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,"Association Rules,HMMs,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,,Most of the time,,,,40,10,0,0,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Privacy issues",Most of the time,,,,,,,,,,,,Most of the time,,,,Often,,,,,,None,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Other,Self-taught,50,50,0,0,0,0,Time Series,Logistic Regression,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,15,5,10,70,0,0,,,A master's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Other,Basic laptop (Macbook),Text data,,,,"Amazon Web services,Java,Python",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Naive Bayes,Neural Networks",,,,,,,Most of the time,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,60,10,0,20,10,0,Enough to tune the parameters properly,Did not instrument data useful for scientific analysis and decision-making,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,71,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced 
analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,Udacity,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,I did not complete any formal education past high school,,Less than a year,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,South Korea,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Scala,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Scientist",Work,30,10,50,10,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural 
Networks - RNNs",A professional degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data",Always,1GB,"CNNs,RNNs","Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,Segmentation,Simulation",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,Sometimes,,,,,,,30,30,20,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",Often,,,,,,,,,,,,,Often,,,,Often,,,,,26-50% of projects,More external than internal,IT Department,recommendation,personalization service,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,,,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,,NA,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,< 1 year,,,,,,,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,25,25,25,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,Fewer than 10 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Other,31,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,36,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,Jupyter notebooks,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,,,,,Somewhat useful,,Very useful,,Not Useful,,,,Not Useful,"FlowingData Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,I don't write code to analyze data,"Data Analyst,Data Miner,Predictive Modeler,Researcher",Work,40,0,60,0,0,0,,,High school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,Egypt,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Text Mining,Python,Google Search,"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Not Useful,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,Very useful,,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,Programmer,Software Developer/Software Engineer",University courses,70,10,0,5,15,0,"Computer Vision,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,People 's Republic of China,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,,Very useful,,,,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Data Analyst,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,NA,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Very Important +Female,India,35,Employed full-time,,,No,Yes,Software 
Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,Somewhat useful,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,France,22,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,Kaggle competitions,70,10,NA,0,20,NA,,"Decision Trees - Random Forests,Logistic Regression",High school,Other,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze 
and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,Other",,,,,,Often,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,Often,,,30,10,0,30,30,0,Enough to tune the parameters properly,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,40,50,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral 
degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +"Non-binary, genderqueer, or gender non-conforming",United States,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",5,5,5,85,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,10MB,Bayesian Techniques,"Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Often,,,Often,Most of the time,,,,,,,Sometimes,,Often,,,,,Sometimes,,Sometimes,,,,Often,,,Most of the time,,,,40,50,0,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the 
time,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,24,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Support Vector Machines (SVM),Java,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,,Internet-based,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,20,60,0,0,20,0,Enough to run the code / standard library,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,Other,Rarely,200000,CNY,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,Google Search,"Blogs,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Adversarial Learning,Other (please specify; separate by semi-colon),,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Female,India,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Web services,Neural Nets,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle",,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Data Scientist,Self-taught,60,0,40,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Academic,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,1GB,Decision Trees,Hadoop/Hive/Pig,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,0,100,0,0,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,,,,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,30,0,10,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Military/Security,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Jupyter notebooks,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,Often,,,Often,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Most of the time,Most of the time,,,,,,,Most of the time,,,Often,Often,Often,Sometimes,Most of the time,,,,,,Often,Sometimes,,,,65,15,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,10,5,5,0,"Computer Vision,Reinforcement learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Image data,Rarely,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,HMMs,Markov Logic Networks,SVMs","C/C++,Jupyter notebooks,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Markov Logic Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,Often,Often,,Most of the time,Most of the time,,Often,,,,Sometimes,Often,,,Sometimes,,,,Often,,,,Sometimes,Most of the time,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,"GitHub,Google Search","Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's 
degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,44,Employed part-time,,,No,Yes,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A,Textbook,Trade book",,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Master's degree,No,Bachelor's degree,A humanities discipline,Less than a year,Other,University courses,80,0,0,20,0,0,,,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Female,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,Very 
useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A doctoral degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,,,,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,NoSQL,Python,R,SQL",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,50,30,10,10,0,0,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Non-Kaggle online communities,Official documentation,YouTube Videos",,,Very useful,,,,,,Very useful,Very useful,,,,,,,,Very useful,,< 1 year,,,,,,,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,24,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 
years,"Engineer,Researcher",Self-taught,50,15,15,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,CRM/Marketing,20 to 99 employees,Decreased slightly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",Most of the time,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Often,,Sometimes,Often,,,,,,Often,,,,20,10,30,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,Often,,,,,Sometimes,Rarely,,,,,,,,,Most of the time,,Often,,Less than 10% of projects,Entirely internal,IT Department,,Hard to access and consolidate,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,SQL,GitHub,"Kaggle,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Psychology,1 to 2 years,"Business Analyst,Operations Research Practitioner",Self-taught,40,10,35,15,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer",University courses,20,20,30,10,10,10,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Decreased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Often,,Sometimes,,Often,,,,,Often,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,Often,Most of the time,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",Most of the time,Most of the time,Most of the 
time,Often,Most of the time,Most of the time,Most of the time,Most of the time,Often,,Often,Most of the time,,Often,Most of the time,Most of the time,,Most of the time,Often,,Most of the time,,Most of the time,,,,Most of the time,Sometimes,Often,,,,,20,30,20,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Indonesia,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Other,Self-taught,50,5,5,5,35,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Telecommunications,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,100GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM Cognos,Java,Spark / MLlib,SQL",,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,Sometimes,,Often,Often,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,54,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX",Other,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,Brazil,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,,University courses,0,0,20,80,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Academic,I don't know,Increased slightly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,United States,44,"Not employed, but looking for work",,,,,,,,Tableau,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",45,50,0,0,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Canada,54,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,YouTube Videos,Other",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Self-taught,50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Time Series","Bayesian Techniques,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Brazil,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Computer Scientist,Data Miner,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",Rarely,Most of the time,,,,,,Rarely,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",Rarely,Often,,,,,Most of the time,Often,,,,,,,,Often,,,,Rarely,,,Rarely,,,,,,Sometimes,Most of the time,,,,40,20,10,30,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,Zillow,API limits,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak)","Commercial Data Platform,Email",,Bitbucket,Always,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,80,0,10,0,5,"Computer Vision,Natural Language Processing,Reinforcement learning","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Rarely,10MB,"Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,R,RapidMiner (commercial version)",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Most of the time,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,5,2,10,3,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,Most of the time,,,,,,Often,Sometimes,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,"60,000",TRY,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,1 to 2 years,Researcher,Work,90,0,8,0,2,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data,Relational data",Never,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Often,Sometimes,Sometimes,,Often,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,20,30,10,25,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,Sometimes,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Often,,,76-99% of projects,Approximately half internal and half external,Business Department,"gene and protein sequence information (ncbi, phytozone, uniprot, pfam db); RNA-Seq data",small data sets with many features ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,720000,RSD,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Random Forests,Python,Google Search,"Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,Very useful,,Very useful,,Very useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,,University courses,90,0,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,,,A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,20,0,40,20,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Italy,27,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,0,10,0,90,0,0,"Computer Vision,Reinforcement learning,Speech Recognition","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Java,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests",,,,,,Often,,Often,Often,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Nice to have,,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Professional degree,,1 to 2 years,"Data Miner,Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Brazil,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed part-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of 
China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Canada,41,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,,"Blogs,Kaggle,Online courses,Personal Projects",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,More than 10 years,Predictive Modeler,Self-taught,40,25,25,0,10,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",High school,Financial,"5,000 to 9,999 employees",Increased slightly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT 
supported servers,Relational data,Most of the time,1GB,Regression/Logistic Regression,"Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,Sometimes,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Simulation",,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,30,30,30,5,5,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Other,Sometimes,120000,CAD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,South Korea,27,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",16-20,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,Very useful,Very useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,90,0,5,5,0,"Computer Vision,Natural Language Processing",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Republic of China,26,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Engineer",Self-taught,50,20,20,0,10,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,69,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Engineer,Self-taught,50,30,5,0,0,15,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Random Forests,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Blogs,College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Psychology,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,Supervised Machine Learning (Tabular Data),,A master's degree,Mix of fields,,,,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,,"IBM SPSS Statistics,Python,R,SQL,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Often,,,Often,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,,,,,100% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Never,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,45,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"College/University,Conferences,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,1 to 2 years,Data Miner,Self-taught,90,0,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Non-profit,I don't know,Increased significantly,Less than one year,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Logistic Regression,Naive Bayes,Natural Language Processing,SVMs,Text Analytics",,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,Sometimes,Most of the time,,,,,0,100,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,Often,Often,Often,,,Often,,,,,,Often,,100% of projects,Do not know,Other,UCI ,can not collect the raw datasets,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Never,1200,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,36,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Deep learning,C/C++/C#,University/Non-profit research group websites,"Arxiv,Conferences,Kaggle,Personal Projects,Textbook",Very useful,,,,Very useful,,Very useful,,,,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Researcher,Other",Self-taught,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Rarely,10GB,"Decision Trees,Markov Logic Networks,Random Forests","C/C++,Mathematica",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Markov Logic Networks,Random 
Forests,Simulation",,,,,,,,Often,,,,,,,,,Often,,,,,,Often,,,,Often,,,,,,,30,50,0,20,0,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Rarely,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,PDB,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,0,10,0,90,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,C/C++,Random Forests,Python,University/Non-profit research group websites,"College/University,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Very useful,,,Very useful,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer,Machine Learning Engineer,Researcher",University courses,20,5,10,60,0,5,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,GitHub,"Blogs,College/University,Conferences,Kaggle,Newsletters,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,20,10,10,10,0,"Machine Translation,Natural Language Processing,Recommendation Engines",Neural Networks - CNNs,A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Hong Kong,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Engineer,Researcher",Self-taught,30,20,0,10,20,20,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Insurance,"5,000 to 9,999 employees",Increased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Text data,Don't know,100GB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Gradient 
Boosted Machines,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,Often,,,,,,Often,,,,,,,Often,,,,,,,,,,Often,Often,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others",Often,Often,,,Often,Often,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,455000,HKD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,FlowingData Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Github Portfolio,Yes,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,10,0,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Bayesian 
Methods,Python,GitHub,"College/University,Kaggle,Online courses",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,1GB,"Ensemble Methods,Neural Networks,SVMs","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,"Ensemble Methods,Naive Bayes,SVMs",,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Often,,,,,,50,10,0,40,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input",,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,cleanning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,300000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,5,20,0,5,0,"Adversarial Learning,Machine Translation,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,Often,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Often,Often,Sometimes,Sometimes,Sometimes,Sometimes,Often,Often,Often,Often,Often,Sometimes,Sometimes,Often,Sometimes,Often,Sometimes,Often,Sometimes,Often,Often,Sometimes,Often,Often,Sometimes,Sometimes,Sometimes,Often,,,,30,25,25,15,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Sometimes,,,,Often,,Often,,,,,,Sometimes,,Often,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,51,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,"Business Analyst,Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,0,50,0,0,0,Natural Language Processing,"Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Orange,Python,R,TensorFlow",,Often,,,,,,,,,,,Sometimes,,,,Often,,,,,Sometimes,,,,,,,Rarely,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests",,,,Often,,Often,,Often,Often,,,,,Rarely,,,,,Most of the time,Often,,,Often,,,,,,,,,,,60,25,5,5,5,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,Often,,,,,,Often,Often,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,twitter,data cleansing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,8000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,GitHub,"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,PhD,Yes,Doctoral degree,Physics,1 to 2 years,Researcher,University courses,40,20,0,40,0,0,Unsupervised Learning,"Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Other,No,Bachelor's degree,Computer Science,,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Canada,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Business Analyst,Self-taught,40,60,0,0,NA,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Naive Bayes,Random Forests,Other",,,Sometimes,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,Rarely,,,60,30,3,2,5,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Often,,,,,,,,,,Sometimes,,,Often,,,,Often,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Don't know,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,India,57,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,R,Bayesian Methods,Python,University/Non-profit research group websites,"Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,28,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, 
etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,Self-taught,80,20,0,0,0,0,Recommendation Engines,,Primary/elementary school,Other,10 to 19 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,Google Search,"Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,25,20,10,5,20,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",10,30,0,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,Other,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,"Information technology, 
networking, or system administration",I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,26,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Computer Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,20,"Not employed, 
but looking for work",,,,,,,,TensorFlow,Text Mining,Python,I collect my own data (e.g. web-scraping),"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Speech Recognition","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Indonesia,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,"DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",,Master's degree,No,Master's degree,Computer Science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,50,20,5,0,0,"Adversarial Learning,Recommendation Engines,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,NA,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Other,11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,India,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,,Very useful,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,,,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,38,Employed full-time,,,No,Yes,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by government",MATLAB/Octave,"Ensemble Methods (e.g. 
boosting, bagging)",Matlab,GitHub,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,I don't write code to analyze data,Statistician,Self-taught,100,0,0,0,0,0,Time Series,Neural Networks - CNNs,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,"Arxiv,Blogs,College/University,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,Very useful,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer",University courses,10,0,20,60,10,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that 
performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer,Statistician",Work,20,0,20,60,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,Sometimes,Often,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Sometimes,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,Jupyter notebooks,Text Mining,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A health science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Stayed the same,Less than one year,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,10MB,"Ensemble Methods,Neural Networks","Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Python,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,Often,Sometimes,Rarely,,,,,,,Sometimes,Sometimes,,,,,60,10,5,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Often,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,Often,Often,,51-75% of projects,Entirely internal,Business Department,"Kaggle UCI ML repository EMRs","Missing Messy data Text mining","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,60000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Random Forests,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,Naive Bayes,Natural Language Processing",,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,76-99% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Business Analyst,,,R,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Not Useful,Not Useful,Somewhat useful,,,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Time Series,Logistic Regression,A master's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,<1MB,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Segmentation,Text Analytics",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,60,20,0,10,10,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Do not know,Other,,I am a novice. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Other,Never,"53,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,,,,,,,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A health science,1 to 2 years,Data Analyst,University courses,50,20,10,10,10,0,Time Series,Hidden Markov Models HMMs,,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Naive Bayes,RNNs,SVMs",,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,Sometimes,,,,,,80,20,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Often,,,,,Often,,,,Most of the time,Often,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,R,Cluster Analysis,Matlab,I collect my own data (e.g. 
web-scraping),"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Computer Vision,Unsupervised Learning","Gradient Boosting,Hidden Markov Models HMMs",A bachelor's degree,Government,I prefer not to answer,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation",Image data,,100GB,,"C/C++,MATLAB/Octave",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"HMMs,kNN and Other Clustering",,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,0,100,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,,,,,76-99% of projects,Do not know,,,,,,,,Sometimes,20000,BRL,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle",Somewhat useful,Very useful,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,0,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle",,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,Data Analyst,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,40,10,10,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,Often,,,Most of the time,,,,Most of the time,,,Sometimes,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,Bitbucket,Rarely,80000,AUD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Udacity,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Psychology,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important +Female,United States,23,"Not employed, but looking for work",,,,,,,,R,Monte Carlo Methods,SAS,I collect my own data (e.g. 
web-scraping),"College/University,Friends network,Kaggle,Personal Projects,Tutoring/mentoring",,,Very useful,,,Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,,< 1 year,,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",80,10,0,10,0,0,Time Series,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,19,"Not employed, but looking for work",,,,,,,,Java,Decision Trees,R,Google Search,"Blogs,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,Very useful,"Talking Machines Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,50,0,0,40,10,0,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of 
China,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,,No,I prefer not to answer,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Decision Trees,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Not Useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Other,Traditional Workstation,Text data,Never,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Less than 10% of projects,,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Never,550000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Online courses,Personal Projects",,Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,,< 1 year,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,,11 - 39 hours,Online Courses and Certifications,No,Master's degree,Other,Less than a year,"Researcher,Other",Self-taught,50,40,0,0,10,0,"Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,Other,23,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",80,10,5,5,0,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,Academic,"1,000 to 4,999 employees",,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and private datacenters,Other,,,"Regression/Logistic Regression,Other","Google Cloud Compute,R,SAS Base",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,"Logistic Regression,Other",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,20,30,5,2,43,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,Often,,,,,,Often,,76-99% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by professional services/consulting firm,Microsoft Excel Data Mining,,Python,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Researcher,Software Developer/Software Engineer",Self-taught,30,25,0,25,20,0,Recommendation Engines,Decision Trees - Random Forests,,Hospitality/Entertainment/Sports,20 to 99 employees,Decreased significantly,Less than one year,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Never,100MB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,50,20,30,0,0,0,Enough to run the code / standard library,"Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,,,,,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Germany,NA,Employed part-time,,,Yes,,Data Analyst,Fine,,SQL,Deep learning,R,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other",,,Very useful,,Very useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Engineer,Researcher,Statistician",University courses,40,30,0,30,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Academic,10 to 19 employees,Stayed the same,More than 10 years,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Neural Networks,RNNs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,Often,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,Often,Often,Often,,,,,,,Sometimes,Often,Most of the time,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,Often,,,,,,,Most of the time,,,,10-25% of projects,More internal than external,Standalone Team,,,,"Email,I don't typically share data",,,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,23,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Researcher,University courses,5,5,5,80,5,0,,,A doctoral degree,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Research that 
advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Other",,10MB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,Often,,,,20,70,0,10,0,0,Enough to run the code / standard library,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,Python,Google Search,"Kaggle,Personal Projects,Podcasts,YouTube Videos",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,41,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,NoSQL,,,,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,20,15,15,0,0,,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Other,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"Partially Derivative Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,R,SQL,Tableau,Other",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Often,,,Often,,,,Sometimes,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,30,10,10,25,25,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,,Often,,Often,,,,Most of the time,,,,Often,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",Metabase,Bitbucket,Never,1500000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Australia,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Miner,Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,6 to 10 years,"Data Analyst,Other",University courses,40,20,30,10,0,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Other,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Always,100MB,"Bayesian Techniques,Decision Trees,Random Forests","Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,,,Rarely,Often,,,,30,25,15,20,10,NA,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,Most of the time,,,Sometimes,,,,,,,,Sometimes,,76-99% of projects,Entirely internal,IT Department,,Dirty data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Other",Report Server,Git,Sometimes,75000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"GitHub,Google Search","Arxiv,Blogs,Official documentation,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,Very useful,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed part-time,,,No,Yes,Data Analyst,Fine,Employed by government,SQL,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,75,10,0,5,0,,"Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important +Female,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,49,Employed full-time,,,No,Yes,DBA/Database Engineer,,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,edX,Udacity,Other",Workstation + Cloud service,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,I don't write code to analyze data,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,India,35,Employed full-time,,,No,Yes,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos,Other",,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,I don't write code to analyze data,"Software Developer/Software Engineer,Other",Other,0,10,0,0,0,90,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,20,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Machine Learning Engineer",University courses,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,25,20,0,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Stack Overflow Q&A,Trade book,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,,Very useful,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,0,70,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,United States,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Official documentation,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,University courses,30,0,8,60,2,0,"Computer Vision,Natural Language Processing","Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Rarely,1GB,"CNNs,SVMs","Amazon Web services,C/C++,MATLAB/Octave,Python,TensorFlow",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Segmentation,SVMs,Text Analytics",,,,,,Often,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,Often,Sometimes,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,Most of the time,,,,Most of the time,,Often,Sometimes,,,,,,,,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,OpenCV,Limition and dirty data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,Researcher,University courses,20,5,45,20,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Data Analyst,University courses,30,20,10,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs",A doctoral degree,Other,"1,000 to 4,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Relational data",Sometimes,10MB,,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,16,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,Less than a year,I haven't started working yet,Kaggle competitions,50,30,0,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Other",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,46,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Non-Kaggle online communities,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,Other,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Regression,Python,I collect my own data (e.g. 
web-scraping),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,30,0,0,0,60,Natural Language Processing,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Female,India,28,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Random Forests,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,,,Necessary,,Necessary,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Company internal community,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Somewhat useful,,,,,,Very useful,,,,Very useful,Somewhat useful,,,Not Useful,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,A social science,Less than a year,Other,Self-taught,15,NA,0,85,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important +Male,Australia,25,Employed part-time,,,Yes,,Data Scientist,Poorly,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Not Useful,,Not Useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,30,10,40,15,0,"Natural Language Processing,Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Other,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Don't know,100MB,"Bayesian Techniques,Neural Networks,Random Forests,RNNs","MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",,,Sometimes,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,Sometimes,Sometimes,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team",,,,,Sometimes,Often,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,600000,INR,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,18,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision","Neural Networks - CNNs,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important +Female,People 's Republic of China,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,40,20,20,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed 
full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist",University courses,25,10,15,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,,,,Often,,Most of the time,,,,,Sometimes,Sometimes,Sometimes,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,Rarely,Sometimes,Sometimes,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Sometimes,Sometimes,,Often,Often,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings 
into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,Sometimes,Often,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Textbook",,,,,,,Very useful,,,,,Very useful,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,,,,Necessary,Necessary,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,30,15,0,50,5,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,36,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,60,20,10,5,3,2,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,,,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,10MB,"Bayesian Techniques,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,Most of the time,Often,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,,,,Often,Most of the time,Most of the time,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Computer Scientist,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,Less than a year,Data Scientist,Self-taught,80,10,0,10,0,0,Unsupervised Learning,Neural Networks - CNNs,,Academic,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,"Business Analyst,Other",Self-taught,70,0,20,0,10,0,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,,Somewhat important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,Australia,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,,University courses,60,0,0,40,0,0,,,A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,Other,Image 
data,Always,,,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Researcher,Other",Self-taught,20,20,0,60,0,0,Other (please specify; separate by semi-colon),"Logistic Regression,Neural Networks - CNNs",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Other",Image data,,100GB,CNNs,"Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SAS Base,TensorFlow",,,,,,,,,,,,,,,,,Often,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,,,,,Often,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Segmentation,Time Series Analysis",,,,Most of the time,,,Often,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Often,,10-25% of projects,Do not know,Other,NIH data,Not enough data,Other,I don't typically share data,,Git,Sometimes,28000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,20,0,20,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,46,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Other,,Employed by college or university,Python,,Python,,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,,Necessary,,Necessary,Necessary,,Nice to have,Unnecessary,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Physics,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Canada,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer",Self-taught,80,20,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Decreased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Most of the time,,"Decision Trees,Random Forests","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,10,10,10,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,71000,CAD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by college or university,Self-employed",Python,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Conferences,Kaggle,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,"Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,70,20,0,5,5,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,I prefer not to answer,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Never,100MB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,Python,Spark / MLlib",,,,Sometimes,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,Often,,Often,Often,Sometimes,,,,Often,,,,,,Sometimes,,Often,Often,,Often,Sometimes,,,,Sometimes,,,,,,50,20,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,,,,,,,,,,,Often,Most of the time,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,0,25,0,25,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,Government,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Research that advances the state of the art of machine learning,,Text data,Rarely,100MB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,GANs,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,25,25,0,25,25,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Email,,Git,Sometimes,48000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,17,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL",,Often,,,,,,,,,,,,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Most of the time,Often,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,40,15,0,30,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Often,,Most of the time,,,,Often,Often,,,,,,,,,Most of the time,Most of the time,,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,Git,,130000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +A different identity,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Other,Yes,,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Female,Australia,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed 
full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Employed by non-profit or NGO,Python,Time Series Analysis,R,Government website,"Blogs,Kaggle,Personal Projects,Podcasts",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,Not Useful,,,,,,"Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,A social science,Less than a year,,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,Argentina,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,DataRobot,Deep learning,Python,"Google Search,University/Non-profit research group websites",Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Professional degree,,I don't write code to analyze data,"Engineer,Programmer","Online courses 
(coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Evolutionary Approaches,A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,52,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Textbook,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,,,,,,Very useful,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler",Self-taught,30,20,20,0,0,30,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - GANs",High school,Telecommunications,"5,000 to 9,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Cloudera,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,SQL,Tableau",,Sometimes,,,Often,,,,,,Often,Most of the time,Sometimes,Sometimes,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,Most of the time,,,,35,25,15,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,,Nice to have,,,Necessary,,,,Nice to have,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,,,,Very useful,,Very useful,,,,Very useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,10,0,80,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,,"Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes",Often,,,,,,,Rarely,Rarely,,,,,Rarely,Rarely,Often,,Sometimes,,,,,,,,,,,,,,,,80,10,5,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Often,,,,,Most of the time,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,85000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,No,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,Self-taught,30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,22,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,Spark / MLlib,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,"Coursera,DataCamp","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,Less than a year,Engineer,University courses,50,20,10,10,10,0,Recommendation Engines,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,28,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,Other,University courses,0,15,0,80,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,,,,Necessary,Necessary,Necessary,Nice to have,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,United States,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,Less than a year,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,Google Search,"Official documentation,YouTube Videos,Other",,,,,,,,,,Very useful,,,,,,,,Very useful,FastML Blog,1-2 years,,,,,Unnecessary,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, 
etc.)",50,50,0,0,0,0,Adversarial Learning,Ensemble Methods,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,,,,,,Not important,,,Not important,,,,,,, +Male,Brazil,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Indonesia,27,Employed part-time,,,No,Yes,Data Miner,Fine,Employed by college or university,Weka,Bayesian Methods,Matlab,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,,,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,PhD,Yes,Master's degree,Computer Science,1 to 2 years,Data Miner,University courses,30,0,20,50,0,0,Recommendation Engines,Bayesian 
Techniques,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,36,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,40,10,40,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Indonesia,24,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Employed by government,Self-employed",R,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Statistician",University courses,30,20,0,40,10,0,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,SVMs,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,SVMs,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Central Insights Team,,,,I don't typically share data,,,Never,68000000,IDR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,46,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Software Developer/Software Engineer,Other",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Blogs,College/University,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,,,,,,,,Somewhat useful,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software 
Engineer,University courses,10,30,0,50,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,"GitHub,Google Search,University/Non-profit research group websites","Blogs,YouTube Videos",,,,,,,,,,,,,,,,,,Very useful,,< 1 year,,,,Unnecessary,Necessary,,,,,,,,,,Basic laptop (Macbook),,,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,6 to 10 years,"Data Analyst,Researcher",Self-taught,80,10,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Stayed the same,Don't know,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"CNNs,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Recommender Systems",,,,Sometimes,,,,Sometimes,Often,Most of the time,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,,,,,,,50,20,1,20,9,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Most of the time,,,,Often,,,,Often,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,TCGA;ICGC;COSMIC;CTD;STRING,Dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Git,Rarely,"30,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Engineer,Self-taught,70,15,5,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),Minitab,R",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,,,Most of the time,,,,Sometimes,Often,Most of the time,Often,,,,,Often,,Often,,,,40,40,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",Rarely,,,,Most of the time,,,,Often,,,,,,,,,Most of the time,,,Most of the time,,76-99% of projects,Entirely internal,Business Department,None,all kinds of issues for a typical messy dataset,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,150000,CNY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Other,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Physics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,33,33,33,0,1,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Always,100GB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,RNNs","C/C++,Java,Mathematica,Python,R,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Evolutionary Approaches,Logistic Regression,Neural Networks,RNNs,Simulation",,,Often,,,Often,,,,Often,,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,Often,,,,,,,20,40,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Need to coordinate with IT",,Sometimes,,,,,,,,,,Often,,,Sometimes,,,,,,,,100% of projects,Entirely external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Most of the time,36000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Statistician",University courses,20,0,30,50,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Time Series",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by government,Mathematica,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,I haven't started working yet,Self-taught,80,20,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,No education,Technology,I prefer not to answer,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,10MB,"Decision Trees,Random Forests",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,20,40,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Limitations of tools",,,Often,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,,,,,,0,AED,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,Regression,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,FastML Blog,1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,40,30,10,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or 
former colleagues for leads",3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,Other,10,20,30,20,10,10,"Machine Translation,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Chile,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Other,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High 
school,Telecommunications,100 to 499 employees,Increased significantly,Don't know,A general-purpose job board,Somewhat important,Other,Traditional Workstation,Relational data,Don't know,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,SQL,Google Search,"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,,,,Very useful,Very useful,Data Machina Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,,Master's degree,Yes,Master's degree,Other,1 to 2 years,"Data Analyst,Data Miner,Data 
Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,40,0,30,30,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Republic of China,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),11 - 39 hours,Other,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Scientist,I haven't started working yet",Self-taught,50,20,0,0,10,20,Other (please specify; separate by semi-colon),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,3 to 5 years,,Work,30,0,70,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,10 to 19 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Image data,,100GB,"Bayesian Techniques,Ensemble Methods,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks,Simulation,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,Most of the time,Often,,,Most of the time,,,,,,,Sometimes,Most of the time,,Most of the time,,,,40,40,0,20,0,0,"Enough to code it again from scratch, albeit it may run 
slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Perl,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Reinforcement learning,Other (please specify; separate by semi-colon),Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,25,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,30,10,40,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A 
bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,,,"C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Random Forests,SVMs,Time Series Analysis",,,,,,,,Most of the time,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,40,20,30,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",,,,,Sometimes,,,,,,,Often,Often,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,,,,,,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Monte Carlo Methods,Python,Government website,"Blogs,College/University,Friends network,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,Very useful,,,,,,,,Very useful,Very useful,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer 
Science,3 to 5 years,I haven't started working yet,University courses,30,30,0,40,0,0,Machine Translation,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,"Data Machina Newsletter,Data Stories Podcast,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,50,10,30,5,5,0,Speech Recognition,Bayesian Techniques,I prefer not to answer,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Video data,Most of the 
time,1GB,"HMMs,Markov Logic Networks","MATLAB/Octave,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Often,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,,Often,,Often,,,,,,Often,,Often,Sometimes,Sometimes,Sometimes,Sometimes,Often,,Often,,,,,,,Sometimes,,,,40,25,25,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,,Sometimes,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,25000,CNY,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Not Useful,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX,Udacity,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,Sort of (Explain more),Bachelor's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",15,65,0,15,5,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Somewhat important +Male,Mexico,43,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,"Online 
courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Reinforcement learning,,A bachelor's degree,Other,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,,"SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Master's degree,,Bachelor's degree,,,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",Canada,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Somewhat useful,Somewhat useful,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job 
board,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,26,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,0,5,5,50,0,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,62,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,Researcher,Work,0,0,100,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Sometimes,,Often,,,,,,,Sometimes,,,,Often,,,,,,,0,50,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,,Often,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,185000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Canada,26,"Not employed, but looking for work",,,,,,,,Python,,Python,,"Arxiv,College/University,Friends network,Stack Overflow Q&A",Somewhat useful,,Very useful,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Researcher,University courses,40,30,0,30,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,Less than a year,Researcher,Self-taught,60,30,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data,Relational data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Microsoft SQL Server Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,Often,Sometimes,Often,Often,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Often,Sometimes,Often,,Often,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Sometimes,Often,Sometimes,Sometimes,Most of the time,Most of the time,,,,30,30,15,5,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of 
management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",Often,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Friends network,Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,,,,Very useful,Very useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Australia,39,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,R,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"DataCamp,edX","Laptop or Workstation and local IT supported servers,Workstation + Cloud service",2 - 10 hours,Master's degree,No,Master's degree,Other,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,90,8,0,2,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,"Talking Machines Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Malaysia,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,PhD,Yes,Master's degree,Computer Science,3 to 5 years,Other,University courses,15,15,5,40,25,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important 
+Male,Ukraine,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,60,30,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,,1GB,"CNNs,Neural Networks,Random Forests,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Neural Networks,Random Forests,SVMs",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,Data Scientist,Self-taught,80,0,10,0,0,10,"Natural Language Processing,Reinforcement learning,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Academic,,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Text data,Always,1TB,"Bayesian Techniques,CNNs,HMMs,RNNs","Hadoop/Hive/Pig,MATLAB/Octave,Python,Spark / 
MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,Sometimes,,,,"Bayesian Techniques,CNNs,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,RNNs,Text Analytics,Time Series Analysis",,,Often,Most of the time,,,,,,,,,Sometimes,Often,,,,Often,Often,,,,,,Most of the time,,,,Most of the time,Often,,,,10,30,20,10,30,0,,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,30000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Belarus,4,Retired,,,No,Yes,Business Analyst,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,36,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,,,,,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Machine Learning Engineer,Researcher",University courses,10,35,5,30,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Java,Python,R,SAS Base,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Rarely,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,Most of the time,,Most of the time,,Sometimes,Often,,,,,Often,,Rarely,,,,Most of the time,Sometimes,,Often,Sometimes,,,,Rarely,,,,,,10,30,20,20,20,0,Enough to refine and innovate on the algorithm,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,,,,7,,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,Python,Random Forests,R,"Government website,University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Biology,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,30,0,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Brazil,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Association Rules,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,5,10,0,85,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,18,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Researcher",Self-taught,40,20,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or 
former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,20,10,20,50,0,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Image data,Text data",Always,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,RapidMiner (free version)",,,,Most of the time,,,,,,,,,,,Most of the time,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Data Analyst,Other",University courses,5,5,20,50,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's 
degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,GitHub,"Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Very useful,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,50,25,15,0,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Stayed the same,Less than one year,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Image data,Rarely,1GB,"CNNs,Neural Networks,SVMs","Java,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Rarely,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,SVMs",,,,Most of the time,,Often,Often,Often,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,20,30,10,15,25,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,26-50% of projects,Do not know,IT Department,VOC,No data,Key-value store (e.g. 
Redis/Riak),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,12000,ALL,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,70,20,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,67,Employed full-time,,,Yes,,Business Analyst,,Employed by company that makes advanced analytic software,Microsoft Excel Data Mining,"Ensemble Methods (e.g. boosting, bagging)",SQL,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Textbook",,Somewhat useful,Very useful,,Very useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,More than 10 years,"Computer Scientist,Software Developer/Software Engineer,Other",Work,50,0,30,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Insurance,20 to 99 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,,"Amazon Machine Learning,Amazon Web services,Microsoft Excel Data Mining,SQL,Tableau",Often,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Text Analytics",,,,,,Sometimes,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,,,,,Sometimes,,,Rarely,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,Finding applicable data sources,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,22,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by college or university,Employed by government",R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician,Other",Self-taught,85,10,0,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,SVMs",Sometimes,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,20,20,15,20,25,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,51-75% of projects,More internal than external,IT Department,,Inconsistency ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,23000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Philippines,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Weka,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,Very useful,,,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,19,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Kaggle competitions,0,10,0,10,80,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,High school,Academic,I don't know,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Don't know,1GB,"Decision Trees,Ensemble Methods,Random Forests","Java,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Random Forests",,,,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,30,10,0,10,50,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Central Insights Team,Datasets Kaggle Competitions,None,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Don't know,400,BRL,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Colombia,39,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,"Partially Derivative Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,15,0,5,0,,,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,TensorFlow",Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,Sometimes,Most of the time,Most of the time,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics",,,,,,,Most of the time,Often,,,,,,Often,,Often,,,Often,Often,,,Often,,,Often,,,Often,,,,,40,20,10,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact 
of data science projects",Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,85000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Python,Factor Analysis,R,University/Non-profit research group websites,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,A social science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Time Series,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Taiwan,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software 
Engineer",University courses,15,10,20,50,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,Often,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,,Often,,,,Most of the time,,,,,40,40,10,5,5,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Never,700000,TWD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,Researcher,University courses,10,0,60,30,0,0,"Natural Language Processing,Unsupervised Learning",Bayesian Techniques,A master's degree,Pharmaceutical,20 to 99 employees,Stayed the same,Don't know,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Sometimes,<1MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,63,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,,"Bayesian Techniques,Neural Networks - CNNs",High school,Retail,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting 
firm,Python,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,Very useful,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,10,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Financial,"10,000 or more employees",Increased slightly,Less than one year,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,70,5,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",Often,,,,Most of the time,Sometimes,,,Most of the time,,Often,,Most of the time,,Often,,Most of the time,Often,Often,Often,Most of the time,,51-75% of projects,More internal than external,Standalone Team,World check list; sanctions list,Data Quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Subversion,Rarely,75000,CAD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Mathematica,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Other,GPU accelerated Workstation,40+,PhD,No,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,30,0,0,20,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,DataRobot,Random Forests,Matlab,"GitHub,Google Search","Blogs,Friends network,Online courses,Personal Projects",,Very useful,,,,Very useful,,,,,Very useful,Very useful,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,50,20,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning 
(Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,23,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",SAP BusinessObjects Predictive Analytics,Proprietary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Company internal community,Online courses,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,,,,,,Very useful,,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia)",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Management information systems,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,20,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - GANs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Colombia,29,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Oracle Data Mining/ Oracle R Enterprise,Neural Nets,Matlab,I collect my own data (e.g. web-scraping),"Blogs,Personal Projects,Podcasts",,Very useful,,,,,,,,,,Very useful,Very useful,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Data Analyst,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A professional degree,Financial,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1MB,Neural Networks,"Java,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Neural Networks",,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,60,10,30,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others",Most 
of the time,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,Financial time series in bloomberg,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,40,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Taiwan,20,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,Data Analyst,University courses,40,20,10,25,5,0,"Natural Language Processing,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,Python,"Google Search,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,Very useful,,Somewhat useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",University 
courses,30,20,50,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Relational data,Most of the time,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Unix shell / awk",,,,,,,,,,,Rarely,Rarely,,,Most of the time,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Simulation,SVMs,Time Series Analysis",,,Rarely,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,Sometimes,,Often,,,,,,,Sometimes,Often,,Most of the time,,,,50,30,15,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,Often,,,Often,,Often,,Most of the time,,Most of the time,,,Often,,Most of the time,,,10-25% of projects,More internal than external,IT Department,FNL;GFS,Data manipulation,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,170000,CNY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Other,More than 10 years,Researcher,Self-taught,15,15,30,20,5,15,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,People 's Republic of China,22,"Not 
employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Online courses",Very useful,,,,,,,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Statistician",Self-taught,50,0,23,25,2,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SAS JMP,SQL,Tableau,Other",,,,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,Most of the time,,,Often,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,Sometimes,,,,,,,Sometimes,,,Often,,Often,Often,Often,,,Most of the time,,,Often,Sometimes,,,,20,5,25,30,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Other",Self-taught,80,20,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,Employed 
part-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by college or university,Amazon Machine Learning,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,FlowingData Blog,KDnuggets Blog",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,10,5,5,5,25,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,10,5,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Japan,29,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM SPSS Modeler,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,PhD,No,Master's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,Less than a year,"Business Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Programmer",Self-taught,30,20,20,10,10,10,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",High school,Academic,20 to 99 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,10GB,"CNNs,HMMs,Neural Networks,RNNs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,HMMs,Neural Networks,RNNs,Text Analytics",Often,,,Most of the time,,Most of the time,Often,,,,,,Often,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects",Often,,,,,,,,,Sometimes,,,,Often,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,"conll;wiki ",cleaning data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git,Mercurial",Always,20000,ALL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Python,Deep learning,Matlab,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Textbook",Very useful,,Somewhat useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Researcher",University courses,30,0,40,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Other",Always,1TB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,MATLAB/Octave,Python",,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",,,,,,,Often,,,,,,,Sometimes,,Sometimes,,,,Rarely,Most of the time,,,,,,Sometimes,Most of the time,,Most of the time,,,,20,50,0,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data 
science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,Sometimes,,,Sometimes,,,Rarely,Sometimes,,,,,Often,,Most of the time,,,Less than 10% of projects,Entirely internal,Other,none,noisy data; uncertain labels,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Never,"35,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,South Korea,31,"Not employed, but looking for work",,,,,,,,Python,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,0,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,22,"Not employed, but looking for work",,,,,,,,R,Bayesian Methods,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Trade book",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",,Online Courses and Certifications,No,Bachelor's degree,,Less than a year,"Business Analyst,Researcher,Statistician,I haven't started working yet",Self-taught,10,0,0,90,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,30,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Genetic & Evolutionary Algorithms,Python,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,Nice to have,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Biology,I don't write code to analyze data,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,0,30,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very 
Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Canada,28,Employed full-time,,,Yes,,Engineer,,,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",SAS,GitHub,"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,10,50,20,0,"Computer Vision,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,"Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data",Sometimes,10TB,"Bayesian Techniques,SVMs","C/C++,MATLAB/Octave,Python,R",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Naive Bayes,Neural Networks,SVMs,Time Series Analysis",Often,,Often,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,,Often,,,,25,50,0,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,Most of the time,,,,,,,Often,Often,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,50000,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Female,Indonesia,23,Employed part-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer",University courses,50,20,0,5,0,25,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Other,Fewer than 10 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,33,Employed part-time,,,Yes,,Data Miner,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle",,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Miner,DBA/Database Engineer,Researcher",University courses,80,5,0,10,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Python,R",,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics",,,Sometimes,Sometimes,,Most of the time,,,Most of the time,,,,,,,,,Often,,Often,Often,,,Most of the time,,,,Most of the time,Most of the time,,,,,50,30,10,10,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,Sometimes,,,Most of the time,,,,Most of the time,Often,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,UCI,Dirty Data,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Singapore,47,Employed full-time,,,Yes,,Other,Fine,"Employed by college or university,Employed by government",TensorFlow,,Python,Government website,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Academic,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Very important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,Google Cloud Compute,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,50,15,20,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,data.gov.sg,no realtime api for inhouse data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,120000,SGD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,24,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",5,80,10,0,5,0,Natural Language Processing,Support Vector Machines (SVMs),A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,,"Random Forests,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,NoSQL,Python,R,Tableau",,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics",,,,,,,Often,Most of the time,Often,,,,,,,Most of the time,,Often,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,,60,30,0,10,0,0,Enough to explain the algorithm to 
someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,,,,,Often,,,,,10-25% of projects,Entirely external,IT Department,,cleaning and transforming the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,No,Yes,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,Udacity","Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Researcher",University courses,60,25,5,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Argentina,30,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Management information systems,I don't write code to analyze data,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,"Some college/university study, no bachelor's degree",Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Business Analyst,Self-taught,50,30,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,35,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Support Vector Machines (SVM),R,"Google Search,University/Non-profit research group websites","Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Somewhat useful,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,"R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,50,0,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,1MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation",Often,,,,,Often,Most of the time,,,,,,,Sometimes,,Often,,Sometimes,,,,,Often,,,Often,,,,,,,,20,40,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,Often,,,Sometimes,,Often,Sometimes,,,,,,Rarely,,Most of the time,Sometimes,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,180000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Physics,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,10,0,0,30,0,60,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",Self-taught,60,30,0,0,0,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer,Other",University courses,20,40,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,Programmer,Self-taught,90,0,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,,,,,Somewhat important,Other,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Never,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,RNNs","Google Cloud Compute,Python,TensorFlow",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,Sometimes,Most of the time,,Most of the time,,Sometimes,Rarely,,Often,,Sometimes,,,,40,60,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making 
process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,Often,,,,Often,,Often,Often,,Often,,,,,Often,,,,,Often,,Less than 10% of projects,Do not know,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,20000,,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher",University courses,40,0,0,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,Blogs,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Statistician",Work,100,0,0,0,0,0,Time Series,Logistic Regression,,Insurance,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Never,1GB,Bayesian Techniques,"R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,,,,,,,,,,,Bayesian 
Techniques,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,0,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,Graph (e.g. GraphBase/Neo4j),,,,Never,50000,CAD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,45,15,0,20,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,23,Employed full-time,,,No,Yes,Software 
Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,Very useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Machine Learning Engineer,Self-taught,40,0,30,0,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,,,,,,, +Female,United States,46,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,Government website,"Conferences,Online courses,Stack Overflow Q&A",,,,,Very useful,,,,,,Very useful,,,Very useful,,,,,,< 1 year,Necessary,,Nice to have,,Nice to have,Nice to have,Necessary,Nice to have,,Necessary,,,,"Coursera,DataCamp","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Biology,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,10,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",I don't know/not sure,Financial,"10,000 or more employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,100MB,"Bayesian Techniques,Decision Trees","IBM SPSS Modeler,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Most of the time,,,,,,,"Decision Trees,Lift Analysis,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,,,Most of the time,,,,,,,Often,,,,,,Often,,,,,Often,,,,Most of the time,,,,60,10,0,15,15,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,76-99% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Subversion,Rarely,100000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,60,15,0,15,0,10,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Very useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",,Master's degree,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic 
Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed part-time,,,Yes,,Researcher,,,TensorFlow,Deep learning,Python,Other,"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Sometimes,100MB,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Segmentation,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,50,40,0,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,26-50% of projects,More internal than external,Other,None,Computational Resources : I do analysis of simulation data and sometimes a single set of data may take ~ 15 minutes to be analysed on titanx gpu-s.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Sometimes,46000,USD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Support Vector Machines (SVM),Python,GitHub,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Decreased significantly,1-2 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Python,R",,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees",,,,,,Sometimes,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,More external than internal,Other,,,,Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,93000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,Python,Google Search,"Blogs,Online 
courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,10GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,20,80,Enough to explain the algorithm to someone non-technical,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,664000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,Self-taught,66,34,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,,,"Emergent/Future Newsletter (Algorithmia),Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Engineer,Other,10,0,0,0,0,90,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Australia,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Other",Self-taught,50,15,15,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,54,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Company internal community,Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,,,Very useful,,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,50,25,25,0,0,0,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Government,20 to 99 employees,Decreased slightly,6-10 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Image data,Sometimes,100GB,"Neural Networks,RNNs","C/C++,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Cross-Validation,kNN and Other Clustering,Neural Networks,RNNs,Segmentation",,Sometimes,,Often,,Often,,,,,,,,Often,,,,,,Often,,,,,Often,Often,,,,,,,,30,40,20,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",Often,Most of the time,,,,,,,,Often,,,,,,Often,Often,,,,,,Less than 10% of projects,Do not know,IT Department,biometric datasets ,the quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,New Zealand,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,More than 10 years,"Business Analyst,Data Analyst",Self-taught,100,0,0,0,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Survival Analysis,Time Series",,,Insurance,"1,000 to 4,999 employees",Decreased significantly,,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,,,"R,SAS 
Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,Sometimes,,,,,,,,,,"Text Analytics,Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,Most of the time,0,0,0,0,0,0,,Difficulties in deployment/scoring,,,,Often,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,0,0,0,0,50,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Unix shell / awk",,Sometimes,,,Sometimes,,Sometimes,Rarely,Often,,,,,Sometimes,Sometimes,,Most of the 
time,,,,Sometimes,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,,,,,,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Sometimes,Often,Often,Often,Most of the time,Most of the time,Often,,,,Often,,Sometimes,Sometimes,Most of the time,,Most of the time,Most of the time,Most of the time,Often,,Often,Most of the time,,Sometimes,,Sometimes,Often,,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Often,Often,,,,,,Often,Often,Often,Often,,,,Often,Often,Often,Often,,51-75% of projects,Approximately half internal and half external,Business Department,,"limited data availability. stupid turf protection tendencies. 
unwillingness to fully share, explain data",,Other,ad hoc data provision by teams,Git,Most of the time,1500000,INR,Other,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,40,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Tableau",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,"Data Visualization,Neural Networks,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,Often,,Often,,,,,,,,Most of the time,,,,30,20,15,15,20,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Most of the time,Sometimes,Most of the time,,,,,,,,,,,,Often,,Most of the time,Often,,,10-25% of projects,Entirely internal,Standalone Team,Downloaded online,"Handling Date time format, Multi line comments, Missing Data, too many columns",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,800000,INR,Has stayed about the same (has not increased or decreased more than 5%),,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,"Computer Scientist,Data Analyst",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hong Kong,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Other,1 to 2 years,Programmer,University courses,80,5,5,4,6,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased significantly,Less than one year,A tech-specific job board,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,1GB,CNNs,"C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,50,5,5,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Rarely,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Git,Sometimes,8000,,Other,9,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Scala,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst",Self-taught,40,40,10,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Text Analytics",,,,,,Often,Often,Often,,,,,,Often,,,,Often,Sometimes,,,,Sometimes,,,Sometimes,,,Often,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,Often,,,,,,Sometimes,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,6 to 10 years,Business Analyst,Self-taught,0,50,0,0,0,50,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Logistic Regression,Markov Logic Networks",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1TB,,"Hadoop/Hive/Pig,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Sometimes,,,Often,,,,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,,,,,,,,Often,Sometimes,,,,,,,,,,Often,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A,Other",,,,,,,,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased significantly,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Most of the time,,,,,,,,Often,,,,,Often,,Often,,,,,,,,,,,40,10,5,40,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with 
the available data,Unavailability of/difficult access to data",Often,Often,,Often,Often,Often,,,Often,Often,,Often,,Often,Often,,,Often,,Often,Often,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Sometimes,70000,AUD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Friends network,Online courses,Stack Overflow Q&A,Trade book",,,,,,Not Useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,KDnuggets Blog,< 1 year,,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Other,Traditional Workstation,2 - 10 hours,PhD,Yes,Master's degree,Mathematics or statistics,Less than a year,Other,Self-taught,30,40,0,0,0,30,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Somewhat 
useful,,,,,,,,,Very useful,Very useful,,,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,,Very Important +Male,Chile,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,30,20,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs",High school,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,100MB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Often,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,Sometimes,,,,20,50,0,30,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Most of the time,170000,CLP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,FlowingData Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,70,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,South Korea,36,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,70,5,25,0,0,0,"Recommendation Engines,Time Series",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of 
(Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,10,10,0,80,NA,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses",,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Data Analyst,Other,30,20,40,5,5,0,Other (please specify; separate by semi-colon),,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"CNNs,Markov Logic Networks,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Data Visualization,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,Often,,,Most of the time,,,,,,Often,,,Most of the time,,Most of the time,Most of the time,Sometimes,Sometimes,,Often,Most of the time,,,,Most of the time,Most of the time,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,Cleaning data,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Git,Rarely,,INR,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Researcher,University courses,30,0,10,60,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Text data",Most of the time,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs",,,Often,Often,,Often,Often,Often,Often,,,,Often,Often,,,,Often,,Often,Often,,Often,Often,Often,,,Often,,,,,,20,30,30,10,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine 
learning",,,,,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher",University courses,NA,50,0,50,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,,,Somewhat important +Male,Brazil,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Textbook",,Not Useful,Very useful,,,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,10,60,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Often,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Most of the 
time,,Often,,Sometimes,,Sometimes,,,Often,,Most of the time,,,Most of the time,Sometimes,Rarely,Most of the time,Most of the time,,,,40,40,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Privacy issues",Often,,,Most of the time,Most of the time,Often,,,,,,,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Bitbucket,Never,170000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,,"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,60,10,5,25,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",Insurance,10 to 19 employees,Decreased slightly,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,100MB,,"IBM SPSS Statistics,Python,R,SQL,Tableau,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Sometimes,,,Sometimes,,,,Most of 
the time,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,0,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,Often,,,,Often,,Sometimes,,,,,,,,Often,Sometimes,,Most of the time,10-25% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Not Useful,Not Useful,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,Self-taught,40,0,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Other",,,,Rarely,,,,,,,Rarely,Rarely,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,Most of the time,Most of the time,,,Sometimes,,,Sometimes,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,Sometimes,,,,,,Rarely,Most of the time,Sometimes,Sometimes,,,Most of the time,,,,,Most of the time,Most of 
the time,Rarely,,Most of the time,,,,35,35,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Sometimes,,Sometimes,,,,,,Often,,,,,Rarely,,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,115000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,TensorFlow,Link Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,,Very useful,,,Very useful,,,Somewhat useful,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Researcher,Self-taught,90,0,9,0,1,0,,,A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,100GB,"Markov Logic Networks,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Logistic Regression,Markov Logic Networks,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,Often,Most of the time,,,,50,20,5,10,10,5,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,26-50% of projects,Entirely internal,Central Insights Team,"Census, USGS",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,51000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Recommendation Engines,Neural Networks - CNNs,A master's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,<1MB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Recommender Systems",,,,,,Rarely,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Often,Sometimes,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Google Search,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Friends network,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,15,10,25,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,Academic,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Most of the time,100MB,"Random Forests,Regression/Logistic Regression,Other","C/C++,MATLAB/Octave,Orange,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Often,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Often,,,,,,,Sometimes,,Often,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,50,15,5,15,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a 
clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,NCBI;Ensembl,Consistency is raw data. Various aspects of media and formatting.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",External hard drives.,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,55000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Other,NA,0,0,0,10,90,,,High school,Financial,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,,,,,,Oracle Data Mining/ Oracle R Enterprise,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,,Self-taught,25,0,50,25,0,0,,,A master's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text 
data,Sometimes,,Regression/Logistic Regression,"C/C++,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,0,50,50,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,24,Employed full-time,,,Yes,,Data Analyst,,,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,"GitHub,Google Search","College/University,Company internal community,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Researcher","Online courses (coursera, udemy, edx, etc.)",60,0,0,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,SVMs,Text Analytics",,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,Often,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Often,,,,,90,5,1,2,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,Often,Often,Often,,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,3800,SGD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,More than 10 years,Researcher,University courses,30,0,20,50,0,0,Reinforcement learning,"Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,A general-purpose job board,Important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,,1GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Evolutionary Approaches,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,Text Analytics",Often,,Often,,,,Often,,,Often,,,,,,Often,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data,Other",,,,,Often,,,,,,,,,,,,,,,,Often,Often,100% of projects,Do not know,Other,,,"Column-oriented 
relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"7,800,000",JPY,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,32,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,,Self-taught,80,3,0,3,14,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,,,,,Not very important,Other,Laptop or Workstation and private datacenters,Other,,10MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,"Non-Kaggle online communities,Textbook,YouTube Videos",,,,,,,,,Very useful,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Statistician,Work,20,30,50,0,0,0,Computer Vision,,A bachelor's degree,,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data",Most of the time,1GB,,"Flume,Microsoft Azure Machine Learning,Python,R,SAP BusinessObjects Predictive Analytics,Tableau,TensorFlow",,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,30,30,0,10,30,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,50,10,10,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",Sometimes,100TB,"CNNs,GANs,RNNs","C/C++,MATLAB/Octave,Python,Other",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,GANs,RNNs",Often,,,Often,,Often,,Often,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,CIFAR-10,Infrastructure,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,700000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Australia,20,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,Self-taught,50,20,0,5,25,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Government,20 to 99 employees,Increased slightly,1-2 years,Some other way,Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAS Base,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,Often,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation",,,,,,Often,Often,Often,Most of the time,,,Most of the time,,Often,,Sometimes,,,Sometimes,Sometimes,Rarely,Sometimes,Most of the time,Most of the time,,Sometimes,Most of the time,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot 
afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,Often,,,Rarely,,Sometimes,Often,,,,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Necessary,,,,Necessary,Necessary,,Nice to have,Nice to have,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Professional degree,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Computer Vision,Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Weka,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Computer Vision,Speech Recognition,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,Sometimes,,,,,,Often,Often,Often,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Often,,,,,Often,,Often,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,RNNs,Time Series Analysis",,,Sometimes,,,Sometimes,,Often,,Sometimes,,,,Often,,,,Often,Often,Often,Sometimes,Often,,,Often,,,,,Often,,,,25,40,10,15,10,0,"Enough to code it again from scratch, albeit it 
may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources",,,,Sometimes,Often,,,,,Often,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,120000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,Mexico,21,"Not employed, and not looking for work",Yes,"Yes, but data science 
is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,R,Google Search,"College/University,Friends network,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Software Developer/Software 
Engineer,Other",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Python,Deep learning,Python,Google Search,"Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,60,20,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,10GB,"Neural Networks,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,Minitab,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,,,Often,Often,,,,,,,,,Sometimes,,,Often,Often,,,,,,,,,Most of the time,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets 
from external sources",Often,Often,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,Capmas data in egypt,Computional costs;Time; Funds,Document-oriented (e.g. MongoDB/Elasticsearch),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"40,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,"Software Developer/Software Engineer,Other",Self-taught,85,10,0,0,5,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Rarely,100MB,"Ensemble Methods,Neural Networks,SVMs","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,Often,,,,,,Often,,,,,,,Often,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Neural Networks,SVMs,Text Analytics",Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,Often,,,,,40,20,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not 
instrument data useful for scientific analysis and decision-making,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Often,,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,Often,,,,Most of the time,,76-99% of projects,Entirely internal,Other,Text corpi,Availability of the necessary data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Shared files,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Australia,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,SQL,Google Search,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"DataTau News Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,Less than a year,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,75,10,0,5,0,,,A bachelor's degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Never,,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,0,50,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant 
domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",,,,Rarely,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,,,,51-75% of projects,More internal than external,Other,None ,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,130000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,R,Other,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Scientist,Other",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100TB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems",Most of the time,,,,,,,Sometimes,,,,,,,,Sometimes,,Rarely,,,,,Rarely,Rarely,,,,,,,,,,10,5,5,5,50,25,Enough to explain the algorithm to someone 
non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,10-25% of projects,Entirely internal,Business Department,None,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email",,Mercurial,Sometimes,220000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SQL,Random Forests,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,Other,Work,20,5,55,20,NA,0,"Computer Vision,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Deep learning,C/C++/C#,Google Search,College/University,,,Not Useful,,,,,,,,,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Singapore,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Company internal community,Stack Overflow Q&A",,,,Very useful,,,,,,,,,,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Recommendation Engines,Unsupervised Learning","Gradient Boosting,Support Vector Machines (SVMs)",I don't know/not sure,Other,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Gradient Boosted Machines,Markov Logic Networks","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,NoSQL,Python,Spark / MLlib,SQL",Sometimes,Most of the time,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Gradient Boosted Machines,Markov Logic Networks,Recommender Systems",Sometimes,,,,,Often,,,,,,Often,,,,,Sometimes,,,,,,,Often,,,,,,,,,,80,10,0,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data",Often,,Often,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,API Rate Limit to collect the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,96000,SGD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Linear Digressions Podcast",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,,Master's degree,Yes,Master's degree,Computer Science,Less than a year,"Machine Learning Engineer,I haven't started working yet",Other,30,30,10,10,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,37,"Not employed, but looking for work",,,,,,,,R,Regression,R,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Laptop or Workstation and local IT supported servers,0 - 1 hour,PhD,Sort of (Explain more),Doctoral degree,Electrical Engineering,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,10,5,50,15,5,Survival Analysis,Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",40+,Github Portfolio,Yes,Doctoral degree,Psychology,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,GitHub,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,19,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Natural Language Processing,Neural Networks - RNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Not important,Somewhat important +Female,Vietnam,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,20,0,20,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,Other",,,,,,,,,,,Often,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,Sometimes,,Often,,,,,,,,,Often,,,Often,,,,Often,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Often,,Often,,,,Most of the time,Often,,,,,,,,Most of the 
time,,,,,Often,Often,Often,,,Often,,,,Often,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,Often,Often,,Often,,,Often,Often,,,,,,,76-99% of projects,Entirely internal,Business Department,No one,complexity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,30000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,Researcher,University courses,50,5,15,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Often,Sometimes,,,,,,Most of the time,Often,,Sometimes,,,Rarely,Often,Often,,,,,,Often,Often,,,,25,25,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Computer Vision,Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,23,Employed part-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Social Network Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,,,,,,,,,,,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,No,Yes,Other,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,I don't write code to analyze data,Other,Other,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Time Series Analysis,,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,University courses,20,0,0,80,0,0,Time Series,Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Don't know,1MB,Neural Networks,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,Limitations in the state of the art in machine learning,,,,,,,,,,,,Sometimes,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,1500000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Other,C/C++/C#,Other,"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Other,No,I prefer not to answer,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by government",R,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,Physics,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Survival Analysis,Neural Networks - CNNs,Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,51,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Evolutionary Approaches",A doctoral degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10MB,"Decision Trees,Evolutionary Approaches,Neural Networks","Julia,Jupyter notebooks,KNIME (free version),Python,R,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,Sometimes,,Often,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Often,Often,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Often,,Often,,,,Often,,,,,,Often,Most of the time,,,Often,,,Often,,,Often,,,,40,20,5,25,10,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,Most of the time,,,,,,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion,Other",Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Data Analyst,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,,Self-taught,60,20,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,Python,Google Search,"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,edX","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,A social science,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,30,15,20,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not 
important,Not important +Male,Argentina,43,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,RapidMiner (free version),Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,20,10,5,5,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Google Cloud Compute,IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,QlikView,R,TensorFlow",,,,Rarely,,,,Often,,,,Rarely,,,,,Often,,,,,,,,,,Sometimes,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,Time Series Analysis",,,,,,Sometimes,Often,Often,,,,,,Often,,,,,,Sometimes,,,Often,,,,,,,Often,,,,60,10,10,15,5,0,Enough to explain the algorithm to 
someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Often,,,,Rarely,,,,,,,,,,,Often,Often,,76-99% of projects,More external than internal,IT Department,Geo,Speed,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,32000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,27,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,PhD,Yes,Master's degree,Computer Science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,20,40,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,10 to 19 employees,Increased significantly,1-2 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Orange,Python",,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,Most of the time,,,,30,40,5,5,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Often,Often,,Sometimes,,,,,Often,Sometimes,Sometimes,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Always,100000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Philippines,30,Employed full-time,,,Yes,,Data Miner,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Data Miner,Self-taught,90,10,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,10 to 19 employees,Increased slightly,Less than one year,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Company internal community,Friends network,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,"Business Analyst,Predictive Modeler,Statistician",University courses,15,5,25,55,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Other,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Often,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,Often,Most of the time,,,,,Often,,,,,Often,Sometimes,,,Sometimes,,,,55,25,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Sometimes,,,,Often,,,,Sometimes,,,,,Sometimes,Sometimes,,,,26-50% of projects,More internal than external,Business Department,Economic departments of various countries,Cleansing,Other,"Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,2100000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle",,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Data Analyst,Data Miner,Programmer",Self-taught,30,20,30,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,33,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,6 to 10 years,"Engineer,Researcher",Self-taught,60,0,0,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,30,0,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Data Scientist",Self-taught,70,0,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,"Some college/university study, no bachelor's degree",Retail,"5,000 to 9,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Always,100MB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,,,,Most of the time,Most of the time,,,,30,20,30,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database",Most of the time,Most of the time,,,Most of the time,Most of the time,,Often,Often,,,,,,,,,Often,,,,,100% of projects,More internal than external,Standalone Team,Census,Dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,72500,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",SAS,Other,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Data Stories Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,30,15,5,0,0,Time Series,Logistic Regression,A bachelor's degree,CRM/Marketing,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,<1MB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,Python,R",,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Segmentation,Text Analytics,Time Series 
Analysis",Often,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,Sometimes,,,,30,10,5,25,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,Often,Often,Often,,,,,,,,,Often,Often,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Subversion,Rarely,,,,4,,,,,,,,,,,,,,,,,, +Male,Australia,59,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,GitHub,"Arxiv,Blogs,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",40+,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Data Analyst,Predictive Modeler,Researcher",Self-taught,70,20,0,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job 
board,3-5,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Stories Podcast,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,Other,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",High school,Financial,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Rarely,100MB,Regression/Logistic Regression,"C/C++,Julia,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,40,0,5,5,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important +Male,South Korea,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,Very useful,"Data Machina Newsletter,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,25,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,I don't know,Increased slightly,Less than one year,Some other way,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data",Sometimes,1GB,"CNNs,Markov Logic Networks,Neural Networks,RNNs,SVMs","Java,MATLAB/Octave,Python,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,Neural Networks,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,,,,,,Sometimes,,,,,,,Often,,,,,Most of the time,Often,,Sometimes,,Most of the time,,,,50,35,0,15,0,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Often,,,,,Most of the time,,Most of the time,,26-50% of projects,More external than internal,Standalone Team,"UCI, Competitions Database",,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,,"7,200,000",KRW,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Other,Work,20,0,70,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Mix of fields,Fewer than 10 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Often,,,,Often,Often,Often,,,,,,Often,,Often,,,Often,Often,,,Often,Often,,Often,Sometimes,Often,Often,Often,,,,35,25,10,15,15,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,10,10,0,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SQL,Tableau,TensorFlow",,,,,Rarely,,,,,,,,,,,,Often,,Rarely,,,,Rarely,Sometimes,Sometimes,,Sometimes,,,,Most of the time,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Often,Most of the time,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Most of the time,Sometimes,,Sometimes,Often,Most of the time,Most of the time,Most of the time,,,Sometimes,,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Often,,Most of the time,,Most of the time,Often,,,,,30,10,20,40,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to 
data",,Often,,Sometimes,Most of the time,,,,,,Often,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,49,Retired,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Programmer,Work,60,20,10,10,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,CRM/Marketing,I prefer not to answer,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Excel Data Mining,Orange,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Text Analytics,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,60,30,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data,Other",Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,TIBCO Spotfire",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,Often,,,,,Often,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Sometimes,Sometimes,,,Often,Most of the time,Often,Often,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,,Often,,,,50,10,0,30,10,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github 
Portfolio,No,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Taiwan,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Mathematica,Survival Analysis,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,KDnuggets Blog",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Other,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,Canada,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,DataRobot,Neural Nets,SAS,University/Non-profit research 
group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Mix of fields,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Amazon Machine Learning,C/C++,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,SAS Base,SQL",Most of the time,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most 
of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,50,90,70,100,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources",Sometimes,,Often,,,,,,,Often,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,"None ","Management ","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial,Subversion",Sometimes,50000,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,19,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Other,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's 
degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Online courses,Textbook",Very useful,Very useful,Very useful,,Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer",University courses,15,10,25,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics",,,,Sometimes,,Most of the time,Often,Sometimes,Most of the time,,,,,Often,Sometimes,Often,,Often,Most of the time,Often,,,Most of the time,,Most of the time,,,Often,Most of the time,,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,26-50% of projects,Entirely internal,Business Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,200000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,"Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,50,10,0,0,0,40,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +Male,People 's Republic of China,NA,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Julia,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Podcasts,Stack Overflow Q&A",,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,,University courses,5,0,10,80,5,0,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,A bachelor's degree,Government,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More external than internal,Business Department,Census data; Voting data; Economic data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,I haven't started working yet,Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,Often,Often,,,,Often,,,,Often,,Often,Often,,Often,Often,Often,Often,,,,Often,Often,Often,,,,60,25,5,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,0,AUD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,30,0,60,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,36,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Machine Learning Engineer,,,TensorFlow,Deep learning,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,,,,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog",< 1 year,Nice to have,Necessary,,,Necessary,,Necessary,Necessary,,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Engineer,Software Developer/Software Engineer",Kaggle competitions,60,10,10,0,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important +Female,Taiwan,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,,0 - 1 hour,Online Courses and Certifications,No,Doctoral degree,Other,,"Data Analyst,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series","Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Yes,Professional degree,,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,25,10,5,5,30,25,"Adversarial Learning,Computer Vision,Time Series","Markov Logic Networks,Neural Networks - GANs",Primary/elementary school,Financial,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"Neural Networks,SVMs","Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,Cross-Validation,Naive Bayes,PCA and Dimensionality Reduction",Rarely,,,,,Often,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,45,15,5,15,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,Often,Often,,,Sometimes,Often,,,,,,,,Most of the time,,,,Most of the time,,26-50% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch)","Company Developed Platform,I don't typically share data",,"Bitbucket,Git,Subversion",Always,600000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,R,GitHub,"College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing,Speech Recognition",Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Hong Kong,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,< 1 year,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,Python,Google Search,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,FastML Blog",< 1 year,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular 
Data),"Decision Trees - Random Forests,Evolutionary Approaches",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Programmer,University courses,25,10,20,40,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Government,100 to 499 employees,Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning 
mostly data science skills",,,,,,Julia,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Yes,Master's degree,Computer Science,,"Engineer,Programmer,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,18,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,Google Search,"Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,,,,,,,,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data 
Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,"DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,45,0,15,30,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Most of the time,1GB,"CNNs,Neural Networks,RNNs","C/C++,Java,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Most of the time,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,RNNs,SVMs",,,,Often,,Most of the time,Most of the time,,,,,,,,,Often,,,,Most of the time,,,,,Often,,,Often,,,,,,30,20,35,10,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,Often,Most of the time,,,,Sometimes,,,,,,,,,Most of the time,,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,200000,BRL,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Not Useful,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,0,30,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Financial,"10,000 or more employees",Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Text data,Always,10TB,"CNNs,Decision Trees,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,Often,,,,Often,,,,Often,,Rarely,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,GANs,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series 
Analysis",Most of the time,Most of the time,Sometimes,Sometimes,Sometimes,Sometimes,Most of the time,Sometimes,,,Sometimes,Rarely,,,,Rarely,,Rarely,,Sometimes,Sometimes,Rarely,,Sometimes,,Often,Often,,Most of the time,Most of the time,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Often,,,,,,,,,100% of projects,More internal than external,IT Department,,Different services,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,70000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Korea,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Manufacturing,100 to 499 employees,Stayed the same,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,Operations Research Practitioner,Self-taught,30,60,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Decreased significantly,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Often,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,10,20,0,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,160000,SGD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Julia,Text Mining,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer",University courses,30,30,20,10,10,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Sometimes,1GB,"CNNs,Random 
Forests","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,Random Forests,SVMs",,,,Often,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,,Sometimes,,,,,,10,40,40,5,5,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,IT Department,KITTI,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"1,300,000",TWD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Text Mining,Python,Google Search,"Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Technology,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Other,Basic laptop (Macbook),Text data,Most of the time,,Other,"Perl,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,70,10,20,0,0,0,Enough to run the code / standard library,"I prefer not to say,Lack of data science talent in the 
organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,26-50% of projects,Do not know,Other,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Cloudera,Random Forests,SQL,I collect my own data (e.g. web-scraping),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Internet-based,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,Other,"Microsoft Excel Data Mining,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,20,0,30,20,30,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Most of the time,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,None,Limited amount of data since we are a very young organization.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,2500000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,Julia,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,1 to 2 years,"Data Miner,Data Scientist",University courses,20,0,25,50,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important +Male,India,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,Google Search,"Blogs,Friends network,Personal Projects",,Very useful,,,,Very useful,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Nice to 
have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,60,10,10,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,The Data Skeptic Podcast,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,0,20,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,"Data Machina Newsletter,FastML Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,25,10,NA,50,15,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,India,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very 
Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Textbook",Very useful,Very useful,,,,,,,,,,,,,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,,,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,India,32,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Often,,,,Most of the time,Most of the time,Often,,,,,Most of the time,,Most of the time,,Often,Often,Often,Sometimes,,Most of the time,,,,,,Most of the time,Often,,,,50,20,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for 
a data science team,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,Often,,,,,,Sometimes,,10-25% of projects,Do not know,Standalone Team,,I am not given the opurtunity to work with the data science team yet,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Most of the time,110000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Tableau,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Data Stories Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,Less than a year,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",Logistic Regression,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,New Zealand,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,GitHub,"Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,60,0,40,0,0,NA,Time Series,"Logistic Regression,Support Vector Machines (SVMs)",,Manufacturing,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Never,10MB,"Bayesian Techniques,Regression/Logistic 
Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,R",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,Sometimes,,,Often,Often,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,20,10,40,30,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database",Sometimes,Often,,Sometimes,,Often,,,Sometimes,,Often,,,,,,,Sometimes,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,4500000,JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,< 1 year,,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Computer Vision,Neural Networks - CNNs,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,Very Important,,,,,,,Very Important,,,,,Very Important +Female,Canada,45,"Not employed, but looking for work",,,,,,,,Python,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Engineer,Programmer",University courses,25,25,0,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Analyst,Work,0,100,0,0,0,0,Time Series,Logistic Regression,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,23,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Not important,Very Important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important +Male,Taiwan,48,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,Jupyter notebooks,I don't plan on learning a new ML/DS method,Python,"GitHub,Google Search","Blogs,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Adversarial Learning,Neural Networks - CNNs,A bachelor's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Text data,Never,10MB,,Unix shell / awk,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Data 
Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,10,5,5,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,A social science,3 to 5 years,"Data Analyst,Engineer",University courses,0,40,0,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,Other,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Online courses,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,,,Very useful,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced 
analytics",I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Never,1MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,Often,Often,,Often,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,Often,Most of the time,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,0,0,0,10,90,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science 
team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,,,,Often,,,,,Often,,Often,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,Any,Lack of business domain; dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,24000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter",< 1 year,,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,,,Coursera,Laptop or Workstation and local IT supported servers,,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Other,0,30,0,0,0,70,Computer Vision,,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,,Very Important,,,,,Very Important,,Very Important,,,Very Important,Very Important +Female,United States,33,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting 
firm,TensorFlow,Deep learning,Python,GitHub,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Recommendation Engines,Reinforcement learning,Survival Analysis","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",No education,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,Brazil,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or 
headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Cloudera,Python,SAS Base,SQL,Other",,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,"Data Visualization,Decision Trees,Segmentation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,50,30,5,5,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,20,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,30,Employed part-time,,,No,Yes,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"DataCamp,edX",Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - RNNs,Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,India,36,"Independent 
contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Official documentation,,,,,,,,,,Very useful,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,60,20,10,0,10,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Most of the time,10PB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,Spark / MLlib,SQL",,,,,Most of the time,,Most of the time,,,,,,,,,,Often,,,,,,,Sometimes,Often,,Sometimes,,,,,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Natural Language Processing,Neural Networks",,,,,Often,Often,,,,,,,,Often,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,10,25,20,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",,600000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Not Useful,Very useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",,10GB,"CNNs,GANs","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow,Other",,Often,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Often,,,"CNNs,GANs",,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,40,50,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,,Sometimes,,,,Often,,Most of the 
time,,,,,Often,,,,Often,Sometimes,,None,Approximately half internal and half external,Standalone Team,"coco, vgg",APIs are old and not flexible enough to provide mass quantity of information in a timely manner.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Sometimes,"115,000",USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,35,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,,,,Necessary,Necessary,Necessary,Necessary,,Nice to have,,,,Other,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,Very Important,Very Important,Very Important,,Very Important,,,,,Very Important,,Very Important,Very Important,Very Important, +Male,Taiwan,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Monte Carlo Methods,Python,GitHub,Trade book,,,,,,,,,,,,,,,,Very useful,,,Linear Digressions Podcast,10-15 years,Necessary,Necessary,Nice to have,Necessary,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing",Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,39,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,1GB,,"Python,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,Most of the time,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,30,30,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Often,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,115000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +A different identity,Taiwan,47,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Not Useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,FastML Blog",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Electrical Engineering,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",25,55,0,5,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,FastML Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,70,30,0,0,0,0,"Computer Vision,Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,,,,,,"C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Natural Language Processing,Neural Networks,SVMs",,,,,,Most of the time,,,,,,,,Often,,,,,Rarely,Most of the time,,,,,,,,Often,,,,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,,,,,,,,Bitbucket,,,,,6,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,30,10,10,10,10,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,Often,,Most of the time,,,,"Association Rules,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender 
Systems,Segmentation,SVMs,Text Analytics",,Often,,,,,,Often,Often,,,Often,,Often,,Often,,Often,Most of the time,Most of the time,Sometimes,,Most of the time,Sometimes,,Most of the time,,Sometimes,Most of the time,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,Sometimes,Most of the time,,,,Most of the time,,,,,Often,,,,,,Sometimes,,,26-50% of projects,More internal than external,IT Department,,unstructured free text,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",Cloud Server,Git,Rarely,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +A different identity,India,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",92,8,0,0,0,0,Adversarial Learning,Logistic Regression,Primary/elementary school,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Lift Analysis,Logistic Regression,Other",,,,,,,,,,,,,,,Often,Rarely,,,,,,,,,,,,,,,,Most of the time,,0,0,0,0,100,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Bayesian Methods,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Friends network,Personal Projects,Textbook,Trade book",Somewhat useful,,Very useful,,,Very useful,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,"FastML Blog,No Free Hunch Blog",10-15 years,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Other",2 - 10 hours,PhD,Yes,Master's degree,Other,,"Computer Scientist,Software Developer/Software Engineer,Other",University courses,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,19,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,"Data Analyst,Software Developer/Software Engineer",Work,25,25,50,0,0,0,,,A bachelor's degree,Pharmaceutical,"5,000 to 9,999 employees",,,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,Relational data,,10GB,,"SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Cloudera,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University",,Very useful,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,Linear Digressions Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Traditional Workstation,,Kaggle Competitions,No,Master's degree,Biology,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,,Very Important,,,,,,,,,,, +Male,People 's Republic of China,21,Employed part-time,,,Yes,,Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by college or university",TensorFlow,Deep learning,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Kaggle",Very useful,Very useful,,,,,Very useful,,,,,,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,10,50,0,10,0,Adversarial Learning,Logistic Regression,A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Always,10TB,Regression/Logistic Regression,"C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs",Often,,,Often,,Often,,,,,,,,,,Often,,,,,Often,,Often,,Often,,,,,,,,,10,30,30,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,Do not know,IT Department,No;,Too slow;,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,320000,CNY,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,RapidMiner (commercial version),Deep learning,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Always,10GB,"Neural Networks,RNNs,SVMs","Python,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,Sometimes,19000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,,,,,"FlowingData Blog,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,60,20,20,0,0,0,"Computer Vision,Reinforcement learning,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Neural Networks,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Flume,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,Mathematica,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",Most of the time,Most of the time,,,,,Often,,Often,,,,,Often,Most of the time,Most of the time,Most of the 
time,,,Sometimes,,Often,,Often,Often,,Most of the time,Often,,,Most of the time,Often,Most of the time,,,,,,,,Most of the time,Most of the time,,,Sometimes,Most of the time,Often,Most of the time,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,Often,,Often,Most of the time,Often,Most of the time,Most of the time,,,,,,Most of the time,,Often,Often,,Most of the time,Most of the time,,Often,Often,,Often,,,Often,Most of the time,Most of the time,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Often,Often,,,,Often,,,Most of the time,,,Most of the time,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git,Other",Rarely,120000,SGD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Male,Japan,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,1 to 2 years,Business Analyst,Self-taught,80,0,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Mix of fields,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,10MB,"CNNs,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Python,TensorFlow",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Random Forests,Text Analytics",,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,40,30,0,5,25,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Often,,,,Most of the time,,,,,,Often,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Commercial Data Platform,,"Git,Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Colombia,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Computer Scientist,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Not Useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Statistician",University courses,40,40,0,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Random Forests,Time Series Analysis",Often,,Often,,,,,Sometimes,,,,,,Sometimes,,Most of the time,Rarely,Most of the time,Sometimes,,,,Sometimes,,,,,,,Often,,,,40,30,20,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,,,,Very useful,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Predictive Modeler,Statistician",University courses,0,20,80,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Insurance,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Regression/Logistic Regression,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling",,,,,,,Often,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,80,20,0,0,0,0,Enough to tune the parameters properly,"Explaining data science to others,Inability to integrate findings into organization's decision-making process",,,,,,Often,,Often,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,75000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,25,0,0,25,50,0,,,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Blogs,College/University,Conferences,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,31,Employed 
full-time,,,Yes,,,Poorly,Employed by professional services/consulting firm,Python,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Telecommunications,"10,000 or more employees",Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10MB,Bayesian Techniques,"Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,Rarely,,,Sometimes,,,,,,,"Logistic Regression,Naive Bayes,Segmentation",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,40,20,0,20,20,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,census,Data is not easily available and needs to be cleaned a lot.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1400000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Japan,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,"Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,,Yes,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,20,60,0,20,0,0,Reinforcement learning,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Other,24,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, but looking for work",,,,,,,,Stan,Bayesian Methods,Python,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,GPU accelerated Workstation,,,No,Doctoral degree,Computer Science,3 to 5 years,Researcher,Work,10,10,30,40,10,0,Time Series,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,Nice to have,,,,Nice to have,,Nice to have,,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",47,33,NA,0,20,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Not important,Somewhat important,Very Important,,Somewhat important,Very Important,Not important,Somewhat important,,Not important,Not important,Not important,Not important,Not important,Not important +"Non-binary, genderqueer, or gender non-conforming",Other,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Not Useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,KDnuggets Blog,< 1 year,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Other,11 - 39 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,Very Important,,,, +Male,India,20,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,"Data Analyst,Researcher",Other,90,NA,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer 
Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,10,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,Often,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",,Often,Sometimes,,Sometimes,Most of the time,Most of the time,Often,Often,,,Often,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Often,,,,Most of the time,Often,,,,50,25,5,10,10,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",20,40,30,5,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Podcasts,Textbook",,Very useful,Very useful,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,"KDnuggets Blog,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,20,NA,10,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,100 to 499 employees,Decreased significantly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,Regression/Logistic Regression,"Google Cloud Compute,Jupyter notebooks,Python,R,SQL",,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction",Often,,,,,,Most of the time,,,,,,,,,Sometimes,,Rarely,,,Sometimes,,,,,,,,,,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,Rarely,,,Sometimes,,,,,,,,Rarely,Sometimes,Most of the time,,,Rarely,,,,26-50% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,100000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Mathematica,,,,Podcasts,,,,,,,,,,,,,Somewhat useful,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Self-taught,10,80,0,0,10,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data,Relational data",Sometimes,10GB,"Bayesian 
Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,NoSQL",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Naive Bayes",Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,30,70,0,0,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,More external than internal,Business Department,,,Key-value store (e.g. Redis/Riak),Email,,Subversion,,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,25,"Not employed, but looking for 
work",,,,,,,,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Researcher",Work,45,30,15,5,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Not important +Male,Canada,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Google Search,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat 
useful,,,,,,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,"DataTau News Aggregator,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Most of the time,Often,,,,,,Often,,Most of the time,,Most of the time,Often,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,Most of the time,,,,,30,30,10,10,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Most of the time,,,,Often,,,,,,,,,,Often,,,,10-25% of projects,More internal than external,Central Insights Team,,Dirty text data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,6000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Canada,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,32,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Self-taught,30,20,40,0,10,0,"Recommendation Engines,Time Series",,A professional degree,Pharmaceutical,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,,"Decision Trees,Regression/Logistic Regression","R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,Sometimes,,,Rarely,,,,,,,"Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Recommender Systems,Time Series Analysis",,,,,Rarely,,Often,Sometimes,,,,,,Sometimes,,,,,,,,,,Rarely,,,,,,Often,,,,50,20,0,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert 
input,Other",,,,,Often,,,,,,Often,,,,,,,,,,,Most of the time,10-25% of projects,Entirely internal,Other,,,,"Email,Share Drive/SharePoint",,,,900000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Uplift Modeling,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Conferences,Friends network,Personal Projects",Very useful,Very useful,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,60,0,40,0,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Gradient Boosted Machines,Regression/Logistic Regression,Other","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,NoSQL,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,Most of the time,,,,Most of the 
time,,,,,,,,,,,,,,,,,,Rarely,,,Most of the time,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,Rarely,,Sometimes,,,,"A/B Testing,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Recommender Systems,Simulation",Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,Often,,Most of the time,,,Often,,,,,,,15,35,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Rarely,Often,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,data from net-research company,CVR prediction in RTB,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,4500000,JPY,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other,Other,Other",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,,,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Often,,Often,Often,,Sometimes,,Often,,,,Rarely,,Most of the time,,,,,70,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,Most of the time,Sometimes,,,,Often,,Most of the time,,Rarely,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,720000,PKR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,0,0,5,85,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,Software Developer/Software Engineer,Self-taught,70,30,0,0,0,0,"Recommendation Engines,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,Python,,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started 
working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Decision Trees,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Reinforcement learning,Logistic Regression,"Some college/university study, no bachelor's degree",Other,"10,000 or more employees",,,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,,Relational data,,,,"IBM Cognos,QlikView",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,76-99% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,655000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Australia,43,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,SAS Enterprise Miner,,SAS,,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,More than 10 years,Other,University courses,20,0,40,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A master's degree,Other,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Simulation,Time Series Analysis",Sometimes,,,,,,,Often,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,Often,,,Often,,,,50,35,5,0,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Often,Sometimes,,Often,,,,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Rarely,250000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,,Business Analyst,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,India,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,40,10,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher,I haven't started working yet",University courses,50,0,NA,50,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,35,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Podcasts",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,"Talking Machines Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,"Not employed, but looking for work",,,,,,,,Cloudera,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,50,10,0,30,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic 
software,Python,Time Series Analysis,Python,GitHub,"Friends network,Kaggle,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Speech Recognition,Logistic Regression,A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Other",,10MB,Bayesian Techniques,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"HMMs,Logistic Regression,Neural Networks",,,,,,,,,,,,,Sometimes,,,Often,,,,Often,,,,,,,,,,,,,,30,50,10,10,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,,,,,,75000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,"GitHub,Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer,Statistician",University courses,NA,20,10,70,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Neural Networks - CNNs,A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,10MB,Neural Networks,"Hadoop/Hive/Pig,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,Rarely,Rarely,,Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,20,10,30,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Sometimes,,,,,,Often,,,,,,Often,,100% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,93000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,Statistica (Quest/Dell-formerly Statsoft),Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,Self-taught,50,0,0,0,20,30,"Outlier detection (e.g. Fraud detection),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important +Male,South Korea,38,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,1 to 2 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,30,30,40,0,0,0,"Natural Language 
Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Other (please specify; separate by semi-colon)",Decision Trees - Random Forests,A bachelor's degree,Retail,"5,000 to 9,999 employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,1GB,Decision Trees,"Amazon Machine Learning,IBM SPSS Statistics,Java,Microsoft SQL Server Data Mining,QlikView,R,SQL,Tableau,Other",Often,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,Often,,,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,Often,,,,Most of the time,Often,,,,,,Often,,,,Often,Often,,,,,,,Often,,,Often,Often,,,,20,50,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects",Very useful,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,,,,,"FastML Blog,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,15,60,10,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Sometimes,Often,,Often,,,,Rarely,,,,,,,Often,,Most of the time,,Often,,Rarely,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,Prescriptive Modeling,Segmentation,SVMs",Often,Sometimes,,Most of the time,,Sometimes,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,Often,,Often,,,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's 
decision-making process,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,,,Most of the time,,Often,,,Often,,Most of the time,,Most of the time,,Sometimes,,Most of the time,,10-25% of projects,More internal than external,IT Department,public available image datasets,"cleaning it from personal information, de-anonymizing, remove bad parseable data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Decreased significantly,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Company internal community,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,,,,,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer",Kaggle competitions,20,50,0,20,10,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Don't know,1GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,HMMs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs",,,Often,Often,,,Most of the time,,,Most of the time,,,Sometimes,Often,,Often,,Often,Often,,Often,,,,,,,Sometimes,,,,,,40,20,0,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science 
team",Often,,,,Most of the time,Often,,,,,Most of the time,Often,,,,Sometimes,,,,,,,10-25% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Git,Subversion",Sometimes,,,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,20,10,0,40,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A master's degree,Internet-based,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,Other,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Perl,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,Sometimes,,,,Most of the time,,,,,Rarely,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"Collaborative Filtering,Text Analytics",,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,0,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used 
by business decision makers,Dirty data,Scaling data science solution up to full database",Sometimes,Sometimes,,,Often,,,,,,,,,,,,,Often,,,,,26-50% of projects,Do not know,Central Insights Team,Data source partners,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,155000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Vietnam,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,RapidMiner (commercial version),Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,"DataTau News Aggregator,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,20,20,20,35,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop 
(Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,Often,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,Often,,,Sometimes,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Often,,Often,Most of the time,Most of the time,Often,Often,,,Sometimes,Sometimes,Often,,Most of the time,Often,Often,Most of the time,Often,Often,,Often,Often,,,Often,Often,Often,Often,,,,30,20,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,,Often,,Often,,,Sometimes,,Often,Sometimes,,,Most of the time,Most of the time,,26-50% of projects,Approximately half internal and 
half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,2000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Decision Trees,SAS,Google Search,"Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,10,20,40,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Survival Analysis","Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,United States,45,Employed part-time,,,Yes,,Engineer,Fine,Self-employed,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook",,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,20,10,20,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,,,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Text data,Sometimes,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks","Google Cloud Compute,Java,Jupyter notebooks,MATLAB/Octave,Orange,Python,R,Spark / MLlib,TensorFlow",,,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,Often,,,,,,,,Often,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,,Often,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,,,,,Most of the time,,Most of the time,,,Often,,,,,Most of the time,,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,,Rarely,,,,,,,,Sometimes,Sometimes,,26-50% of projects,Entirely internal,Central Insights Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Share Drive/SharePoint,,Bitbucket,,150000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Operations Research Practitioner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Machine Translation,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,,,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Tableau,Social Network Analysis,Python,Government website,"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,30,20,40,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,Decision Trees,"Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Python,R",,Most of the time,,Rarely,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Neural Networks",,,Often,,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,10,60,20,10,0,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Git,Sometimes,50000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,Python,Social Network Analysis,R,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Software Developer/Software Engineer,Statistician",Work,30,10,50,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,CRM/Marketing,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,20,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Sometimes,,,,,Often,,Sometimes,Often,,Most of the time,,,,,,Most of the time,,,10-25% of projects,Entirely internal,Business Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,,Rarely,90000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Chile,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,A social science,3 to 5 years,Researcher,Self-taught,100,0,0,0,0,NA,Survival Analysis,Bayesian Techniques,High school,Mix of fields,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,Don't know,1MB,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,kNN and Other Clustering,PCA and Dimensionality Reduction",,,Most of the time,,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,,,,0,0,0,80,20,0,Enough to explain the algorithm to someone non-technical,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,100% of projects,More internal than external,Other,Data published in papers,any,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Nice to have,,Nice to have,,Necessary,Necessary,Nice to have,,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Engineer",Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,10,0,10,80,0,0,"Computer Vision,Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Image data,Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,TensorFlow",,Often,,Most of the time,,,Often,,Often,,,,,,Often,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs",Sometimes,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,Often,Often,,,Most of the time,Most of the time,Most of the time,,,,,,10,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to 
others,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,Often,,,Most of the time,,Often,,,,,,,Most of the time,Most of the time,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,130000,SGD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,20,40,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Biology,I don't write code to analyze data,Engineer,Self-taught,100,0,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Proprietary Algorithms,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,Not Useful,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,"Coursera,Udacity","Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,0,40,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,Turkey,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,21,Employed 
part-time,,,Yes,,Computer Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,10,20,30,30,10,0,"Computer Vision,Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,500 to 999 employees,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data,Relational data",,10GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,Logistic Regression,Neural Networks",,,,Most of the time,,Often,,Often,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,25,30,5,20,20,0,Enough to explain the algorithm to someone 
non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,20,20,20,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Always,10TB,"CNNs,Ensemble Methods,Neural Networks","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Other",,Most of the time,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,,Rarely,,,Most of the time,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,GANs,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Sometimes,,,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,Most of the time,,Sometimes,,,,,,,,,,,80,0,10,0,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,GitHub,"Blogs,Official documentation,Personal Projects,YouTube Videos",,Very useful,,,,,,,,Somewhat useful,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient 
Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Insurance,10 to 19 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Never,,"CNNs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,Python,QlikView,R,SQL,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,CNNs,Natural Language Processing,Neural Networks,Segmentation,Text Analytics",Often,,,Rarely,,,,,,,,,,,,,,,Often,Rarely,,,,,,Sometimes,,,Often,,,,,20,5,30,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,140000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Turkey,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines 
(SVMs),Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important +Male,France,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important,Not 
important,Not important,Not important,Not important,Very Important +Male,Other,28,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Engineer,Self-taught,85,0,10,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs",A master's degree,Telecommunications,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Python,QlikView",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,"Association Rules,Random Forests,Time Series Analysis",,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,55,30,4,1,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Central Insights Team,No Third Party or Public datasets,cleaning it,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,590000,BDT,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,,University courses,0,0,10,90,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,500 to 999 employees,Stayed the same,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,Python,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Official documentation,Online courses,Textbook,YouTube Videos",,,Very useful,,,,,,,Very useful,Somewhat useful,,,,Very useful,,,Not Useful,,3-5 years,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp",Traditional Workstation,,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,30,40,10,0,0,20,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Male,Switzerland,17,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,edX",Other,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,,Very Important,Somewhat important,Very Important +Male,Other,44,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Machine Translation,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Matlab,GitHub,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to 
have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,22,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Online courses,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important +Male,Taiwan,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,40,0,20,40,0,0,Recommendation Engines,"Bayesian Techniques,Support Vector Machines (SVMs)",I don't know/not sure,Technology,I prefer not to answer,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Video 
data,,,"CNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Programmer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1PB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Sometimes,Most of the time,,,Rarely,,,Rarely,Rarely,,,,,,Often,,Most of the time,,Rarely,,Rarely,Sometimes,Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.",,,,,Often,,,,Often,,,,,Often,,Often,,,Often,,,,51-75% of projects,Approximately half internal and half external,Business Department,"social media, facebook, twitter, google, amazon, yelp",not clean,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Most of the time,150000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Iran,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort 
of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Statistician,Self-taught,80,0,5,5,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,Natural Language Processing,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM Watson / Waton Analytics,Decision Trees,R,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,0 - 1 hour,Master's 
degree,No,Bachelor's degree,Mathematics or statistics,,"Business Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),40+,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,50,10,0,20,20,0,"Computer Vision,Natural Language Processing","Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Philippines,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Engineer,Self-taught,80,0,20,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Retail,500 to 999 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1MB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,60,15,10,5,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Often,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,,R,,Online courses,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Mexico,16,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I 
prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,GPU accelerated Workstation,2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,"Engineer,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,36,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,Python,,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,,,Self-taught,50,0,25,0,25,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,20 to 99 employees,Increased significantly,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data,Relational data",Sometimes,1TB,"CNNs,Ensemble Methods,Neural Networks",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs",Often,,,Most of the time,,Most of the time,,Often,Most of the time,,,,,,,,,,Most of the time,Most of the time,Often,,,Most of the time,Most of the time,,,,,,,,,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,25,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Other,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,40,10,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Increased slightly,6-10 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,23,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,25,Employed part-time,,,Yes,,Data Scientist,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Engineer,Software Developer/Software Engineer",Self-taught,80,0,15,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Cluster Analysis,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Engineer,Self-taught,70,30,0,0,0,0,Time Series,,A master's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text 
data,Sometimes,1MB,Other,"Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Simulation",,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,80,10,0,5,5,0,Enough to run the code / standard library,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Key-value store (e.g. Redis/Riak),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher",Self-taught,70,0,10,20,0,0,Natural Language Processing,Neural Networks - CNNs,High school,Academic,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Non-Kaggle online communities,Podcasts",Very useful,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,"Linear Digressions Podcast,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,30,20,0,20,20,"Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,,"Decision Trees,Gradient Boosted Machines","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,SQL,Tableau",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Natural Language Processing",Most of the time,,,,,,Often,,,,,,,,,Rarely,,,Often,,,,,,,,,,,,,,,75,5,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,Often,,10-25% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,155000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Indonesia,25,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,40,20,10,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,1GB,Decision Trees,"R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Scaling data science solution up to full database",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Support Vector Machines (SVM),Python,Government website,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,University courses,0,0,0,100,0,0,Time Series,Neural 
Networks - CNNs,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,C/C++/C#,Google Search,"Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,51,"Not employed, but looking for work",,,,,,,,Python,,,,"Blogs,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Mathematics or 
statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important +Male,United States,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,22,Employed full-time,,,Yes,,Statistician,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Friends network,Kaggle,Online courses",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,30,10,30,30,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series",Bayesian Techniques,A bachelor's degree,Other,20 to 99 employees,Increased significantly,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,10MB,Bayesian Techniques,"IBM SPSS Statistics,Microsoft Excel Data Mining,Perl,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Segmentation,Text Analytics",Often,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,60,0,15,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Rarely,,,Sometimes,,,,,,,,,,,,,,,,Rarely,,100% of projects,Entirely internal,Central Insights Team,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"25,000",BRL,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,3 to 5 years,"Data Analyst,Researcher",Self-taught,35,0,40,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A 
master's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,20,30,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Don't know,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Java,MATLAB/Octave,Microsoft SQL Server Data Mining",,,,Often,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression",,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,30,15,15,20,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,46,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft SQL Server Data Mining,Time Series Analysis,R,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer,Statistician","Online 
courses (coursera, udemy, edx, etc.)",40,0,20,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - GANs",High school,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Decision Trees,Regression/Logistic Regression","Google Cloud Compute,KNIME (free version),Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Segmentation,Simulation",,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning",,Often,,,Most of the time,Most of the time,,,,,,Often,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Other,,100000,EUR,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,45,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A humanities discipline,Less than a year,"Other,I haven't started working yet",Self-taught,25,50,0,15,10,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Other,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer",University courses,40,10,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,26,Employed full-time,,,No,Yes,Statistician,Fine,Employed by college or university,R,Time Series Analysis,R,Google Search,"College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Talking Machines Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,5,20,20,40,10,5,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,India,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Self-taught,90,5,5,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Other,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,80,10,5,5,0,0,Recommendation Engines,"Ensemble Methods,Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Other","Amazon Web services,Perl,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,Often,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to 
be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,Often,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,3rd party metadata,Depending on approach: clean metadata,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,40000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,PhD,Yes,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Very Important +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,"FastML Blog,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,10,30,0,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression","IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,Sometimes,,Rarely,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Recommender Systems,RNNs,Segmentation,Text Analytics",,,,,,,Sometimes,,,,,,,Often,,Often,,,Often,,,,,Sometimes,Rarely,Sometimes,,,Often,,,,,45,25,5,5,20,0,"Enough to code it again from scratch, albeit it may 
run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,Often,Often,,,Most of the time,,,,,,Often,,,10-25% of projects,Entirely internal,Other,,Labelling the unlabelled data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data",,Git,Never,1400000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",University courses,70,10,20,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Technology,"10,000 or more employees",,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10TB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Mathematica,MATLAB/Octave,NoSQL,Python,R,RapidMiner (commercial version),SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SQL",,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Logistic Regression,Segmentation,SVMs,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,,Often,,Often,,,,30,20,10,0,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Official documentation,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,Yes,Master's degree,Other,1 to 2 years,Engineer,University courses,80,0,0,10,10,0,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Engineer,Programmer,Researcher,Statistician",University courses,0,0,0,100,0,0,"Speech Recognition,Survival Analysis",,A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,I haven't started working yet",University courses,0,30,0,70,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,United States,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,IBM SPSS Statistics,MARS,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Textbook,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,,,,,Not at all important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",,,"Bayesian Techniques,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Julia,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Often,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Sometimes,,Often,,,,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,Often,,,,,,,Often,,,,,,,,,,Most of the time,,,Often,,,,Most of the time,,,Often,,,,25,40,0,20,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,Most of the time,Most of the time,,,,,,,,Sometimes,,None,Do not know,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Git,Mercurial",Always,"125,000",USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Python,Cluster Analysis,Python,I collect my own data (e.g. web-scraping),"College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,60,20,0,20,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,27,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Python,Deep learning,Python,Google Search,"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,"Supervised 
Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Python,TensorFlow",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Random Forests",Often,,,,,Often,Most of the time,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Most of the time,Often,,,,,,,,Often,,,Often,,,100% of projects,Approximately half internal and half external,Standalone Team,World Bank (findex),Often statistics are not diaggregated at tegional level,Column-oriented relational (e.g. KDB/MariaDB),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,50000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,20,5,25,40,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,India,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 
years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression",A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,37,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,QlikView,Decision Trees,R,Other,College/University,,,Not Useful,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher,Statistician",University courses,0,40,0,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Textbook",,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,Business Analyst,University courses,0,20,10,70,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Random Forests,Regression/Logistic Regression","KNIME (free version),Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Recommender Systems",,,,,,Sometimes,Often,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,,,,,,,,40,30,20,0,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",,Sometimes,,,,,,,,,,Often,Often,,,Most of the time,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Sometimes,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Canada,18,Employed part-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Java,Genetic & Evolutionary Algorithms,R,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Unsupervised Learning",,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,52,Employed full-time,,,Yes,,Data Scientist,,Employed by a company that performs advanced analytics,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,Self-taught,100,0,0,0,0,0,,,A bachelor's degree,Other,Fewer than 10 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Relational data,Most of the time,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,Time Series Analysis",Often,,,,,,Most of the time,Often,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,Rarely,,Often,,Often,Most of the time,,Often,,,,Often,Sometimes,Sometimes,Sometimes,,Often,Often,,100% of projects,More internal than external,Central Insights Team,facebook; twitter,access,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,370000,ZAR,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,40,0,30,5,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Anomaly 
Detection,Python,,"Arxiv,College/University,Company internal community,Kaggle,Stack Overflow Q&A",Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,Very useful,,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,Self-taught,30,50,5,5,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Gradient Boosted Machines,Random Forests","Impala,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Rarely,,Sometimes,,,,"A/B Testing,Association Rules,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,Neural Networks,RNNs",Rarely,Rarely,,,,,,,,,,Often,,,Often,,,,Sometimes,Rarely,,,,,Rarely,,,,,,,,,35,60,5,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,Often,,,,,,,,,Often,,,Less than 10% of projects,More internal than external,Central Insights Team,,,Row-oriented 
relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",Rarely,50000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Never,100MB,Bayesian Techniques,"Google Cloud Compute,Java,Python,Unix shell / awk",,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,60,30,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Often,,,Sometimes,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Sometimes,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,Sometimes,Often,,,,,Most of the time,Sometimes,,,,60,10,10,15,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in 
with the available data",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,500000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,South Korea,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,Self-taught,50,30,0,10,10,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,29,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,,,,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,0,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Pakistan,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,< 1 year,Necessary,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Other",2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other",University courses,10,10,10,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,United States,21,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,"Data Stories Podcast,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,Researcher",Kaggle competitions,20,10,20,20,10,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib,SQL",,Rarely,,Sometimes,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Sometimes,,,,,Often,Most of the time,Often,,,,,,Often,,,,Often,Often,Sometimes,Often,,Often,,,,,,Often,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,,Most of the 
time,,Often,,,,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Business Department,Salesforce,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Orange,Survival Analysis,Python,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,Researcher,Work,20,0,80,0,0,0,Computer Vision,Neural Networks - GANs,High school,Academic,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Text data",Sometimes,100GB,"Bayesian Techniques,RNNs,SVMs","Jupyter notebooks,Orange,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,Most of the time,,Often,Often,,,,Often,,,,,Sometimes,Most of the time,,,,,20,30,10,10,30,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Sometimes,,Often,,,,,,,,,,,,Sometimes,,,Most of the time,,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,45,Employed full-time,,,Yes,,Other,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Flume,Deep learning,Python,Government website,"Blogs,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,,University courses,40,0,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Neural Networks - CNNs",High school,Insurance,"10,000 or more employees",Increased significantly,,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1TB,,"Amazon Web services,IBM Cognos,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),SAS Base,Tableau,TensorFlow,Other",,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,Sometimes,,,,Often,,Rarely,,Rarely,,,Rarely,,,,,,,Rarely,Rarely,,,Sometimes,,,"Neural Networks,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,Often,,,,30,0,0,0,70,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,Sometimes,,,,,,,,,,Often,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Internal web service,Git,Sometimes,140000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,Other,34,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by a company that performs advanced analytics,Self-employed",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Management information systems,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,70,0,0,15,0,"Computer Vision,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important +Female,Indonesia,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Support Vector Machines (SVM),Matlab,I collect my own data (e.g. 
web-scraping),Conferences,,,,,Very useful,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,Support Vector Machines (SVMs),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased significantly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1TB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,KNIME (free version),Microsoft Azure Machine Learning,NoSQL,Perl,Python,QlikView,R,Spark / MLlib,SQL,Unix shell / awk",Often,Often,,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,,,Sometimes,,,Often,,,,,Most of the time,,,Rarely,Most of the time,Sometimes,,,,,,,,,Most of the time,Most of the time,,,,,,Often,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,Often,,Often,,Most of the time,,Often,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Operations Research Practitioner,Programmer,Statistician",University courses,40,0,30,30,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,40,Employed full-time,,,No,Yes,Predictive Modeler,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,I don't write code to analyze 
data,Researcher,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",50,30,10,5,5,0,"Adversarial Learning,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Markov Logic Networks","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,0,0,0,0,0,0,Enough to run the code / standard library,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Predictive Modeler,Programmer",University courses,10,5,65,20,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",High school,Financial,500 to 999 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,31,Employed full-time,,,Yes,,Researcher,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,22,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,1GB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,50,5,40,5,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,300000,HKD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Data Analyst,Work,20,20,20,0,20,20,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Self-taught,60,10,30,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,40,0,20,40,0,"Natural Language Processing,Reinforcement learning","Decision Trees - Random Forests,Gradient Boosting",,Academic,,,,,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Text data,Most of the time,,Neural Networks,"Jupyter 
notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,,,Often,,,,,,,,,,,,,Most of the time,Often,Sometimes,,,,,,,,Most of the time,,,,,40,30,10,10,10,0,Enough to run the code / standard library,"Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Work,20,0,40,20,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,20 to 99 employees,Increased significantly,3-5 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,QlikView,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,Often,Most of the time,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,Most of the time,,,Often,,Sometimes,,Often,,,,Most of the time,Often,,Most of the time,,,,,Sometimes,,Most of the time,,,,50,25,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,,,,,Often,,,,Often,,Most of the time,,51-75% of projects,More internal than external,IT Department,weather;electricity public data ,close to real-time data availability ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git,Mercurial",Sometimes,70000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,DataRobot,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Never,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,,Sometimes,,Most of the time,,Most of the time,,,,,Most of the time,Sometimes,Sometimes,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely external,,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,Never,82000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Canada,45,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Employed by government",I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,Government website,"College/University,Online courses,Podcasts,YouTube Videos",,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,A health science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,70,0,30,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important +Male,Argentina,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Julia,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL,TensorFlow",,,,,,,,,,,Often,Often,,,,Rarely,,,Sometimes,,,,,Most of the time,,,,,,,Most of the time,,Most of the time,,Sometimes,,,Sometimes,Sometimes,,,Most of the time,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,,Most of the time,,,Sometimes,,Often,,Most of the time,,,,Most of the time,,Sometimes,Sometimes,,,,60,30,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by 
business decision makers,Dirty data,Lack of data science talent in the organization",,Often,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Not Useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,,,,,,,,,,,,,,, +Male,Singapore,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler",University courses,0,0,35,60,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Python,R,RapidMiner (commercial version),RapidMiner (free version),SQL,Tableau,Other",,Sometimes,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,Most of the time,,,,Often,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,,Often,Most of the time,Often,,,,,Often,,Often,,Sometimes,Sometimes,Often,,,Often,Often,,,,,Often,Sometimes,,,,25,10,10,20,35,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,,,,Sometimes,,,,,,,,,,Often,,Sometimes,,100% of projects,More external than internal,IT Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,Kaggle competitions,50,10,0,0,40,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,500 to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Sometimes,,Often,Sometimes,,,,,Often,,Often,,,Often,,Often,,Often,,,Often,,Sometimes,Often,,,,,50,20,0,20,10,0,Enough to run the code / standard library,Inability to integrate findings into organization's decision-making process,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,,,,Not Useful,,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,70,20,0,0,10,0,Recommendation Engines,"Bayesian Techniques,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Taiwan,22,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hungary,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",Logistic Regression,A bachelor's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,10,0,10,10,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression",A master's degree,Academic,I don't know,Increased slightly,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Rarely,,"Bayesian Techniques,Regression/Logistic Regression","Mathematica,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Logistic Regression,Simulation",,,Sometimes,,,Rarely,,,,,,,,Rarely,,Often,,,,,,,,,,,Sometimes,,,,,,,20,20,15,20,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction 
to go in with the available data",,,,,Often,,,,,,Often,,,Often,,,,Sometimes,,Often,,,76-99% of projects,Approximately half internal and half external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint,Other",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,24000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Nigeria,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,5,0,15,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow",,,,,Often,,,Sometimes,Often,,,,,Sometimes,,,,,,,,Sometimes,Often,Often,Often,,Often,Rarely,,,Most of the time,Sometimes,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,Often,Often,Often,Often,Often,Often,Most of the time,,Most of the 
time,,Most of the time,Often,Most of the time,Often,Most of the time,Often,Often,Often,Sometimes,,Often,Often,Sometimes,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,26-50% of projects,Entirely internal,IT Department,Financial dataset that aren't publicly available.,Understanding the objective required of the data.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,45000,RON,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Philippines,34,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Monte Carlo Methods,R,"Google Search,I collect my own data (e.g. 
web-scraping)","Friends network,Online courses,Stack Overflow Q&A",,,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,Yes,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Brazil,30,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,Very useful,,,< 1 year,,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Researcher",Self-taught,50,15,0,10,25,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Egypt,30,Employed full-time,,,No,Yes,Other,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze 
data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,R,"GitHub,Google Search,Government website","College/University,Kaggle,Official documentation,Online courses,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,R Bloggers Blog 
Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer",Self-taught,60,25,5,10,0,0,"Natural Language Processing,Reinforcement learning,Time Series,Unsupervised Learning","Evolutionary Approaches,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Academic,,,,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Never,1MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,SVMs","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,RapidMiner (free version)",,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,SVMs",,,Rarely,,,,,Rarely,,Most of the time,,,,,,,Sometimes,Often,Often,Most of the time,,,,,,,,Most of the time,,,,,,20,15,20,40,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Most of the time,,Often,,,,,Often,Often,Sometimes,,,Often,Often,Most of the time,Most of the time,Often,,,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"25,000",INR,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,C/C++/C#,Google Search,"College/University,Online courses,Tutoring/mentoring",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,No Free Hunch Blog",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Researcher",University courses,40,30,10,20,0,0,,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,India,27,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer",Self-taught,0,90,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more 
employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,Tableau",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,0,0,100,0,0,0,Enough to tune the parameters properly,"Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer",University courses,40,0,30,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,R,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,20,20,40,0,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Other,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,MATLAB/Octave,Orange,R,Tableau",Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,60,10,10,10,0,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",,,,,Often,Often,,,Often,,Often,,Often,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,,20500,INR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,R,"Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,0,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis",Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,United States,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,"Business Analyst,Data Analyst",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Government website,University/Non-profit research group websites","College/University,Company internal community,Online courses,Textbook",,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,University courses,10,0,40,50,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support 
Vector Machines (SVMs)",High school,Government,100 to 499 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,SAS Base,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,Most of the time,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Most of the time,,,,,Sometimes,Most of the time,,,,,Most of the time,,,Often,,,,20,40,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,Census Office Data; Bureau of Labor Statistics Data; Blue Health Intelligence Data; Medicare Data; Medicaid Data,data from different sources not being compatible or giving contradictory results,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,85000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,32,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,R,Neural Nets,R,Google Search,"College/University,Online courses",,,Very useful,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,30,30,40,0,0,Unsupervised Learning,Hidden Markov Models HMMs,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,India,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,Less than a year,"Data Miner,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,40,0,20,0,20,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,,,Somewhat important,,Somewhat important,,,Somewhat important,,,,,, +Female,United States,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Monte Carlo Methods,R,University/Non-profit research group websites,"College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,Not Useful,Not Useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,,Not Useful,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,,,,edX,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Biology,,"Data Miner,Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,2,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Personal Projects,Other",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,40+,Other,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,20+,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,R,Decision Trees,Matlab,University/Non-profit research group websites,"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist 
Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,University courses,20,10,0,50,20,0,Time Series,"Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,54,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",DataRobot,Time Series Analysis,Java,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer",Work,0,0,100,0,0,0,Time Series,Bayesian Techniques,,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Always,100MB,Bayesian 
Techniques,"MATLAB/Octave,R,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Bayesian Techniques,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,100% of projects,Do not know,Standalone Team,,,,I don't typically share data,,Bitbucket,Sometimes,250000,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Canada,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,SAS Enterprise Miner,Deep learning,SQL,"GitHub,Government website",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,More than 10 years,"Business Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer,I haven't started working yet",University courses,0,40,20,40,0,0,Natural Language Processing,Bayesian Techniques,,Other,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,10MB,Bayesian Techniques,"Cloudera,Hadoop/Hive/Pig,R,SQL,Tableau",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Bayesian Techniques,Naive Bayes,Natural Language Processing",,,Sometimes,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,50,10,10,30,0,0,Enough to run the code / standard library,"Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,Often,,26-50% of projects,More external than internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,20,0,0,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Data Scientist,Researcher,Other",Self-taught,70,30,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),A doctoral degree,Telecommunications,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Most of the time,<1MB,"Decision Trees,Other","C/C++,IBM Cognos,IBM SPSS Modeler,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow",,,,Rarely,,,,,,Rarely,Rarely,,Often,,Sometimes,,Rarely,,,,Rarely,,Often,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Rarely,Rarely,,,,,,"Association Rules,Data Visualization,Decision Trees,Natural Language Processing,Text Analytics,Time Series Analysis,Other,Other",,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,70,10,5,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,Most of the time,,,,Often,,Most of the time,Sometimes,Most of the time,,Most of the time,,51-75% of projects,Approximately half internal and half external,Other,,,,,,,,,,,2,,,,,,,,,,,,,,,,,, +Male,South Korea,30,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical 
Engineering,Less than a year,Data Miner,Other,30,30,40,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - RNNs",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Bayesian Techniques,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Statistician,I haven't started working yet",University courses,30,20,0,40,0,10,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Very useful,,,,,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,Often,,,,,,,,Often,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Often,,,,Most of the time,Often,Most of the time,,,,,,,,Most of the time,Sometimes,,,Sometimes,Often,,Most of the time,,,,,,Most of the time,Often,,,,20,60,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,Often,,Often,Often,,,,,,Most of the time,,Sometimes,,,10-25% of projects,More internal than external,Central Insights Team,,privacy,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,"300,000",USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"College/University,Online courses,Trade book",,,Very useful,,,,,,,,Very useful,,,,,Very useful,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,,Programmer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +A different identity,United States,32,"Not employed, and not looking for 
work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,Deep learning,,,"Arxiv,Online courses,Personal Projects,Textbook",Very useful,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,,Predictive Modeler,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,51,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,C/C++/C#,Other,Other,,,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,26,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,Python,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,< 1 year,,,Nice 
to have,,Necessary,Nice to have,Nice to have,Nice to have,,,,,,"DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,,,,,Important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",,,,Regression/Logistic Regression,"Google Cloud Compute,Jupyter notebooks,Python,R,SQL",,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More external than internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Git,Don't know,0,USD,I am not currently employed,5,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Gradient Boosting,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,QlikView,Social Network Analysis,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Insurance,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Impala,NoSQL,Python,Spark / MLlib,SQL",Rarely,Rarely,,,Most of the time,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees",Most of the time,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,15,10,5,0,Enough to tune the parameters properly,"Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,Often,,Often,,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,700000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,29,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,Very useful,,Very useful,,,,,Somewhat useful,,Very useful,,,Somewhat useful,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,Other,NA,30,50,NA,20,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Flume,Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau,TensorFlow",,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Often,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Often,Often,Often,,,,,,Often,,Often,,,Most of the time,,,,Often,,,,,Often,Often,Often,,,,30,20,20,20,10,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling 
data science solution up to full database,Unavailability of/difficult access to data",,,,Most of the time,,,,,Most of the time,Often,Most of the time,Often,Often,Most of the time,,Sometimes,,Often,,,Rarely,,Less than 10% of projects,Do not know,Business Department,"Consumer complaints,kaggle data sets",Data story telling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,0,ALL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Somewhat useful,,Not Useful,Not Useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),40+,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,48,0,0,2,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,R,Text Mining,Python,I collect my own data (e.g. web-scraping),"Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Biology,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Not important +Male,Other,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software 
Engineer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Conferences,Newsletters,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,20,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,C/C++,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)",College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,60,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs",High school,Pharmaceutical,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Python,SQL,Unix shell / awk",,Most of the time,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests",,,,,,Sometimes,,Most of the time,,,,,Sometimes,Often,,Often,,Often,,,,,Most of the time,,,,,,,,,,,80,10,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Slack,Git,Rarely,130000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Norway,23,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Not Useful,,3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,,University courses,55,0,0,45,0,0,"Adversarial Learning,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Female,India,23,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Deep learning,Python,,"Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Very useful,Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Other",Self-taught,30,40,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - 
GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,I don't know,Increased slightly,Don't know,Some other way,Not at all important,Other,Basic laptop (Macbook),Relational data,,<1MB,Regression/Logistic Regression,"Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,Often,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Rarely,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,0,0,0,0,10,90,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,None,More internal than external,Other,,,,Share Drive/SharePoint,,,Never,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, 
+Male,India,25,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,I don't write code to analyze data,Engineer,Self-taught,80,20,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed part-time,,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,Reinforcement learning,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,30,Employed part-time,,,Yes,,Engineer,Fine,Self-employed,Weka,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,,,,Somewhat useful,,,Very useful,"Data Stories Podcast,DataTau News Aggregator,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,5,15,10,50,20,0,Natural Language Processing,Neural Networks - GANs,,Manufacturing,10 to 19 employees,Increased significantly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,Bayesian Techniques,"Java,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Collaborative Filtering,kNN and Other Clustering,PCA and Dimensionality Reduction",,,,,Often,,,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,,,40,10,30,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Privacy issues,Scaling data science solution up to full database",Often,,,,Often,,,,,,,,Sometimes,,,,Often,Often,,,,,Less than 10% of projects,Entirely internal,IT Department,Weka,Cluster and then get usefull informar ion,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,"25,000",,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,70,5,5,10,0,Recommendation Engines,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased slightly,Don't know,Some other way,Not very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,30,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Most of the time,18000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,,Nice to have,Nice to have,,Necessary,,,Necessary,Nice to have,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,,Somewhat important,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,Statistician,University courses,40,0,0,60,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,Random Forests,"Hadoop/Hive/Pig,Impala,Julia,Python,R,Spark / MLlib,SQL,Stan,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,36,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Kaggle competitions,50,20,20,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by 
someone at the company (e.g. internal recruiter),Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university",Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,,,,Very useful,"Data Stories Podcast,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",,Academic,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Rarely,1GB,"CNNs,Neural Networks","C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,50,0,0,40,0,Enough to tune the parameters properly,"Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,,Sometimes,,None,Do not know,IT Department,"Imagenet,flw,feret",Programming,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,400,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,72,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Random Forests,R,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Work,60,0,20,20,0,0,Unsupervised Learning,Decision Trees - Gradient Boosted Machines,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Computer Vision,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Video data,Text data",Always,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Orange,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Rarely,,Rarely,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,Rarely,,,Often,Often,Often,Often,,,Often,,Often,,Sometimes,,,,,Often,,Often,,,Often,,Sometimes,,,,,,30,25,30,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the 
organization,Lack of significant domain expert input",,,,Sometimes,Often,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,Data mining; Data cleaning; Building models; Choice of decision criteria;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,800000,RUB,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,I don't write code to analyze data,,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,30,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Business Analyst,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Newsletters,Online courses,YouTube Videos",,,,,,,,Very useful,,,Very useful,,,,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,,Necessary,,Necessary,,,,,,,,,"Coursera,edX,Udacity",Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Malaysia,24,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Base,Time Series Analysis,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,10,0,70,0,0,"Computer Vision,Reinforcement learning,Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,45,0,50,NA,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,Basic laptop (Macbook),"Text data,Relational data",,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","DataRobot,Jupyter notebooks,Python,R",,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,100% of projects,Do not know,IT Department,,,,,,Git,Always,"110,000",,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,"Data Analyst,Other",Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Tableau,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Sometimes,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,Often,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,50,25,0,25,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Sometimes,,,,,,,Sometimes,,,,,,,51-75% of projects,Do 
not know,Standalone Team,,,Other,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,30,10,0,20,40,0,Unsupervised Learning,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Canada,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,0,33,17,33,17,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,No,Yes,Researcher,Fine,Employed by professional services/consulting firm,R,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Business Analyst,Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,DataRobot,Factor Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Personal Projects",,,,,Very useful,,,,,,,Not Useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,80,10,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Pharmaceutical,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,<1MB,Decision Trees,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,10,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,US census,too big,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,70000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Retail,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"No Free Hunch Blog,The Data Skeptic Podcast",3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,0 - 1 hour,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Engineer",Work,10,70,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Female,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Workstation + Cloud service,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,India,27,Employed full-time,,,No,Yes,Data Analyst,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Scientist,DBA/Database Engineer,Engineer,Programmer",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Other,Traditional Workstation,Relational data,Never,100MB,Bayesian Techniques,"NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Lack of data science talent in the organization",,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,24,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Deep learning,R,GitHub,"Arxiv,Kaggle,Personal Projects",Not Useful,,,,,,Very useful,,,,,Somewhat useful,,,,,,,"Jack's Import AI Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Speech Recognition,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Personal Projects",,,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,10 to 19 employees,Stayed the same,1-2 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,SVMs","Jupyter notebooks,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Naive Bayes",,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,30,20,50,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,4500000,JPY,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that performs advanced analytics,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Kenya,22,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced 
analytics,Jupyter notebooks,Neural Nets,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - GANs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Colombia,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,51,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",5,85,5,0,5,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Decreased significantly,Don't know,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,100MB,"Decision Trees,Random Forests","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,,,,,Sometimes,,,,85,5,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,,100% of projects,Do not know,Business Department,,Ensure data quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Git,Rarely,7000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Philippines,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Stata,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,10,0,0,90,0,0,"Speech Recognition,Unsupervised Learning",Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Other",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,65,5,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Other,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Other,"Text data,Relational data",Never,,,"SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,50,20,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,ORX,NO PROPER TOOLS,Other,Email,,Other,Sometimes,"30,000",INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,6 to 10 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,10,30,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Rarely,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Most of the time,Often,Often,Sometimes,,,Often,,Sometimes,,Sometimes,,Sometimes,Often,Often,Sometimes,,Sometimes,,,,,Sometimes,Often,Sometimes,,,,0,75,20,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty 
data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,Sometimes,,,Often,Sometimes,,Most of the time,,Most of the time,,,,,,,Often,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Never,250000,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Relational data",Never,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,RNNs","Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Recommender Systems,RNNs,Time Series 
Analysis",,Sometimes,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,Sometimes,Sometimes,,,,,Often,,,,60,5,0,20,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,Often,Often,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,1000000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Operations Research Practitioner,,40,15,15,15,15,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - GANs",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Java,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,R Bloggers Blog 
Aggregator",,,,,,,,,,,,,,,,,,,No,,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,0,10,0,0,80,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",CRM/Marketing,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,<1MB,Other,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Rarely,40,25,20,10,5,0,,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Other,"Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,100000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Netherlands,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",3-5 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Github Portfolio,No,Bachelor's degree,Computer Science,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,5,0,5,0,"Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,30,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),Arxiv,Very useful,,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,11 - 39 hours,PhD,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,Computer Scientist,University courses,50,40,0,10,0,0,Time Series,Markov Logic Networks,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,India,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",Python,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,20,0,0,30,0,Unsupervised Learning,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Most of the time,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,Often,,Sometimes,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Prescriptive Modeling,Recommender Systems,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,,Most of the time,,,,,40,20,10,30,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,,Company Developed Platform,,"Bitbucket,Subversion",Most of the time,50000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,India,23,Employed 
full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,10,70,0,10,0,Other (please specify; separate by semi-colon),"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,10 to 19 employees,Decreased slightly,Less than one year,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data",,10GB,"CNNs,Neural Networks,RNNs","IBM Watson / Waton Analytics,Python",,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,kNN and Other Clustering,Natural Language Processing,RNNs,Segmentation",,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,Often,Sometimes,,,,,,,,80,10,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Often,,Most of the time,,,,,Often,,,,,Most of the time,,76-99% of projects,Entirely external,IT Department,public medical image datasets,finding sufficient relevant and clean data to use,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Git,Always,15000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Italy,45,Employed full-time,,,No,Yes,DBA/Database Engineer,,Employed by a company that performs advanced analytics,Python,Deep learning,Matlab,Other,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,6 to 10 years,DBA/Database Engineer,University courses,20,20,20,20,20,0,Time Series,Support Vector Machines (SVMs),No education,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Important,Other,Other,Relational data,Never,1GB,Bayesian Techniques,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,20,30,25,10,15,0,"Machine Translation,Natural Language 
Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Rarely,Sometimes,Sometimes,Rarely,Most of the time,Often,Sometimes,Sometimes,,,Most of the time,,Rarely,,Often,,Sometimes,Often,Often,Rarely,,Sometimes,Sometimes,Sometimes,Rarely,,Sometimes,Often,Sometimes,,,,25,20,35,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the 
organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Rarely,,,,,,Sometimes,,Sometimes,,Often,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"DataTau News Aggregator,FastML Blog,Other (Separate different answers with semicolon)",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,20,10,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Poland,32,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Online courses,Other",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,40,0,0,0,50,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Random Forests,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Romania,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,Other,"R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"Data Visualization,Logistic Regression,Other",,,,,,,Often,,,,,,,,,Rarely,,,,,,,,,,,,,,,Sometimes,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Sometimes,,,,Sometimes,Often,,26-50% of projects,More internal than external,Central Insights Team,,Connect it,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Email,Other",Google Data Studio,,Rarely,15000000,CLP,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Weka,Support Vector Machines (SVM),Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,"Data Machina Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Engineer,Kaggle competitions,20,30,5,2,40,3,Computer Vision,Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",Technology,,,,,Somewhat important,,Laptop or Workstation and private datacenters,Relational data,Don't know,10MB,"Gradient Boosted Machines,Neural Networks,SVMs",C/C++,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Neural Networks,Simulation,SVMs",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,30,20,40,10,0,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,140000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,Statistician,University courses,30,5,15,50,0,0,"Adversarial Learning,Natural Language Processing,Survival Analysis","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks","Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Other",Other,Sometimes,<1MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Amazon Machine Learning,IBM SPSS Statistics,R",Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,Sometimes,,,,Often,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,40,40,10,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database",Sometimes,,,,,,,,Often,Most of the time,,,,,,,Often,Often,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Commercial Data Platform,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,,Self-taught,40,10,30,20,0,0,,,A master's degree,Academic,10 to 19 employees,,More than 10 years,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,,"C/C++,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Python,QlikView,R",,,,Often,,,,,,,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Segmentation,Time Series Analysis",,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Often,,,,40,30,10,0,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed part-time,,,No,Yes,Data Analyst,Poorly,Employed by professional services/consulting firm,KNIME (free version),Neural Nets,Python,,"Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,DataCamp,Other,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Management information systems,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,60,0,10,0,20,Other (please specify; separate by 
semi-colon),,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Miner,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",0,30,40,20,10,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the 
time,,Often,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Recommender Systems",Sometimes,Most of the time,,,Most of the time,,Most of the time,Sometimes,,,,,,,,,,Sometimes,Most of the time,,,,,Most of the time,,,,,,,,,,30,20,30,10,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Limitations of tools,Need to coordinate with IT",,,,Sometimes,,,,,,,,,Rarely,,Often,,,,,,,,76-99% of projects,More external than internal,Other,"kaggle, game sales, competitions","implement in a right way for large scale of data, and how to proof it's a right way","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,70000,USD,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,Computer Vision,Evolutionary Approaches,A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Text data",Don't know,100MB,Evolutionary Approaches,"Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Segmentation",,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,Most of the time,Often,,,,Often,Most of the time,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,IT Department,Do not know,Resourse issue,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,7600000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,10,70,5,0,15,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Not very important,Other,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes",Rarely,,Sometimes,,,,,Often,,,,,,Rarely,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,40,25,15,8,12,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,,Most of the 
time,,,,Sometimes,,Often,Sometimes,Often,,,,,Often,Sometimes,,,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,1800000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,New Zealand,22,Employed part-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,34,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,"GitHub,Google Search","Conferences,Kaggle,Online courses,Tutoring/mentoring",,,,,Very useful,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Markov Logic Networks,Regression/Logistic Regression,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,Rarely,Rarely,,,,,,Sometimes,,Most of the time,,,,,Often,Often,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,,Rarely,Sometimes,,,Most of the time,,Sometimes,Sometimes,Sometimes,,Most of the time,,Sometimes,Sometimes,Sometimes,,Most of the time,,Most of the time,Often,Often,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making 
process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,Often,,,,,,Sometimes,,,,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,60000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Mathematica,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,Very useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,10,30,0,10,0,Adversarial Learning,Decision Trees - Random Forests,A master's degree,Internet-based,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Always,10MB,Decision Trees,"Amazon Machine Learning,Amazon Web services,Python,R,SQL",Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Random Forests",Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,40,30,10,10,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Romania,57,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees","Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests",,,Most of the time,,,,Most of the time,Often,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,,,High school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Business Analyst,Perfectly,Self-employed,,,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst",University courses,30,10,20,20,20,0,"Adversarial Learning,Machine Translation,Survival Analysis",Neural Networks - CNNs,,Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or 
business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,Bayesian Techniques,"Amazon Web services,Google Cloud Compute",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,10,10,30,0,Enough to run the code / standard library,"Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,,,,,Often,,,,,Sometimes,,,,,,51-75% of projects,More external than internal,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Git,,500000,SGD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,45,"Independent contractor, freelancer, or self-employed",,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed part-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,,45,Employed full-time,,,No,Yes,Researcher,,Employed by 
college or university,MATLAB/Octave,Proprietary Algorithms,Matlab,University/Non-profit research group websites,"College/University,Conferences,Friends network,Newsletters,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,Somewhat useful,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,,Nice to have,,Nice to have,,,Nice to have,,,,,,"Basic laptop (Macbook),Traditional Workstation",,Experience from work in a company related to ML,Yes,,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,0,10,45,34,11,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Markov Logic Networks,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important +Male,Other,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Self-employed",Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Online courses,YouTube Videos",Very useful,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer",Kaggle competitions,0,40,0,30,30,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",Sometimes,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Collaborative Filtering,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs",,,,Most of the time,Sometimes,,,,Often,,,Often,,,,,,,,Often,,,Often,,Often,,,,,,,,,50,30,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Sometimes,,,,,,,Often,,,,,,,Often,,,,10-25% of projects,Approximately half internal and half external,IT Department,none,none,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Bitbucket,,60000000,MGA,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Haskell,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,0,15,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Japan,45,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,,,,Somewhat useful,,3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,A social science,3 to 5 years,"Researcher,Statistician",Self-taught,70,30,0,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A health science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,30,40,0,0,20,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",Primary/elementary school,Pharmaceutical,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Time Series Analysis",,,Often,,,,,Often,,,,,,,,Most of the time,,Sometimes,,Often,,,,,,,,,,Most of the time,,,,10,40,20,30,0,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,30,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Random Forests,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Textbook,Tutoring/mentoring",Somewhat useful,,,,Very useful,,Very useful,,,,,,,,Very useful,,Very useful,,Linear Digressions Podcast,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Electrical Engineering,,"Engineer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,,,,Somewhat important,,Not important,,,,Somewhat important +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,100GB,"CNNs,Neural Networks","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,20,60,10,10,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Australia,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,6 to 10 years,"Machine Learning Engineer,Researcher",University courses,10,20,0,70,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,1GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,MATLAB/Octave,NoSQL,Python,TensorFlow",,Most of the 
time,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,,Sometimes,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,Often,Most of the time,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Most of the time,Most of the time,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,Often,,,,,,,,Often,,Often,,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,80000,AUD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,,Nice to have,Necessary,,,,"Coursera,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Canada,42,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Very useful,KDnuggets Blog,5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,45,0,35,15,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Link Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Online courses",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,"Data Machina Newsletter,Talking Machines Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Taiwan,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed 
full-time,,,Yes,,Data Miner,Perfectly,Employed by college or university,R,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Kaggle",,,Very useful,,Very useful,Very useful,Somewhat useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler",Self-taught,30,20,30,0,10,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Never,100MB,"Decision Trees,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Naive Bayes",,Often,,,,Often,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,70,10,0,10,0,10,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT",,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Less than 10% of projects,,,"kaggle datasets , KDD CUP datasets , datasets from stanford univeristies",cleaning and to chose the algorithm,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Git,Rarely,40000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,India,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Tableau,MARS,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,"Data Scientist,Other",Other,10,11,3,1,5,70,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,10MB,"Bayesian Techniques,SVMs","Amazon Web services,Jupyter notebooks,Python,R",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,SVMs,Text Analytics",,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,Most of the time,,Sometimes,Most of the time,,,Sometimes,,,,,,Sometimes,Often,,,,,40,5,15,20,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up 
to full database,Unavailability of/difficult access to data",Most of the time,,,Sometimes,Sometimes,,,Sometimes,Most of the time,,Most of the time,,,Sometimes,,Most of the time,Sometimes,Often,,,Most of the time,,Less than 10% of projects,Do not know,IT Department,"data.gov,worlddata.org,google alerts",cleaning and understanding,Other,Other,,Other,Rarely,400000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Italy,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Deep learning,R,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,High school,Manufacturing,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,,,"KNIME (free version),MATLAB/Octave,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,Most of the time,,,,,,,,,,"Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,Rarely,,,,10,0,0,90,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Often,Often,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Sometimes,32000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by professional services/consulting firm,Tableau,Social Network Analysis,R,"Google Search,Government website","Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,Less than a year,Business Analyst,Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Python,SQL",,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Often,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,0,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,,26-50% of projects,More external than 
internal,Other,no,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Personal Projects",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",11 - 39 hours,Master's degree,Yes,Master's degree,A health science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important, +Male,Russia,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Programmer,Self-taught,70,20,0,0,10,0,"Reinforcement learning,Unsupervised Learning",Neural Networks - RNNs,A professional degree,Manufacturing,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Very useful,,1-2 years,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,20,0,10,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,1 to 2 years,"Data Analyst,Data Scientist",Work,10,20,30,35,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 to 999 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Segmentation,SVMs,Time Series Analysis",Often,,Sometimes,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,Sometimes,,,,,,,,Rarely,,Sometimes,,Sometimes,,,,40,15,30,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,,,,,,,,Often,,Often,,,Most of the time,,,,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,60000,,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Computer Scientist,University courses,25,15,5,40,15,0,Time Series,Logistic Regression,A doctoral degree,Internet-based,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Oracle Data Mining/ Oracle R Enterprise,Other,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,Other",,,,,,Very useful,Somewhat useful,,,,,,,Very useful,,,Very useful,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Other",University courses,30,5,35,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Minitab,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,RapidMiner (free version),SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,Most of the time,Rarely,,Rarely,,Most of the time,,,Sometimes,Sometimes,Most of the time,,Rarely,,,,Rarely,,,Most of the time,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,,,,Often,Most of the time,Sometimes,,,,Sometimes,,Often,,Most of the time,,,Often,Often,Most of the time,,Often,,,Often,,Sometimes,Most of the time,Often,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,Most of the time,,,,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,Most of the time,Most of the time,,51-75% of projects,Approximately half internal and half external,Business Department,kaggle public dataset;ucla;amazon public data set; analyticsvidhya,"data is most of the time in not properly manager or higher missing values, data is often not maintained well, scale of the data is a problem. reliability of the data is a big problem ","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,800000,INR,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Miner,Data Scientist,Engineer",University courses,40,0,0,60,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection)","Evolutionary Approaches,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Telecommunications,,,,,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Text data",,,"Neural Networks,RNNs,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,RNNs,SVMs,Text Analytics",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,0,0,40,60,0,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,R,Government website,"Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important +Female,Nigeria,29,Employed full-time,,,No,Yes,Other,Poorly,Employed 
by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,Retired,,,Yes,,Engineer,Poorly,,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,80,0,0,15,0,Survival Analysis,Decision Trees - Random Forests,A doctoral degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed 
full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Online courses,Personal Projects,Textbook,Other",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",25,35,0,0,0,40,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Markov Logic Networks",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United Kingdom,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,42,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Predictive Modeler",Self-taught,40,20,10,30,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Podcasts",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,Very useful,,,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,edX,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,15,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Indonesia,22,"Independent contractor, freelancer, or 
self-employed",,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,20,5,50,5,0,"Computer Vision,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Stayed the same,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Rarely,1GB,"Bayesian Techniques,Decision Trees,SVMs","Amazon Machine Learning,MATLAB/Octave,Python,SQL",Rarely,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Naive Bayes,Neural Networks,SVMs",,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources",Often,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Japan,27,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,40,30,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important +Male,Ukraine,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Stata,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,40,5,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,Often,,,Often,Often,,,,,,,,Sometimes,,,Sometimes,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,840000,UAH,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,20,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,MARS,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,,Nice to have,,Nice to have,Nice to have,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Professional degree,,Less than a year,Programmer,Other,20,0,0,0,0,80,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,,,Very Important,Very Important,,Very Important,,,,,,,,, +Female,Malaysia,37,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,Software Developer/Software Engineer,Work,100,0,0,0,0,0,"Computer Vision,Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Java,"GitHub,Google Search","Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,"Data Machina Newsletter,FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer",Kaggle competitions,58,10,12,0,20,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)",Logistic Regression,No education,Retail,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,1MB,SVMs,Java,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,25,30,10,5,0,Enough to run the code / standard library,"Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,10-25% of projects,,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Subversion,Never,1272000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,I haven't started working yet,University courses,60,0,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,20,20,0,10,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - 
CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Other,"Image data,Video data,Text data",Sometimes,10GB,"CNNs,Decision Trees,Neural Networks,RNNs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,TensorFlow",,,,,Often,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks",,,Sometimes,Most of the time,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,,,,,30,50,10,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Often,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,"Not employed, but looking for work",,,,,,,,C/C++,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",40+,Kaggle Competitions,No,Master's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",5,25,0,20,40,10,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,South Korea,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Microsoft Azure 
Machine Learning,Neural Nets,R,Other,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer",University courses,0,0,10,60,30,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Other,No,Bachelor's degree,Physics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Reinforcement learning","Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Female,Norway,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Scientist,Engineer,Other",Work,5,40,45,5,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,,"CNNs,Decision Trees,Neural 
Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,SAS Base,,,,YouTube Videos,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,20,40,20,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),A master's degree,CRM/Marketing,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Never,1GB,,"Amazon Web services,Cloudera,Python,R,Spark / MLlib,SQL,Tableau",,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"CNNs,Data Visualization,Natural Language Processing,Random Forests",,,,Rarely,,,Most of the time,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,I prefer not to say",Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,Business Department,,,,,,,,,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,46,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,60,0,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Other,21,"Not employed, but looking for work",,,,,,,,R,Text Mining,C/C++/C#,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Online courses,Personal Projects,Textbook",,,,,Very useful,,,,,,Very useful,Very useful,,,Very useful,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,40+,Other,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,Self-taught,50,30,10,10,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning","Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Australia,27,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,3 to 5 years,Statistician,Self-taught,20,0,50,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),R,GitHub,"College/University,Online courses",,,Very useful,,,,,,,,Very 
useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,NA,50,0,50,0,0,"Machine Translation,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Other,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,"Text data,Relational data",,,,"SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed part-time,,,Yes,,Researcher,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Julia,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important +Male,India,23,"Not employed, but looking for work",,,,,,,,Julia,Bayesian Methods,Matlab,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity",GPU 
accelerated Workstation,11 - 39 hours,PhD,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,0,0,0,100,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,Israel,36,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Textbook,YouTube Videos",,Somewhat useful,Not Useful,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,70,0,0,10,20,0,Unsupervised Learning,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Impala,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Official documentation",Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,"FastML Blog,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,0,10,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,NoSQL,Python,R,SQL,Stan,TensorFlow,Unix shell / awk",,,,Rarely,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,Rarely,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,Sometimes,,,Often,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",Often,,Sometimes,,Sometimes,,,,Most of the time,,,Most of the time,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,Most of the time,Sometimes,,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,The lack of a clear question to be answering or a 
clear direction to go in with the available data",Often,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,7500000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,55,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Most of the time,100MB,"CNNs,Neural Networks,RNNs","Amazon Web services,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python",,Sometimes,,,,,,,,,,,Sometimes,,,,Most of the time,,,,Often,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics",,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,Sometimes,,,,Often,Sometimes,,,Often,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult 
access to data",,Often,,,,Often,,,,,,,,Most of the time,,Often,,,,,Often,,26-50% of projects,More internal than external,Central Insights Team,Glove; ,Preprocessing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Other,8,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,20,40,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","KNIME (free version),Microsoft Excel Data Mining,Python,R,RapidMiner (free 
version),SQL",,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,Often,,Most of the time,,Rarely,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,,,,,,,Often,,,,Sometimes,,,Most of the time,,Sometimes,,,Sometimes,,Often,,,,30,20,10,20,10,10,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning",,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Factor Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,,,,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,,100MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,Tableau",,,,,,,,,,,,Often,,,,,Often,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,,,,Often,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,Often,Often,,,,,,,,Often,,,,20,30,0,30,20,0,Enough to 
tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,,,Sometimes,,,,,,,Often,,,,Often,,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Email,,Git,Rarely,15000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,40,20,10,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Somewhat important +Female,India,22,Employed part-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,I haven't started working yet",University courses,5,15,5,70,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,I don't know,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Often,,,,,Sometimes,,Sometimes,Often,Most of the time,,,,,Often,Often,Often,,,,70,5,0,10,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of funds to buy useful datasets from external sources",Sometimes,,,,,Often,,,,Often,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),"Commercial Data Platform,Company Developed Platform",,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",University courses,10,20,40,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Reinforcement learning,Neural Networks - CNNs,A master's degree,Non-profit,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Never,1GB,CNNs,"Amazon Web services,Jupyter notebooks,Python",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",25,20,30,0,25,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics",,,,,,,Most of the time,,,,,,,,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,,Often,Rarely,,,,,35,25,20,20,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Company internal community,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,,,Somewhat useful,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,"Information technology, networking, or system administration",,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Survival Analysis,Bayesian Techniques,A doctoral 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Self-taught,40,10,10,30,10,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines",,Mix of fields,"5,000 to 9,999 employees",Increased significantly,1-2 years,A tech-specific job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Don't know,100MB,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Naive Bayes,Text Analytics",,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,80,20,0,0,0,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,Often,,Most of the time,,,,,,Most of the time,,,,,,Most of the time,,Less than 10% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,355000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,10,25,10,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Most of the time,,Often,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Often,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,Often,Often,,,,Often,,Often,,Often,,Often,Often,,,Often,Often,Often,,Often,,Often,Often,Often,,,,35,30,5,20,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,Often,Often,,Often,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by college or university,Self-employed",DataRobot,,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Personal Projects",Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Adversarial Learning,Bayesian Techniques,,Academic,,,,,Very important,,Basic laptop (Macbook),Image data,Sometimes,,Bayesian Techniques,Amazon Machine Learning,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,Column-oriented relational (e.g. 
KDB/MariaDB),Other,,Git,,220000,CNY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Java,Decision Trees,C/C++/C#,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,I haven't started working yet",University courses,50,10,10,30,0,0,,,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Canada,39,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Arxiv,Blogs,College/University,Kaggle,Newsletters,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,30,0,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A professional degree,Insurance,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,,"Bayesian Techniques,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text 
Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Computer Vision,Speech Recognition,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"5,000 to 9,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Most of the time,100MB,"CNNs,Ensemble Methods,HMMs,Neural Networks,RNNs,SVMs","Amazon Web services,C/C++,Microsoft Azure Machine 
Learning,Python,Other",,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,,Most of the time,,,,,,,Often,,,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,Often,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,"IMAGENET, YOUTUBE, ",Get available dataset,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,500000,TWD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,South Korea,41,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,55,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,Less than a year,Researcher,Self-taught,20,5,5,25,0,45,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data,Other",Rarely,,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",Sometimes,,,Often,,,,Sometimes,Sometimes,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Often,Often,,,,,,"Data 
Visualization,Ensemble Methods,Evolutionary Approaches,GANs,Logistic Regression,Markov Logic Networks,Natural Language Processing,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,,,,,,Most of the time,,Often,Sometimes,Rarely,,,,,Often,Often,,Often,,,,Often,Often,Often,,,Often,Often,,,,,5,5,50,25,5,10,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Rarely,,26-50% of projects,Entirely external,IT Department,Nil,Environment ,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,800000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,47,Employed full-time,,,No,Yes,Researcher,Fine,Employed by company that makes advanced analytic software,Python,Cluster Analysis,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Researcher,Software Developer/Software Engineer",Work,30,40,20,0,5,5,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Predictive Modeler,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Hong Kong,21,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Researcher",University courses,0,0,50,25,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,,Sometimes,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Often,,,,,,,Most of the time,,,,,,,Sometimes,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate 
findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Rarely,Most of the time,,,Sometimes,,Often,Often,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,15,60,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Kenya,30,Employed part-time,,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Machine Learning Engineer",University courses,5,10,25,60,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),,Hospitality/Entertainment/Sports,Fewer than 10 employees,Decreased significantly,3-5 years,A general-purpose job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,54,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook",,,Very useful,,,,Very useful,,,,,,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,,Not important,Not important,Not important,Not important,Not important +Female,Other,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,20,60,20,0,0,0,,,,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data",Sometimes,,CNNs,"C/C++,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,Often,,,,,,,,,Often,,Often,,Often,,,,,,Sometimes,,,,,,,,,,,Often,,,,Sometimes,,,,,,CNNs,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,30,20,10,0,Enough to tune the parameters properly,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Sometimes,,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,120000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Friends network,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Researcher",Self-taught,30,25,40,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,Google Search,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,Data Scientist,University courses,50,0,10,40,0,0,Outlier detection (e.g. 
Fraud detection),Support Vector Machines (SVMs),A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,,1GB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,,,,Most of the time,,,,,,,Often,,,,,,Often,Often,,,,,Often,,,,,,,,0,55,0,30,15,0,Enough to tune the parameters properly,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,Sometimes,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,Git,Most of the time,2000000,INR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter",3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,India,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,"Personal Projects,Textbook",,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,University courses,0,0,0,100,0,0,,Decision Trees - Random Forests,A master's degree,Academic,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,,,"Microsoft Excel Data 
Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,,Most of the time,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,70,10,5,5,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,,,,,,"100,000",,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Nigeria,33,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),,A professional degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,,,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,35,15,15,15,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific 
analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",Most of the time,Often,Sometimes,Often,Most of the time,Often,,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Never,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),40+,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,30,20,0,30,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important +Male,Switzerland,35,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Bayesian Methods,R,,"Stack Overflow Q&A,Other",,,,,,,,,,,,,,Somewhat useful,,,,,Talking Machines Podcast,1-2 years,,,,,,,,,,,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's 
degree,Psychology,,"Researcher,Statistician,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,,Very useful,,,,,< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important 
+Male,Nigeria,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,"DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Male,Colombia,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Self-employed",Microsoft Azure Machine Learning,Decision Trees,R,"Google Search,Government website","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",25,25,25,0,25,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Other,Always,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,R",,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,Often,Sometimes,,,Often,,Often,,,,Sometimes,,Sometimes,,Often,,,,,,,Often,Sometimes,,,,Often,,Often,,,,25,25,25,25,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,,Most of the time,,,Most of the time,Often,,,,,Often,Often,Most of the time,,,,,,10-25% of projects,Entirely external,Standalone Team,"goberment, ","Governmental entities, private entities","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,40000000,COP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Online courses,Personal Projects,Textbook",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Australia,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Engineer",Self-taught,90,0,0,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",,Technology,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10TB,,"Amazon Web services,Google Cloud Compute",,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Naive Bayes,PCA and Dimensionality Reduction",Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,50,40,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input",,,Often,,,,,,Sometimes,,Rarely,,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,Git,,130000,AUD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Siraj Raval YouTube Channel,< 1 year,,Necessary,Nice to have,,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,I did not complete any formal education past high school,,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Female,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,24,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Engineer,Operations Research Practitioner,Predictive Modeler,Programmer",University courses,10,10,10,50,0,20,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Random Forests,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",65,30,0,0,5,0,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs,Neural Networks - 
GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,10GB,Other,"C/C++,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Neural Networks,Recommender Systems,SVMs",Most of the time,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,60,15,15,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses",Not Useful,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,,,,,,,,,,,,,,Coursera,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Miner,Kaggle competitions,40,20,0,0,40,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,19,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Researcher,I haven't started working yet",University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Textbook,Other",,,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Researcher,Statistician,Other",Work,30,10,20,20,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A master's 
degree,Academic,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SAS Enterprise Miner,SQL,Stan",,,,Rarely,,,,,Sometimes,,,,,,Sometimes,,,,,Often,Often,,Often,Often,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,Most of the time,Sometimes,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation,Simulation,Text Analytics",Often,,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,Sometimes,,,,,50,30,0,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Often,Often,,,Most of the time,,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Trade book",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Software Developer/Software Engineer,Other",Self-taught,80,10,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other",,Often,,,,,,,Most of the time,,,,,,,,Often,,,,Rarely,,Sometimes,,,,Rarely,,,,Often,,Often,,,,,,,,Most of the time,Often,,,,,,Sometimes,Most of the time,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,Rarely,Sometimes,Often,Sometimes,Often,,,Often,,Sometimes,,Often,,Sometimes,,,Often,,Often,,,Often,,,Often,Often,,,,50,20,5,5,10,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making 
process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Often,,,,,,Often,Most of the time,,,,,Sometimes,,,Less than 10% of projects,More internal than external,Standalone Team,Financial market data from Quandl,Determining appropriate features to extract,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Most of the time,205000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Rarely,,,IBM Watson / Waton Analytics,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,1,1,1,1,1,95,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Other",Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,Often,76-99% of projects,Approximately half internal and half external,IT Department,None,Complex structure,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,65000,AUD,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Sweden,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by non-profit or NGO,,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Other,Self-taught,50,20,20,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Non-profit,500 to 999 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Relational data,Other",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data 
Visualization,Decision Trees,Logistic Regression,Random Forests",,,Often,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Most of the time,,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,Standalone Team,USAID Demographic Health Survey Data,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"120,000",KES,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,90,0,5,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Military/Security,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Video data,Rarely,100GB,"CNNs,Decision Trees,Gradient Boosted Machines","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,"CNNs,Decision Trees,Gradient Boosted 
Machines",,,,Sometimes,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,50,15,30,5,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database",,,,,,,,,,Rarely,,Most of the time,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by non-profit or NGO,SQL,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Data Scientist,Work,30,10,50,5,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Non-profit,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),,Sometimes,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,Rarely,,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,20,40,10,10,20,0,Enough to run the code / standard library,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,10-25% of projects,More external than internal,Other,"City data, federal reserve data",Missing 
values,Other,Other,Dropbox,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,0,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Julia,Neural Nets,R,I collect my own data (e.g. web-scraping),"Friends network,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - CNNs,A master's degree,Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,Neural Networks,"Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,,2,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Time Series,,Primary/elementary school,Internet-based,100 to 499 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Telecommunications,"10,000 or more employees",Stayed the same,1-2 
years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,1TB,"Random Forests,Regression/Logistic Regression,SVMs",Spark / MLlib,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,"Decision Trees,Prescriptive Modeling,Random Forests",,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,30,30,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,21,"Not employed, but looking for work",,,,,,,,SQL,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",70,10,0,20,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,KDnuggets Blog,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Other,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,30,0,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,India,32,Employed full-time,,,No,Yes,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Master's degree,,1 to 2 years,"Business Analyst,Researcher",Self-taught,50,40,10,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook",Somewhat useful,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,20,10,50,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Other,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,"Regression/Logistic Regression,Other",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Other",,,,,,,Sometimes,,,,,,,Often,,Often,,,,,,,,,,,,,,,Most of the time,,,20,40,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Git,Subversion",Rarely,115000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,R,Google Search,"Blogs,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,Not Useful,,,,,Not Useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"DataCamp,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",5,95,NA,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,Switzerland,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Somewhat useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,0,0,40,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Decision Trees,Regression/Logistic Regression","Python,R,SAS Base,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,Often,,,,,,,,,Most of the time,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,30,10,0,40,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Often,,,,,,,,Often,,,,Most of the time,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Decision Trees,C/C++/C#,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,No Free Hunch Blog,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Master's degree,Sort of (Explain more),Master's degree,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Researcher,University courses,50,40,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,,"Bayesian Techniques,Decision Trees",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Often,Often,Often,,,,,,Sometimes,,,,,Sometimes,Rarely,Often,,Often,,,,Sometimes,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,,,,,Sometimes,,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A professional degree,Financial,,,,,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",,,,"Amazon Web services,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,Less than 10% of 
projects,Do not know,Other,,,,,,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",Self-taught,35,0,65,0,0,0,Supervised Machine Learning (Tabular Data),,,Non-profit,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Don't know,1GB,Other,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,20,40,30,10,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,,"Data Stories Podcast,Partially Derivative Podcast,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Traditional Workstation,11 - 39 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series",Support Vector Machines (SVMs),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important +Male,Netherlands,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,I don't write code to analyze data,"Data Scientist,Engineer",Self-taught,10,10,10,50,10,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,32,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,"Engineer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Somewhat useful,,,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,50,10,0,35,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Kenya,26,Employed part-time,,,No,Yes,Programmer,Fine,Self-employed,TensorFlow,Deep learning,C/C++/C#,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,Unnecessary,Unnecessary,Necessary,Nice to have,Unnecessary,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working yet,Self-taught,90,10,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Other,26,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by company that makes advanced analytic software,Flume,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Miner,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,60,20,10,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Python,R,Spark / MLlib,SQL",,,,Most of the 
time,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,SVMs",,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,Often,,Often,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,10,50,30,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,"100,000",ETB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Java,,Python,Google Search,"Blogs,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,90,0,10,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you 
directly,3-5,Somewhat important,Very Important,Very Important,Very Important,,,,,,,,,,,, +Male,India,59,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Statistician",Self-taught,60,10,20,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,,"Text data,Relational data",Rarely,<1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Julia,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Association Rules,Decision Trees,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",,Rarely,,,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,Rarely,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Data Scientist,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Other (please specify; separate by semi-colon)",Bayesian Techniques,High school,Insurance,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Random Forests,Segmentation,Text Analytics",,Sometimes,Sometimes,,,,Most of the time,Often,,,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,Most of the time,,,,,50,10,20,10,10,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,,,,,Necessary,,,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,20,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Bayesian Methods,SAS,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,45,0,0,25,10,Other (please specify; separate by semi-colon),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,,Never,<1MB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Decision Trees,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,50,0,0,0,50,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,None,Do not know,IT Department,,,Other,Email,,Other,Never,250000,IRR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Company internal community,Kaggle,Textbook",,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,Somewhat useful,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Singapore,24,"Not employed, but 
looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,19,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,GPU accelerated Workstation,0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Other,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,No,Yes,Researcher,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,10,5,75,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,37,Employed full-time,,,Yes,,Business Analyst,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,20,80,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Decision Trees,Random Forests,RNNs","Cloudera,Impala,Java,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,Often,,,,,,,,,Often,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,,,,,Often,,,,,,,,,,Often,,,Often,,Often,,Sometimes,,,,,Sometimes,,,,25,25,0,25,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,,,Often,,Often,,Often,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,,,,,Necessary,Necessary,,Necessary,Necessary,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Business Analyst,Self-taught,50,25,0,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,QlikView,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,,,,,,Sometimes,,Often,,Often,,,,,Sometimes,Rarely,Often,,,,25,25,10,25,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,Often,,,Most of the time,,,,,,Most of the time,,,,,Most of the time,,,,,,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,10000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Other,25,50,25,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,I don't plan on learning a new ML/DS method,Python,Government website,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,O'Reilly Data Newsletter,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,Traditional Workstation,0 - 1 hour,Other,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,10,40,20,20,5,5,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,Australia,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Excel Data Mining,I don't plan on learning a new ML/DS method,Java,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Computer Scientist,University courses,20,0,0,80,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,Don't know,,"CNNs,Decision Trees,Neural Networks,Random Forests,RNNs,SVMs","Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Data Visualization,Decision Trees,Neural Networks,Random Forests,RNNs,SVMs",,,,Rarely,,,Sometimes,Rarely,,,,,,,,,,,,Often,,,Sometimes,,Often,,,Often,,,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a 
clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Recommendation Engines,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,50,20,0,20,0,"Computer Vision,Speech Recognition","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",I prefer not to answer,Government,Fewer than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 
years,Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",Less than a year,DBA/Database Engineer,University courses,20,10,40,20,5,5,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,,,Often,,,,Often,,,,Sometimes,,,Often,,,,Most of the time,,,,,,Most of the time,Most of the 
time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Personal Projects",,,,Very useful,,,,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines",,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,Belarus,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,42,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,I don't plan on learning a new ML/DS method,Scala,Google Search,"Official documentation,Online courses,Tutoring/mentoring",,,,,,,,,,Not Useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning 
Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Work,55,20,25,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,,,,,,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Pakistan,20,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Machine Learning Engineer,I haven't started working yet",University courses,25,45,10,20,0,0,Unsupervised Learning,Decision Trees - Random Forests,A bachelor's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Simulation,SVMs",,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,Rarely,,,,,,,,,,Often,Often,,,,,,25,15,15,25,20,0,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Graph (e.g. GraphBase/Neo4j),,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,,,Necessary,Necessary,Necessary,,,Nice to have,Nice to have,Necessary,,,,Coursera,Workstation + Cloud service,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Textbook,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,20,10,0,0,"Computer Vision,Natural Language Processing,Other (please specify; separate by semi-colon)",Evolutionary Approaches,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United Kingdom,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Workstation + Cloud service,Other","Image data,Other",Rarely,1TB,"Decision Trees,Random Forests,SVMs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,Other",,,,Sometimes,,,,,Often,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Often,Often,,,,,,Often,,,,,,,Sometimes,,Often,,,,Often,,,Most of the time,,,,60,10,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Often,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,Earthquake catalogues,,,Share Drive/SharePoint,,"Git,Other",Sometimes,150000,GBP,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Turkey,36,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by government,TensorFlow,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Conferences,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,3-5 years,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,Nice to have,,,,"Coursera,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,5,5,20,NA,Outlier detection (e.g. 
Fraud detection),Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important +Male,United States,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,NoSQL,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Neural Nets,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Sometimes,,,,20,30,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty 
data,Explaining data science to others,Lack of significant domain expert input",Often,Often,,,Often,Often,,,,,Often,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,demographic,data credibility,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,650000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Computer Vision,Time Series",Logistic Regression,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Programmer",Kaggle competitions,30,20,10,20,10,10,"Recommendation Engines,Time Series","Logistic Regression,Neural Networks - RNNs",High school,Technology,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,Most of the time,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Logistic 
Regression,Prescriptive Modeling,Recommender Systems,Time Series Analysis",,,,,Sometimes,Often,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,Often,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,26-50% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,56,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,R,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping)",College/University,,,Very useful,,,,,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",Self-taught,100,0,0,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Rarely,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,HMMs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis,Other",,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,Often,Often,Most of the time,,,,,,Most of the time,,,Most of the time,Most of the 
time,,,50,40,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,,,,,,,Most of the time,Often,,,Less than 10% of projects,,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,190000,SGD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,France,25,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"FastML Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,Less than a year,"Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",University courses,30,0,30,20,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),"Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Decreased slightly,1-2 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,60,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Other,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",15+ years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to 
have,Necessary,,,,"Coursera,edX","Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,,Statistician,University courses,NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Japan,37,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Company internal community,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",Work,20,10,70,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Workstation + Cloud service,Text data,Sometimes,10GB,"Bayesian Techniques,CNNs,Neural Networks,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Mathematica,MATLAB/Octave,Python,Spark / MLlib,SQL",,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,,Often,Often,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"CNNs,Collaborative Filtering,Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,SVMs,Text Analytics",,,,Rarely,Sometimes,,Most of the time,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,10-25% of projects,Entirely internal,Standalone Team,"SNS Data, Google Corpus Data",To reconstruct our real world from data approximately with highly levels. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,65000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,50,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,19,"Not employed, but looking for work",,,,,,,,Python,,Python,,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,< 1 year,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,"Data Analyst,Statistician",Self-taught,60,10,0,30,0,0,,Bayesian Techniques,A bachelor's degree,Manufacturing,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,46,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Tableau,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Textbook",Somewhat useful,,Very useful,,Very useful,,,,,,,,,,Somewhat useful,,,,O'Reilly Data 
Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Computer Scientist,University courses,60,10,30,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Don't know,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Unix shell / awk",Sometimes,,,Rarely,Sometimes,,Often,,Most of the time,,,,,Often,Sometimes,,,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,,Often,Most of the time,,Often,Often,,,,Often,Most of the time,,Most of the time,,Often,,Often,Most of the time,,Most of the time,,Often,Most of the time,,Most of the time,Most of the time,Most of the time,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,Most of the time,,,,Most of the 
time,,,,Most of the time,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,,Self-taught,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,Very important,,Laptop or Workstation and private datacenters,Relational data,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Natural Language Processing,Segmentation,Time Series Analysis",,Often,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,Sometimes,,,,60,30,0,10,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,100% of projects,More internal than external,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Personal Projects,Textbook",Very useful,Very useful,,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,30,0,40,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Neural Networks,RNNs","Amazon Machine Learning,C/C++,Cloudera,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,Sometimes,,,Sometimes,,,,,,,Often,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Natural Language Processing,Neural Networks,Recommender Systems,RNNs",Most of the time,,,,Often,Sometimes,,,,,,,,,,,,,Often,Most of the time,,,,Most of the time,Sometimes,,,,,,,,,25,30,35,0,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Git",Rarely,480000,ILS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,No Free Hunch Blog,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",33,33,0,0,34,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,South Korea,46,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle",,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,Business Analyst,Self-taught,40,40,0,0,20,0,,"Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,47,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Tableau,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Company internal community,Kaggle,Personal Projects",,,Somewhat useful,Somewhat useful,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression",High school,Insurance,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Markov Logic Networks,Neural Networks","C/C++,Microsoft Excel Data Mining,R,SQL",,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Markov Logic Networks,Neural Networks,Time Series Analysis",,,,,,,Often,,,,,,,,,,Often,,,Often,,,,,,,,,,Often,,,,10,10,10,25,45,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,,,,,,,Often,,Most of the time,,Sometimes,,,,100% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Never,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Singapore,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,Electrical Engineering,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,16,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,Less than a year,"Data Analyst,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Data Analyst,Programmer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Survival Analysis,SQL,,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",0,50,0,40,10,0,,Logistic Regression,A bachelor's degree,Manufacturing,"10,000 or more employees",Decreased slightly,Don't 
know,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Never,1MB,,"Hadoop/Hive/Pig,R,SAS Base,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,Often,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,30,0,0,20,50,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,,,,,,97,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Turkey,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Java,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Work,30,0,30,20,20,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Company internal community,Official documentation",,Very useful,Somewhat useful,Very useful,,,,,,Very useful,,,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,50,0,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Text data,,100MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,NoSQL,R,TensorFlow",,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,,Most of the time,,Often,,Often,,Sometimes,,Sometimes,,Often,Most of the time,,Most of the time,,,,,Most of the time,,Sometimes,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to 
others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Unavailability of/difficult access to data,Other",,,,,,Sometimes,,,Often,,Most of the time,,Most of the time,,,,Sometimes,,,,Most of the time,Most of the time,100% of projects,More internal than external,Standalone Team,weather data; remote sensing data; national census data; national survey data on socio economic indicators,Negotiating access to data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Sometimes,2100000,LKR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Israel,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Rule Induction,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,3 to 5 years,"Data Analyst,Data Scientist",Self-taught,60,15,5,10,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",,,,,Rarely,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,,Sometimes,,Often,,Often,Sometimes,,,,,,Often,,,,50,15,7,15,13,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,,,,,,,Sometimes,Often,,,,Often,,,100% of projects,More internal than external,Standalone Team,openEMR; ,Combining security with the need to share data among employees and across different platforms,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,186000,ILS,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Predictive Modeler,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,20,10,0,50,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Other,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,Business Analyst,Self-taught,70,10,10,0,0,10,Recommendation Engines,Other (please specify; separate by semi-colon),A bachelor's degree,Other,"10,000 or more employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,10GB,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,25,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by college or university,SQL,Deep learning,SQL,"Google Search,University/Non-profit research group websites","College/University,Friends network,Textbook",,,Very useful,,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,50,10,0,40,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I prefer not to answer,Stayed the same,More than 10 years,,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Rarely,,"CNNs,Neural Networks,SVMs","Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining",,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,Logistic Regression,Neural Networks,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,Often,,Often,,,,20,60,0,5,15,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,26-50% of projects,Approximately half internal and half external,IT Department,,,,Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,80,20,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,No education,CRM/Marketing,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Other,Basic 
laptop (Macbook),Relational data,Rarely,1MB,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,50,30,10,0,10,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Computer Vision,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Official documentation",Somewhat useful,,,,,,Very useful,,,Very useful,,,,,,,,,"FastML Blog,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,Israel,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,Programmer,University courses,30,10,20,20,10,10,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Bayesian Techniques,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,NoSQL,Python",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,,,Often,,,Often,Often,,,,,,Most of the time,,,,20,40,10,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Taiwan,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website",Blogs,,Very useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Non-profit,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Neural Networks","C/C++,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Neural Networks,Time Series Analysis",,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,30,40,5,20,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",,Often,,,,,,,,,,,,,,Often,,,,,,,26-50% of projects,More external than internal,Business Department,Government opendata,ETL,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,50000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Australia,0,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by government,,Bayesian Methods,R,Government website,"Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,,Not Useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Other",University courses,10,0,10,70,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,500 to 999 employees,Stayed the same,Don't know,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Rarely,1GB,"Regression/Logistic Regression,Other","IBM Cognos,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Other",,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,Often,Rarely,,,Often,,,Rarely,,,,,,Rarely,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,,Rarely,,,Sometimes,,,,30,20,0,40,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Often,Often,,,,,,,Sometimes,,,,,,Sometimes,,,,Often,Often,,Most of the time,100% of projects,More internal than external,Standalone Team,"Sample surveys, Public (government) data",Inconsistent definitions and availability across time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,78000,AUD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Other,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,,,,,,,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Neural Networks - RNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very 
Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Very Important +Male,Germany,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,,,,,GPU accelerated Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,Management information systems,1 to 2 years,Other,University courses,20,0,0,40,40,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Not important +Male,India,31,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,Very useful,,,Very useful,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Necessary,Nice to 
have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,,Self-taught,60,10,10,0,10,10,Computer Vision,,,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,17,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,1 to 2 years,"Programmer,Researcher",Self-taught,75,0,0,0,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Image data,,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by government",Flume,Anomaly Detection,Python,Other,"Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Not Useful,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Computer 
Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,30,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,20 to 99 employees,Increased significantly,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Sometimes,10TB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Often,,,,Sometimes,,,,,Sometimes,Often,,Most of the time,,,Sometimes,Sometimes,,Sometimes,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,Sometimes,Often,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,Rarely,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,,,,,Often,Often,,Most of the time,,Sometimes,,,Sometimes,,Most of the time,,,,50,20,5,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential 
impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,Often,Often,,Sometimes,Most of the time,,,Most of the time,,Sometimes,Often,,26-50% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,110000,ILS,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Friends network,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,Not Useful,Very useful,,,,Very useful,Very useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian 
Techniques,CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,SVMs,Text Analytics",Often,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,Most of the time,Most of the time,Most of the time,Sometimes,Most of the time,,,,,,,,Most of the time,Often,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,,,Sometimes,Most of the time,,,,Most of the time,,51-75% of projects,More internal than external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,900000,INR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Text Mining,Python,GitHub,"Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Somewhat useful,,Somewhat useful,,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,Less than a year,Researcher,Self-taught,90,10,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image 
data,Sometimes,1GB,CNNs,"C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Simulation",,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,20,20,0,30,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,IMAGENET,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,"10,000,000",JPY,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Researcher,Self-taught,50,0,30,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,55,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,Data Analyst,Self-taught,30,10,30,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,KNIME (free version),Orange,R,RapidMiner (commercial version),Tableau",,,,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Often,,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,Sometimes,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,Often,Most of the time,,,,25,25,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,,,Often,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,,Rarely,250000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,60,5,10,20,5,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,"Text data,Relational data",Sometimes,10MB,,"Java,Python,R,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,"Data Visualization,kNN and Other Clustering,Naive Bayes,Text Analytics",,,,,,,Often,,,,,,,Often,,,,Often,,,,,,,,,,,Most of the time,,,,,60,10,0,5,25,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack 
of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,Often,Often,,Sometimes,,,Often,,,Most of the time,Often,,,51-75% of projects,Do not know,Other,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,750000,INR,Other,8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,Very useful,,,,,,,Very useful,,Very useful,Very useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,23,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Tableau,I don't plan on learning a new ML/DS method,Python,Other,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Female,United States,25,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Other,"College/University,Conferences,Personal Projects",,,Very useful,,Very useful,,,,,,,Very useful,,,,,,,,1-2 years,,,,,,,,,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Researcher,University courses,40,0,0,60,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Other,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,,,"Data Machina Newsletter,Data Stories Podcast,KDnuggets Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,20,5,20,40,15,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,60,0,20,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,,,,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,10,20,10,0,"Adversarial Learning,Computer Vision","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or 
on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,R,Monte Carlo Methods,R,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,,University courses,10,10,10,60,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,<1MB,Other,"Amazon Web services,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,10,20,0,20,50,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,R,Deep learning,R,University/Non-profit research group websites,"Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,"Data Stories Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Time Series,Decision Trees - Random Forests,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Indonesia,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Conferences,Kaggle,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Very useful,,Somewhat useful,,,,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,"Data Stories Podcast,Other (Separate different answers with semicolon)",< 1 year,,,Nice to have,Nice to have,,Necessary,Nice to have,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,70,30,0,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Other,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,15,5,5,35,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Government,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Rarely,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,,,,Sometimes,Often,Often,Often,Often,,,Often,,Often,,Often,,Often,Sometimes,Often,Sometimes,Sometimes,Often,,,,,,,,,,,20,25,10,20,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of 
management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,10-25% of projects,Entirely external,IT Department,None,Didn't get enough useful data that I can deeply look into.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"50,000",USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Malaysia,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,70,10,10,10,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",,Don't know,Some other way,Somewhat important,Other,Other,"Text data,Other",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Netherlands,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,6 to 10 years,Researcher,University courses,10,30,0,10,0,50,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Relational data,Don't know,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,,,,,,Often,Often,,,Rarely,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Rarely,,,Often,Most of the time,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,20,30,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,Often,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,31,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Business Analyst,Other",Self-taught,70,10,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,"Coursera,edX",Traditional Workstation,0 - 1 hour,Online Courses and Certifications,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,University courses,30,0,10,60,0,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Not important,Not 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,India,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Software Developer/Software Engineer,Other",Self-taught,15,10,20,0,55,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Sometimes,,,,,Most of the time,,Most of the time,Sometimes,,,Most of the time,,Often,Most of the time,Sometimes,,,,,Sometimes,,Sometimes,Often,,Most of the time,,,,,,,,50,20,5,12,13,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",,Often,,,Most of the time,,,Often,Sometimes,,,,,,Often,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,130000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by government,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)",Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,,,,,Necessary,,,Nice to have,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,Engineer,Self-taught,40,10,0,0,50,0,,Neural Networks - CNNs,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,35,5,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,,Sometimes,Sometimes,,Most of the time,Rarely,,Sometimes,Often,Sometimes,Sometimes,Often,Sometimes,,Often,,,Sometimes,Most of the time,,,,45,30,0,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Often,,,,Most of the time,,,,,,Often,,,Most of the time,,,,,,,,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,India,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,50,"Not employed, but looking for work",,,,,,,,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Trade book",,,,,,,,,,,Very useful,,,Very useful,,Somewhat useful,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,,1 to 2 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Somewhat important,Not important +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Predictive Modeler,Statistician",University courses,8,3,4,80,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,R,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",Often,Often,Often,,,Often,Often,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the 
time,,Often,Often,Often,Often,,Often,Often,,Often,,Often,,Often,,,,25,50,0,0,25,0,Enough to refine and innovate on the algorithm,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,500000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,6 to 10 years,"Data Scientist,Researcher,Statistician",University courses,15,25,25,35,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Conferences,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Not Useful,,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,30,10,40,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,15,5,10,60,10,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,Neural Networks,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,Most of the 
time,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Often,,Often,,,Most of the time,Most of the time,,,,,,,,,,,Often,Often,Sometimes,Often,,Sometimes,,,,,,Sometimes,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Sometimes,,,Often,,,Often,,,,,,,,,Often,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,R,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,50,10,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Azure Machine Learning,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests",,Rarely,,,,,Most of the time,Sometimes,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,,,,,,20,10,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Git,Never,150000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Vietnam,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,Very useful,,Very useful,,,Somewhat useful,,,,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important +Male,Other,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Never,10MB,Neural Networks,"MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Logistic Regression,Neural Networks,Text Analytics",,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Often,,,,,,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,20,10,30,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational 
data",Rarely,1GB,"CNNs,Neural Networks,Other","Python,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,Often,,"CNNs,Data Visualization,Neural Networks",,,,Often,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,20,40,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Philippines,20,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,,Employed by a company that performs advanced analytics,,,,,Blogs,,Very useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,1 to 2 years,,University courses,50,20,10,10,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,Retail,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B 
Testing,Cross-Validation,Logistic Regression",Sometimes,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,50,20,20,5,5,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,,,,Often,,,,Often,,,,,,,Most of the time,,,76-99% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,The Data Skeptic Podcast,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,No,Master's degree,A social science,Less than a year,Other,Self-taught,25,25,0,50,0,0,Natural Language Processing,,,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Indonesia,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,R,Decision Trees,Python,"Google Search,Government website","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important +Female,Taiwan,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,0,10,0,90,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",50,25,20,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by college or university,Employed by non-profit or NGO",TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Online courses,Personal Projects,Textbook",,,Not Useful,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,Researcher,University courses,30,30,0,30,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Perl,Python,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,Most of the time,,,,Rarely,,Sometimes,,,,,,,Sometimes,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,,Often,,,,,,Most of the time,,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,,,,,,,30,30,0,40,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process",,,,,Most of the time,Often,,Sometimes,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,"Sgd, ncbi",Assess quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,Very useful,Somewhat useful,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data,Relational data",Don't know,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Rarely,Often,,Most of the time,,Often,Often,Most of the time,Often,,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Often,,,,20,20,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Often,Often,,,Often,Most of the time,Most of the time,,,,,,,,,Often,Most of the time,,26-50% of projects,Approximately half internal and half external,Standalone Team,"Lending club, Census, Competitions data ",Unstructured and unlabelled ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Git,Other",Sometimes,600000,INR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer",University courses,20,20,20,20,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,85,0,0,10,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Anomaly Detection,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses,Podcasts,Textbook,YouTube Videos",,,Very useful,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Udacity,Traditional Workstation,0 - 1 hour,PhD,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Speech Recognition,Markov Logic Networks,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",Not Useful,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,25,45,5,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Decreased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,,Often,,,,,,,,Often,,,,,Often,Most of the 
time,,Most of the time,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,Often,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,,,,Often,Sometimes,Often,Often,Sometimes,,,Sometimes,,Often,Often,Often,,Often,Often,Sometimes,Often,,Most of the time,Most of the time,,,,Most of the time,Sometimes,Most of the time,,,,35,15,5,20,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,Often,,,,,,Sometimes,,Sometimes,,,Often,,,,,Most of the time,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,927000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Turkey,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses",,,,,,,,,Not Useful,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,,Necessary,Nice to have,Nice to have,,,,,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,29,Employed full-time,,,No,Yes,Business Analyst,,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Friends network,Kaggle",,Very useful,Very useful,,,Very useful,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,,3 to 5 years,Engineer,Self-taught,70,0,0,30,0,0,Computer Vision,"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,,,Very useful,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Researcher,I haven't started working yet",University courses,20,40,0,40,0,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important +Female,Russia,26,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Business Analyst,Data Analyst,Researcher",University courses,25,25,10,20,20,0,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,QlikView,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text Analytics",Rarely,,,,Sometimes,Most of the time,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,Often,,Often,,,Often,,,,,50,15,0,20,15,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely internal,IT Department,,Make a recommendation system,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,85000,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Sweden,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Newsletters,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,Not Useful,,,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,Sometimes,,,Often,Most of the time,Often,,,,,,Sometimes,Sometimes,Often,,Sometimes,,,,,Often,,,Often,,,Rarely,Often,,,,30,10,5,30,0,25,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",,Often,,,,,,Most of the time,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Git,Rarely,820000,SEK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Python,Bayesian Methods,Python,Google Search,"Blogs,College/University,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,,,,,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Analyst,Data Scientist,Researcher",Self-taught,30,20,30,10,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I don't know/not sure,Technology,I don't know,Increased slightly,More than 10 years,,Important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Don't know,100GB,"CNNs,GANs,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,,,Often,,,Sometimes,,,,,,,,,,Often,Most of the time,Sometimes,,,,Sometimes,Often,,,,,,,,25,20,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input",,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,,,I am not currently employed,7,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,30,50,5,0,15,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Greece,21,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Computer Scientist,Other,4,4,5,6,75,6,Machine Translation,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Software Developer/Software 
Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,50,20,20,10,0,0,Recommendation Engines,Hidden Markov Models HMMs,A master's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,I haven't started working yet,Kaggle competitions,10,20,NA,0,70,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Engineer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,SQL",,,,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,Often,,Sometimes,,Sometimes,,,Often,,Often,,,Often,Most of the time,Sometimes,,Sometimes,,,,30,40,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack 
of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Sometimes,,Most of the time,Most of the time,,,,,Often,,,,,,Sometimes,,,Sometimes,,,76-99% of projects,Entirely internal,IT Department,None,Understanding,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",,50000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Not Useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",50,25,20,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,Less than one year,Some other way,Very important,Other,Traditional Workstation,"Image data,Text data,Relational data,Other",,,Other,"Java,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,70,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",Often,,,,,,,,Most of the time,,Often,Most of the time,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,654000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Turkey,39,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Google Cloud Compute,Rule Induction,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Textbook",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",University courses,50,0,20,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Text data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","C/C++,Java,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,Most of the time,,Most of the time,Most of the time,,,,,Often,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,Often,Often,,Most of the time,,,,,30,45,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,,,,,Most of the time,Most of the 
time,,,,,Sometimes,,,Sometimes,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,25000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Poland,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Self-employed,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,,,Necessary,,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Software Developer/Software Engineer,Other",Self-taught,30,0,0,0,10,60,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer 
focused),1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,Very useful,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Social Network Analysis,C/C++/C#,"GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Personal Projects,Textbook",,,Very useful,,,,Somewhat useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Data Miner,Researcher",University courses,40,5,25,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,Rarely,,Rarely,,Rarely,,,,Often,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B 
Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,Often,Often,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Often,,Most of the time,,,Most of the time,Most of the time,Rarely,,Most of the time,Sometimes,Most of the time,Rarely,,Rarely,Most of the time,Most of the time,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,,Most of the time,,Rarely,,,Sometimes,,,,,,Often,,,10-25% of projects,Entirely internal,Business Department,"US Census, etc.",Sloppy/inconsistent formatting,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Most of the time,"104,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Regression,R,Google Search,"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,95,0,5,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Decision Trees,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Master's degree,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,0,10,30,0,Time Series,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Python,Cluster 
Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer,Other",University courses,20,20,10,40,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Other,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression",,,,,,Often,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,40,20,10,10,10,10,Enough to run the code / standard library,"Lack of data science talent in the organization,Other",,,,,,,,,Most of the time,,,,,,,,,,,,,Often,26-50% of projects,Approximately half internal and half external,IT Department,yes some time but not all time ,Source getting from my business users,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,1000000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,< 1 year,,Nice to have,,,Nice to have,Nice to have,,,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,60,0,25,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important +Male,People 's Republic of China,23,Employed part-time,,,Yes,,Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform 
advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Non-Kaggle online communities",,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Data Analyst,Self-taught,50,15,25,0,10,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction",Sometimes,,,,,,,Most of the time,,,,,,,,Often,,Often,,,Sometimes,,,,,,,,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,Most of the time,,,,,,,,,Often,,,,,,,,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos,Other,Other",,Somewhat useful,Very useful,,Not Useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,,"Coursera,DataCamp,Udacity,Other",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,,"Business Analyst,Data Analyst",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat 
important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer",University courses,10,0,30,50,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",,10GB,"Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SAS Enterprise Miner",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",,,,,Sometimes,Most of the time,Often,Sometimes,Sometimes,,,Often,,Sometimes,,Often,,,,,Often,,Most of the time,Sometimes,,Often,,,,,,,,95,3,0,1,1,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,50,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,57,"Independent contractor, freelancer, or self-employed",,,No,Yes,Computer Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,3-5 years,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,,Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,30,40,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Ukraine,31,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,"Not employed, but looking for work",,,,,,,,Python,Regression,Python,Google Search,"Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,edX,Laptop or Workstation and local IT supported 
servers,11 - 39 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,50,30,0,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Spain,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,5,0,25,70,0,0,Survival Analysis,Decision Trees - Gradient Boosted Machines,Primary/elementary school,Government,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,35,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Not important,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,I don't write code to analyze data,"Researcher,Statistician",Self-taught,50,30,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A professional degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Other",Rarely,,Regression/Logistic Regression,"IBM SPSS Statistics,Minitab,R",,,,,,,,,,,,Often,,,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,10,10,0,30,50,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,"Data Analyst,Engineer",Self-taught,80,19,1,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Financial,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,SVMs","Amazon Machine Learning,Amazon Web services,Python,R",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Naive Bayes",Sometimes,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,10,60,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization",Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,39,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Other,Other,Textbook,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,,Other,0 - 1 hour,Experience from work in a company related to ML,No,Doctoral degree,Physics,I don't write code to analyze data,Programmer,Work,90,0,10,0,0,0,Time Series,Bayesian Techniques,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Scientist",Self-taught,40,40,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Java,Jupyter notebooks,Python,R,Spark / 
MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,Rarely,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,Sometimes,,Most of the time,,Often,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,Often,,,,,,Most of the time,,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Sometimes,,,,Often,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Julia,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and local IT supported servers,Image data,Sometimes,10GB,"Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Often,,Sometimes,,,,,Often,,Sometimes,,,Most of the time,,Often,,Sometimes,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,100% of projects,More internal than external,Standalone Team,lightning detection network data,aligning different datasets both in space and time to get some meaningful 
insights,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,"35,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Data Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,"5,000 to 9,999 employees",Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Never,10GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,,Graph (e.g. 
GraphBase/Neo4j),I don't typically share data,,Git,Never,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,,,Somewhat useful,Very useful,Very useful,Very useful,Not Useful,Very useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Researcher",Self-taught,80,0,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Python,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",Often,,,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,,,,,,Often,,Often,,,,Often,,Rarely,,,,,30,30,0,10,30,0,Enough to explain the algorithm to 
someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,,,,Sometimes,,,Often,,,,,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,95000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Denmark,22,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,R,GitHub,"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer",Self-taught,40,40,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Traditional 
Workstation,Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM Cognos,Microsoft Excel Data Mining,R,SQL",,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Naive Bayes,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,,,Rarely,,,,,Often,,,,,,,Sometimes,,,,20,10,5,50,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools",Most of the time,Often,,,,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Never,,RUB,,7,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Predictive Modeler,University courses,0,20,10,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company 
that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,0,30,70,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,"edX,Other",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,10,0,60,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)",,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Textbook",,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Random Forests",Rarely,,,,,,Most of the time,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Most of the time,,,,Rarely,,Often,,,Often,,,100% of projects,More internal than external,Business Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Sometimes,100000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +A different identity,India,42,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Self-employed,R,,,I collect my own data (e.g. web-scraping),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,1-2 years,,,,,,Nice to have,,,,,,,,,,,Kaggle Competitions,No,Professional degree,,I don't write code to analyze data,,Other,100,0,0,0,0,0,Outlier detection (e.g. 
Fraud detection),,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Julia,Random Forests,R,Google Search,"Blogs,Friends network,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist",Work,50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Technology,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,Rarely,,,,"A/B Testing,Bayesian Techniques,HMMs,Logistic Regression,Time Series Analysis",Sometimes,,Often,,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,20,30,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data",Often,Often,,,Often,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,15.5,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Ukraine,20,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Work,30,0,70,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,Other,Java,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,10,0,30,30,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher,Statistician",Self-taught,70,10,10,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary 
Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Sometimes,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Time Series Analysis",,,,Often,,Often,Sometimes,,,,,,,Sometimes,,Rarely,,,Often,Most of the time,Sometimes,,,,Often,Rarely,,,,Sometimes,,,,70,28,0,1,1,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Often,,Often,Most of the time,,,Sometimes,,Sometimes,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,43,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,Self-taught,70,10,20,0,0,0,"Computer Vision,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1GB,"Bayesian Techniques,Neural Networks","Amazon Web services,C/C++,Google Cloud Compute,Java,MATLAB/Octave",,Most of the time,,Most of the time,,,,Often,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Neural Networks",,,Most of the time,Often,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,40,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say",,,,,Often,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,"50,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,,,Very useful,Very useful,Very useful,,,,,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist",Self-taught,20,10,20,30,20,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,C/C++,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",Rarely,,,Sometimes,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,Often,,,,Most of the time,Most of the time,Often,,,,Often,,,,Most of the time,,Sometimes,Often,,Often,,Sometimes,Sometimes,Sometimes,,,Sometimes,Often,,,,,30,30,20,10,10,0,Enough to explain the algorithm to 
someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Often,,,,Often,,,,,,,,,,Often,Often,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,University courses,30,30,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Computer Scientist,Researcher","Online courses 
(coursera, udemy, edx, etc.)",20,45,30,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,I don't write code to analyze data,Computer Scientist,Self-taught,50,50,0,0,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Amazon Machine Learning,Decision Trees,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Somewhat useful,Somewhat useful,,,,,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs",,Military/Security,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks","C/C++,NoSQL,Python,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,Often,,,,"Decision Trees,Naive Bayes,Text Analytics",,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,15,5,15,5,60,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,,,Less than 10% of projects,Entirely external,IT Department,,Speed,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,36000,EUR,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,25,20,15,0,40,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Operations Research Practitioner,Researcher,Statistician",Work,5,5,85,5,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Logistic Regression",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,Regression/Logistic Regression,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,New Zealand,36,"Not employed, but looking for work",,,,,,,,Tableau,Support Vector Machines (SVM),SAS,GitHub,"College/University,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,,Experience from work in a company related to ML,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,15,0,5,80,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Markov Logic Networks,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",University courses,10,10,20,55,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs",A professional degree,Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,1 to 2 
years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Engineer,Fine,Self-employed,TensorFlow,Time Series Analysis,Python,Government website,"Newsletters,Online courses,YouTube Videos",,,,,,,,Very useful,,,Very useful,,,,,,,Very useful,"O'Reilly Data Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,1,9,0,Time Series,Logistic Regression,A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10TB,Regression/Logistic Regression,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,Sometimes,,,Sometimes,,,,Often,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Naive Bayes,Segmentation,Time Series Analysis",,,Sometimes,,,Often,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,,,,Most of the time,,,,50,10,20,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,Most of the time,,,Most of the time,,,,,Most of the time,,Often,Most of the time,Most of the time,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,2500000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Vietnam,27,Employed full-time,,,Yes,,Data Miner,Fine,Employed by company that makes advanced analytic software,,,,,"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Miner,Machine Learning Engineer,Programmer",Kaggle competitions,50,20,10,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - GANs",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10MB,,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Neural Networks,Recommender Systems",,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Cluster Analysis,R,"GitHub,Other","Conferences,Personal Projects,Podcasts,Textbook",,,,,Somewhat useful,,,,,,,Not Useful,Somewhat useful,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs",,CRM/Marketing,500 to 999 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Video data,Text data",Sometimes,,"RNNs,SVMs","Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,Perl,Python,QlikView,R,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"Collaborative Filtering,Naive Bayes,Natural Language Processing,PCA and Dimensionality 
Reduction",,,,,Sometimes,,,,,,,,,,,,,Sometimes,Rarely,,Sometimes,,,,,,,,,,,,,20,60,10,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,,,,,,,,Rarely,,,,,,Sometimes,Sometimes,,,,,,51-75% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Subversion,Other",,100000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Newsletters,Online courses,Textbook,YouTube Videos",Very useful,Very useful,Not Useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,Netherlands,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Engineer,Researcher",University courses,10,20,40,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","NoSQL,Orange,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",Often,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,,,,,Most of the time,Often,Most of the time,,,,,,,Often,,,,30,15,0,35,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",,75000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +A different identity,Russia,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,1 to 2 years,Engineer,Self-taught,40,20,20,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Never,1MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,Most of the time,,Often,,,,,Often,,,,,,Often,Often,,,,,,,Often,,,,,,50,40,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hong Kong,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Management information systems,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,Not Useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,I haven't started working yet",Self-taught,25,15,25,25,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,15,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Rarely,,,,,,,,,,,,Often,,Sometimes,Sometimes,,26-50% of projects,More internal than external,Business Department,na,na,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Belgium,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,5,5,80,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,"Text data,Relational data",,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Often,,Rarely,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,Most of the time,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Sometimes,,,,Most of the time,,Often,,,,"Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,SVMs,Text Analytics",,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,Most of the time,Often,Most of the time,,,,,,,,Most of the 
time,Often,,,,,70,10,0,5,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data",,,Sometimes,,Often,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,List of KDNuggets repositories,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,PhD,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, 
+Male,Greece,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Weka,Genetic & Evolutionary Algorithms,Python,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,5,15,80,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Decision Trees,Evolutionary Approaches,Neural Networks","C/C++,Java,Mathematica,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,RapidMiner (free version),SQL,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,,,,Sometimes,,Sometimes,Most of the time,,,,,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,Simulation,Text Analytics",Rarely,,,,,,Often,Most of the time,,,,,,,,,,,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,,,,40,30,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support 
for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,,Often,Sometimes,,Most of the time,,,,,,,,,,,Most of the time,Sometimes,Most of the time,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,45000,EUR,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Finland,28,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Text Mining,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,,Somewhat useful,,,Somewhat useful,,Very useful,Not Useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,,Less than a year,Other,University courses,0,65,0,35,0,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,Germany,26,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,35,20,40,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,RapidMiner (free version),SQL",,,,,,,,,,,Most of the time,Rarely,Sometimes,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Often,Often,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,Sometimes,,,,,65,20,2.5,2.5,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear 
direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,,,Most of the time,Often,,Less than 10% of projects,More internal than external,Standalone Team,,understanding and selecting the right data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,65000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,Very useful,,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Reinforcement learning,Unsupervised Learning",,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,22,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses",,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,10,30,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",No education,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft Azure Machine Learning,Python,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,SVMs",Often,,,,Sometimes,Most of the time,Most of the time,Often,,,,Often,,,,Most of the time,,,,Often,,,Most of the time,Sometimes,,,,Sometimes,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,Most of the 
time,,Sometimes,,,,Sometimes,,,Sometimes,,,Often,,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,75000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Taiwan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Ukraine,51,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,60,15,0,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Deep learning,Python,Google Search,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or 
on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Russia,49,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,R,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,,Less than a year,"Business Analyst,Data Analyst,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,Survival Analysis,"Bayesian Techniques,Logistic Regression",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,C/C++/C#,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working 
yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,,,Very useful,,,,,Somewhat useful,,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Italy,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,R,Random Forests,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician",University courses,30,0,40,30,0,0,Time Series,Logistic Regression,A master's degree,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,Other,"IBM SPSS Statistics,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,,,20,10,15,20,35,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Often,,,,,,,,,,Often,,Often,,,Often,,,76-99% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Vietnam,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,,,Somewhat useful,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Natural 
Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems",,Sometimes,Sometimes,,Rarely,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Often,Often,,Often,Most of the time,Often,,,,,,,,,,25,30,15,30,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Often,,Most of the time,Sometimes,,,,,,Often,,,,,,,Often,,,,100% of projects,Entirely internal,Other,,Small Data problem and data not being mapped properly in repositories.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,8800,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Israel,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Programmer",Self-taught,25,15,40,10,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Military/Security,I prefer not to answer,Increased slightly,Don't know,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,Sometimes,Often,Sometimes,Sometimes,Most of the time,,Rarely,Often,,Sometimes,,Sometimes,,,Often,Sometimes,Often,,Sometimes,Sometimes,Sometimes,Sometimes,,Rarely,Often,Often,,,,20,25,15,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,Sometimes,,,,,,Sometimes,,,Sometimes,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,225000,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Turkey,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,SQL,I collect my own data (e.g. web-scraping),"Newsletters,YouTube Videos",,,,,,,,Very useful,,,,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Engineer",University courses,20,20,50,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM Watson / Waton Analytics,KNIME (free version),Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,RapidMiner (commercial version),SAS Base,SQL",,,,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Decision Trees,Logistic Regression,Naive Bayes,Segmentation",Often,,,,,,,Often,,,,,,,,Sometimes,,Rarely,,,,,,,,Sometimes,,,,,,,,80,10,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a 
data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,Often,Rarely,,,Rarely,,Sometimes,,,,Sometimes,,Often,,,Often,,,Less than 10% of projects,More internal than external,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Scala,University/Non-profit research group websites,"Company internal community,Kaggle,Online courses",,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,0,10,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Very Important +A different identity,India,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,Often,Often,Often,,Most of the time,Most of the time,Often,Often,,,,,Most of the time,,,,Often,,Most of the time,Often,,Often,,Often,Often,,Often,Often,,,,,30,40,15,15,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,10-25% of projects,Entirely internal,IT Department,,Cleaning data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Git,Subversion",Rarely,,INR,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",60,0,30,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Evolutionary Approaches",High school,Other,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),Other","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,YouTube Videos",Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher",University courses,40,0,30,15,15,0,"Adversarial Learning,Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Female,Other,23,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,10,10,0,60,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Belgium,22,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",R,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,,Very useful,,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist",University courses,15,0,0,70,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,Often,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Often,,,,Often,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs",Often,Sometimes,,Sometimes,Most of the time,Often,,Sometimes,Often,,,,,Most of the time,,Often,,,Sometimes,Sometimes,,,Sometimes,Most of the time,,,,Sometimes,,,,,,20,40,0,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific 
analysis and decision-making,Dirty data,Explaining data science to others,Limitations of tools,Scaling data science solution up to full database",,,Often,,Often,Often,,,,,,,Sometimes,,,,,Often,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Sometimes,30000,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Iran,40,Employed part-time,,,No,Yes,Other,Poorly,Employed by government,I don't plan on learning a new tool/technology,Cluster Analysis,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Textbook",,,,,,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Computer Scientist,University courses,40,10,0,20,10,20,Speech Recognition,Decision Trees - Random Forests,A doctoral degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Philippines,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting 
firm,SQL,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,,"KDnuggets Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,60,5,25,5,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data",Sometimes,100GB,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,Sometimes,Often,,Most of the time,Most of the time,,,,Often,,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,Most of the time,,,Most of the 
time,Most of the time,,,,,20,50,15,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,Most of the time,Sometimes,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,"kaggle datasets, UCI datasets",some results from public datasets cannot be published.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"140,000",PHP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,No,Yes,Researcher,,,Spark / MLlib,Deep learning,Java,Google Search,"Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Very useful,,,,,,Somewhat useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher",Self-taught,50,0,0,25,0,25,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High 
school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hong Kong,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Scientist,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,Pakistan,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Simulation",Often,,Often,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,20,20,10,20,30,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,Most of the time,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,Git,Never,700000,PKR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses",,,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,Very useful,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,70,5,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,R,SAS Enterprise Miner,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,Sometimes,,,,,,Often,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Often,,,Sometimes,Often,,,Sometimes,Sometimes,Often,,Often,Sometimes,,Sometimes,,,Rarely,Sometimes,,,,60,20,0,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant 
domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,Most of the time,,,,,,Sometimes,,,Often,,,76-99% of projects,Entirely internal,Business Department,Goverment open data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Sometimes,3000000,RUB,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,,Self-taught,40,20,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Scientist,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",University courses,0,0,50,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A master's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Association Rules,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Online courses,Textbook",,,Somewhat useful,,,,,,,,Very useful,,,,Somewhat useful,,,,Jack's Import AI Newsletter,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important +Male,Japan,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,GitHub,"Company internal community,Kaggle,Personal Projects",,,,Very useful,,,Very useful,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer",Self-taught,60,0,30,5,5,0,"Natural 
Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SQL",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Time Series Analysis",Often,,Sometimes,Sometimes,,,,,,,,,,,,Most of the time,,Often,,Often,,Most of the time,,,,,,,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Most of the time,,,,,,,,,,,,,Often,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Web API,Data selection insight comes from business experience.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"3,520,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,No,Yes,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,46,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Other",,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Pharmaceutical,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Most of the time,10MB,Regression/Logistic Regression,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Time Series 
Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,0,10,60,10,20,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Often,,,,Sometimes,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,Other,Rarely,800000,NOK,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Other,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Physics,,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Scientist",University courses,10,0,80,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,CRM/Marketing,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,10GB,"Bayesian Techniques,Random Forests","Microsoft R Server (Formerly Revolution Analytics),R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,Often,Sometimes,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Random Forests",,,Often,,,Most of the time,Often,,,,,,,Sometimes,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,20,50,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Most of the time,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,10-25% 
of projects,Entirely external,Standalone Team,"Ad servers, social networks, search engines, audience metrics ",Identify users between devices along differents media,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,32000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist",University courses,35,30,10,20,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,No,Yes,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information 
systems,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Company internal community,Conferences,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,Very useful,Not Useful,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,University courses,5,0,25,70,0,0,"Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Often,,,Often,Rarely,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,SVMs,Time Series 
Analysis",,,,,Rarely,Often,Often,Rarely,Rarely,,,Rarely,,Sometimes,,Often,,,,,,,Sometimes,Rarely,,,,Rarely,,Sometimes,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process",,Rarely,,,Often,,,Rarely,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,140000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Random Forests,Python,Other,"College/University,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,I don't write code to analyze data,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,Reinforcement learning,Decision Trees - Random Forests,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important +Male,Indonesia,24,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by professional 
services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Engineer,Operations Research Practitioner",Self-taught,70,10,10,10,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,37,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",IBM Watson / Waton Analytics,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Not Useful,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +A different identity,United States,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",35,55,0,10,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A doctoral degree,Telecommunications,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,SVMs","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,Often,Sometimes,,,,Often,,,,,Often,Most of the time,Most of the time,,,,35,30,15,10,10,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,Rarely,,,,,,,,,,,,,Sometimes,,,100% of projects,More internal than external,Central Insights Team,Customer data that belongs to the organization we work with (third party organization),Fully utilizing the multiple sources of data,"Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Other",I don't typically share data,,Other,Sometimes,36000,MYR,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Random Forests,Java,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,The Analytics Dispatch Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,0,Recommendation Engines,Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,57,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Other,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,No Free Hunch Blog,3-5 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Fine arts or performing arts,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Tableau,Deep learning,R,,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Other,University courses,70,0,10,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Never,10MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the 
time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics",,,,,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,,,,Sometimes,,,,,30,30,0,40,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,More external than internal,Business Department,,,,Email,,,,,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Other",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,49,0,0,1,0,Natural Language Processing,Decision Trees - Random Forests,,Pharmaceutical,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,Decision Trees,"IBM Watson / Waton Analytics,Python,Tableau,Unix shell / awk",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,25,25,10,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Sometimes,,,Often,,Often,Often,,Often,,,,,,,,,10-25% of projects,More external than internal,IT 
Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,1600000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,More than 10 years,"Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",University courses,10,20,0,50,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,IBM Watson / Waton Analytics,Python,R,SQL,Tableau",,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Decision Trees,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,,Often,,,,,,,,,,Sometimes,Often,,Often,,Most of the time,,,,,Often,Often,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,,Often,,10-25% of projects,More external than internal,Standalone Team,Kaggle,cleaning:quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,60,10,10,10,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Evolutionary Approaches",A master's degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10MB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,"Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression",,,,,,,Often,,Often,Often,,,,,,Often,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,21,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,Very useful,,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",15,70,0,15,0,0,,,A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,48,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Kenya,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Udacity,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,30,20,15,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - GANs",Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,Julia,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,Time Series,Logistic Regression,A professional degree,Non-profit,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Traditional Workstation,Relational data,Rarely,10MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,RapidMiner (free version),Other",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,Most of the time,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,50,0,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,76-99% of projects,Approximately half internal and half external,Other,Census of India; NSSO; DLHS ,Errors in data ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Git,Rarely,,,Other,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University",,Somewhat useful,Very useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Female,Germany,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Conferences,Kaggle,Newsletters",,,,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,1 to 2 years,Data Scientist,University courses,30,10,60,0,0,0,Natural Language Processing,Neural Networks - RNNs,A master's degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,Often,,,,Sometimes,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,none,none,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Git,,-,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Amazon Web services,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Naive Bayes,Time Series Analysis",Often,,,,,Often,Often,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Miner,Machine Learning Engineer",Self-taught,70,20,10,0,0,0,Adversarial Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural 
Networks - RNNs",A bachelor's degree,Manufacturing,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",11 - 39 hours,PhD,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,Self-taught,50,20,0,15,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important +Male,Israel,29,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very 
useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,,2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,1 to 2 years,Researcher,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Italy,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,University courses,20,10,5,60,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Ensemble Methods,Regression/Logistic Regression","Angoss,NoSQL,Perl,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,Tableau,TIBCO Spotfire",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,Most of the time,,,Sometimes,,Most of the time,Most of the 
time,,,,,,Often,,Sometimes,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",Sometimes,,,,,Most of the time,,Most of the time,Most of the time,,,,,Sometimes,Often,Most of the time,,,,,Sometimes,,,,,Often,,,,,,,,40,20,5,5,30,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,Less than a year,Researcher,University courses,50,10,30,10,0,0,,,A bachelor's degree,Academic,"5,000 to 9,999 employees",,,Some other way,Not very important,,Basic laptop (Macbook),Text data,,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,15,25,25,25,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some 
college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,Other,University courses,40,30,0,30,0,0,"Adversarial Learning,Machine Translation","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,10,10,10,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Insurance,10 to 19 employees,Stayed the same,Less than one year,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,41,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,45,5,0,50,0,0,Reinforcement learning,"Hidden Markov Models HMMs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced 
analytics,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses",,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher",University courses,50,0,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Markov Logic Networks",A doctoral degree,Academic,100 to 499 employees,Increased slightly,Don't know,Some other way,Very important,Other,Traditional Workstation,Text data,Sometimes,1MB,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,,Deep learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,Self-taught,100,0,0,0,0,0,"Computer Vision,Reinforcement learning",Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,25,20,25,25,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Image data,Never,10GB,,"Amazon Web services,C/C++,Google Cloud Compute,SQL,TensorFlow",,Often,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks,RNNs,SVMs",,,,Most of the time,,,,,,,,,,,,,,,,Most of the 
time,,,,,Most of the time,,,Often,,,,,,10,50,0,15,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Bitbucket,Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,0,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,RapidMiner (free version),Decision Trees,Java,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,Reinforcement learning,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Not Useful,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Self-taught,25,25,50,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Other,Laptop or Workstation and private datacenters,Other,Never,1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,Sometimes,,,,Often,Sometimes,Sometimes,,,,,,,,Often,,,,Often,Sometimes,,,,,,,,,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Often,,,,,,,,Often,,,10-25% of projects,More external than internal,IT Department,UCI dataset,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,"Coursera,DataCamp,edX","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,,"Data Analyst,Data Scientist,Predictive Modeler,Statistician",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,United Kingdom,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,10,40,20,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Data Analyst,Programmer,Software Developer/Software Engineer",Kaggle competitions,25,20,20,20,10,5,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - 
RNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,<1MB,"Decision Trees,RNNs","Amazon Machine Learning,Amazon Web services",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Text Analytics",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,25,20,20,20,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,I haven't started working yet,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Military/Security,"10,000 or more employees",Increased significantly,Don't know,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1TB,Random Forests,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,Prescriptive Modeling,Random Forests",,,,,,,Often,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the 
time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,Weather,Not labeled correctly.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Rarely,90000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (commercial version),Other,Python,Google Search,"Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,1 to 2 years,"Data Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,45,0,35,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Other,Sometimes,1GB,"CNNs,Gradient Boosted Machines,RNNs,SVMs","Mathematica,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Evolutionary Approaches,Gradient Boosted Machines",,,Often,Often,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,50,20,10,20,0,0,"Enough to code it again from 
scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,1320000,RUB,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,Python,Regression,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,3-5 years,Nice to have,,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,"Coursera,DataCamp,edX","Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,50,30,20,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Vietnam,26,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by government,TensorFlow,Deep learning,Python,GitHub,"Blogs,Conferences,Non-Kaggle online communities,Online courses,Textbook,Tutoring/mentoring",,Somewhat useful,,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,Very useful,,Emergent/Future Newsletter (Algorithmia),1-2 years,Nice to have,Necessary,Necessary,Nice to 
have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,,Doctoral degree,Mathematics or statistics,Less than a year,"Business Analyst,Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job 
board,0,Very Important,Very Important,Very Important,Somewhat important,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,Recommendation Engines,Logistic Regression,"Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,France,32,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,SQL,,Python,Google Search,"Arxiv,Kaggle,Online courses,Textbook,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Researcher",Self-taught,30,40,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression","Some 
college/university study, no bachelor's degree",Academic,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,,1MB,Other,"C/C++,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,20,40,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,,,,,,,Often,Often,,,10-25% of projects,Do not know,Central Insights Team,"Census Data, Publicly Available Data",Understanding what questions to ask the data. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,"35,000",GBP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,Natural Language Processing,Neural Networks - RNNs,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,NA,Employed full-time,,,Yes,,Predictive Modeler,Poorly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician,Other",Self-taught,35,0,30,35,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,1TB,Regression/Logistic Regression,"IBM SPSS Statistics,Java,Mathematica,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,Tableau",,,,,,,,,,,,Often,,,Sometimes,,,,,Rarely,,,Often,,,,,,,,Often,,Often,,,,,Often,Often,,,,,,Sometimes,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,,,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,50,20,0,5,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,Most of the time,,,,,Often,,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,Other,R,Other,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice 
to have,,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,5,0,90,0,0,"Machine Translation,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of 
(Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,Software Developer/Software Engineer,University courses,40,3,5,50,2,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,Work,0,30,40,0,30,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Python,Neural Nets,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Miner,Engineer",Self-taught,70,20,0,0,10,0,Unsupervised Learning,"Bayesian Techniques,Evolutionary Approaches",I prefer not to answer,Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Rarely,100MB,"Bayesian Techniques,Evolutionary Approaches","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,"Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Often,,,,,,,5,85,0,10,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,stanford u complex network database; etc.,no challenge at all,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. 
GraphBase/Neo4j)",Email,,,Sometimes,43000,CNY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Finland,46,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Google Search,"Company internal community,Kaggle,Personal Projects",,,,Very useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,Other,Self-taught,50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - RNNs",A professional degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,100MB,"Decision Trees,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Decision Trees,RNNs,Text Analytics",,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database",Often,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,10-25% of projects,More internal than external,IT Department,At the moment none,Data sits in silos and sometimes it is difficult to have holistic view,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Cluster Analysis,R,University/Non-profit research group websites,"Conferences,Kaggle,Online courses,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,5,0,15,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,Technology,100 to 499 employees,Decreased significantly,1-2 years,A general-purpose job board,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Relational data,Never,10GB,,"Microsoft Excel Data Mining,Python,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,Most 
of the time,,,,Most of the time,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,20,20,5,55,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,,76-99% of projects,Approximately half internal and half external,Other,,"No expert is present in company to give solution of problem only online help we need to take. When Unstructured data and Text data processing and convert to structure. ",Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,Other,Sometimes,360000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Belarus,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Online courses",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,20,40,0,30,10,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,France,57,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Other,Poorly,,Python,,Python,GitHub,"Blogs,Friends network,Kaggle,Newsletters,Podcasts,Textbook,YouTube Videos,Other",,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Partially Derivative Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,No,Bachelor's degree,,Less than a year,Other,Self-taught,30,10,0,50,10,0,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,Python,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,YouTube Videos",,Not Useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Israel,28,Employed part-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Miner,Software Developer/Software Engineer,Other",University courses,0,80,0,20,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,R,,"Blogs,Company internal community,Newsletters,Personal Projects",,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,35,5,60,0,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,500 to 999 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Video data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Flume,Google Cloud Compute,Java,NoSQL,R,Spark / MLlib,SQL",,Sometimes,,Sometimes,Sometimes,,Sometimes,Rarely,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,Sometimes,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,Sometimes,Sometimes,,Often,Most of the time,,,,40,15,10,10,25,0,,"Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,Most of the time,,,,,Often,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",,,,,4,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Uplift Modeling,R,,"Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Statistician",University courses,0,0,70,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,30,30,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,Often,,,,,,Often,,,,Often,,76-99% of projects,Do not know,Standalone Team,,,,,,,,220000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,Google Cloud Compute,Deep learning,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher,Statistician,Other",Self-taught,80,0,5,5,0,10,"Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",High school,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Video data,Text data,Relational data",Always,100MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,R,Statistica (Quest/Dell-formerly Statsoft),Other,Other",,,,,,,,,,,,Often,,,,,,,,Rarely,Sometimes,,Most of the time,,,Sometimes,,,,,Most of the time,,Most of the time,,,,,,,,,,,Rarely,,,,,Sometimes,Sometimes,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Text 
Analytics,Time Series Analysis,Other",,,Often,,,Often,Often,,,,,,,Often,,Often,,,,,Most of the time,,,,,,Most of the time,,Most of the time,Most of the time,Often,,,22,23,25,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,Often,Often,,,,Often,Most of the time,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,10-25% of projects,Do not know,Other,Can't say,Lack of tools,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"750,000",,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Hong Kong,38,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Other,Traditional Workstation,"Text data,Relational data",Never,10MB,Neural Networks,"Jupyter notebooks,Microsoft Azure Machine Learning,Python,RapidMiner (free version),SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"CNNs,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,RNNs,SVMs,Text Analytics",,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,Often,,,,,Sometimes,,,Sometimes,Sometimes,,,,,50,0,50,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,,Often,Often,,Often,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not 
in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,288000,HKD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important +Male,Germany,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,Unnecessary,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Italy,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Other,Self-taught,45,50,0,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,10MB,Bayesian Techniques,TensorFlow,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Decision Trees,C/C++/C#,Google Search,"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,30,0,10,0,Computer Vision,Logistic Regression,"Some college/university study, no bachelor's degree",Manufacturing,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Image data,Most of the time,10GB,"Bayesian Techniques,Neural Networks","Amazon Web services,C/C++,Java,MATLAB/Octave,Microsoft SQL Server Data Mining,SQL",,Rarely,,Most of the time,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization",Sometimes,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,20,10,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Email",,Other,Most of the time,535000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Pakistan,32,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),Traditional Workstation","Image data,Video data",Sometimes,10GB,"CNNs,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,,Most of the time,,,"CNNs,Logistic Regression,Neural Networks",,,,Most of the time,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,40,40,10,5,5,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,I don't plan on learning a new ML/DS method,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important +Male,France,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,Not Useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,45,20,30,0,5,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Other (please specify; separate by semi-colon)",High school,Other,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Never,1GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs,Other","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,SVMs",,,,,,Sometimes,,,,,,,,Often,,Sometimes,,Often,,,Most of the time,,,,,,Most of the time,Often,,,,,,20,50,5,10,0,15,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,Often,,Often,Sometimes,,,,,,,,,Most of the time,,76-99% of projects,Entirely internal,Other,None,Very specific data : simulations of fluid dynamics,Column-oriented relational (e.g. KDB/MariaDB),"Email,Share Drive/SharePoint",,Git,Sometimes,35000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,Less than a year,Software Developer/Software Engineer,Self-taught,20,40,0,20,20,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,,,,,,,,,,,,,,, +Female,Hong Kong,26,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Time Series Analysis,Matlab,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Researcher",University courses,60,0,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),Primary/elementary school,Academic,10 to 19 employees,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Other",Rarely,1MB,SVMs,"Amazon Machine Learning,Java,MATLAB/Octave,Python",Rarely,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,Often,,,,,,,Most of the time,,,,,,10,50,0,0,0,40,Enough to refine and innovate on the 
algorithm,"Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,Sometimes,,,,,,Often,Often,,,,,,,,,,,Less than 10% of projects,Entirely external,IT Department,n/a,remove the outliers/noise ,Other,I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"19,000",HKD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,26,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,Other,University courses,35,5,0,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,Other,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,0,10,50,30,0,Recommendation Engines,Decision Trees - Random Forests,A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,,,"Amazon Web services,Java",,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,,,Necessary,,,,,,,,,"Coursera,Other",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Researcher",University courses,0,10,70,5,5,10,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines 
(SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Relational data",Rarely,100GB,"CNNs,Neural Networks,RNNs,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,Most of the time,,,Sometimes,,,,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,,,,50,30,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources",,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Matlab,Google Search,"Blogs,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Software Developer/Software Engineer,Self-taught,70,0,0,0,30,0,,,A bachelor's degree,Other,20 to 99 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Never,10GB,"CNNs,Decision Trees,RNNs","C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Segmentation,Simulation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the 
time,Rarely,,,,,,,10,50,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,,Finding the ground truth of a segmentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Rarely,45000,EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician,Other",University courses,20,15,20,40,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,500 to 999 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,Text Analytics",,,,,,Often,,Often,Often,,,Often,,,,,,,Often,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Most of the time,,,,,,Often,,,,,,,,,Sometimes,,,76-99% of projects,Entirely internal,Other,wikipedia;law texts,"The documentation for the data is often poor, and then of course the always present things: values that are errors, missing values, lack of subject matter expert input with interpreting the data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Sweden,44,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Neural Networks,Other","Google Cloud Compute,NoSQL,Python,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,Most of the time,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,20,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Decision Trees,Java,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Textbook",,,Very useful,,,,,,,,,,,,Somewhat useful,,,,"Talking Machines Podcast,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,,Yes,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,University courses,10,0,0,90,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Female,Romania,23,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",0 - 1 hour,Master's degree,Yes,Bachelor's degree,Computer Science,3 to 5 years,Programmer,University courses,30,5,5,60,0,0,"Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Republic of China,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Excel Data Mining,Social Network Analysis,SQL,,"Blogs,College/University,Conferences,Friends network,Personal Projects",,Somewhat useful,Very useful,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,1 to 2 years,Business Analyst,University courses,20,20,30,30,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,"A 
friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,,"MATLAB/Octave,Microsoft Excel Data Mining,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,Often,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,40,10,10,10,30,0,Enough to run the code / standard library,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",,,,,Often,,,,,,,Often,Often,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Japan,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Textbook,Other",,,,,,Somewhat useful,,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,40,10,40,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,DataRobot,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Spark / MLlib,SQL,Tableau",Sometimes,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Sometimes,Often,,,,,,,,Often,,Sometimes,,,Sometimes,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,Often,,,,,,,,Sometimes,,Most of the time,Often,,,26-50% of projects,More internal than external,IT Department,climate data;IPxGeo mapping;CENSUS,"Reliability, Data 
Understanding,","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,90000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,0,30,20,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python",,,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,Sometimes,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,,,,,,Often,,,,,,,Sometimes,,Often,,,,60,5,5,10,20,0,Enough to explain the algorithm to someone 
non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Often,,Sometimes,Often,,,Sometimes,,,,,,,,,Sometimes,,,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,80000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Researcher,University courses,0,0,0,100,0,0,Outlier detection (e.g. Fraud detection),"Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs",High school,Academic,Fewer than 10 employees,Increased significantly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed part-time,,,Yes,,Statistician,Poorly,Employed by college or university,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,30,10,9,1,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Academic,100 to 499 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Rarely,<1MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,Python,R,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,Sometimes,Often,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,,,,,90,7,0,2,1,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",,,,,Most of the 
time,Sometimes,,,Sometimes,,Often,,,,,,,Most of the time,,Often,Often,Most of the time,76-99% of projects,Entirely internal,Central Insights Team,,Cleaning,Other,"Email,I don't typically share data,Other",Power point,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,KZT,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,France,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Engineer,Work,25,5,50,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,29,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,MATLAB/Octave,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",5-10 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",40+,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,0,0,0,100,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Java,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Trade book,YouTube Videos",,Very useful,,,,,,,,,,,,,,Very useful,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer",Self-taught,90,0,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,RNNs","IBM Cognos,Java,Jupyter notebooks,Python,SQL",,,,,,,,,,Rarely,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Neural Networks",,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,70,20,0,10,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,Often,Often,Sometimes,Most of the time,,Most of the time,Most of the 
time,Often,,Most of the time,Often,Often,,,,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,8000,BSD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Unix shell / awk,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Internet-based,20 to 99 employees,Increased slightly,Less than one year,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,10MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / 
MLlib,TensorFlow",,Sometimes,,,Most of the time,,Sometimes,,Most of the time,,,,,Rarely,Most of the time,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction",,,,,,Often,Often,,,,,,,,,,,Sometimes,,Often,Often,,,,,,,,,,,,,70,20,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,Sometimes,,,Most of the time,Often,,,,,,Most of the time,Sometimes,,Rarely,,,,10-25% of projects,More external than internal,Central Insights Team,kaggle;UCI,preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,18000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Ukraine,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Podcasts,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Spain,23,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Jupyter notebooks,Random Forests,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses",,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,,Logistic Regression,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, 
etc.)",20,50,0,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,,,,Coursera,,2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,30,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Denmark,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,Amazon Machine Learning,Social Network Analysis,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,60,0,30,0,10,0,Unsupervised Learning,Logistic Regression,A master's degree,Technology,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation",,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,,,,,40,10,10,30,10,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,76-99% of projects,More external than internal,Business Department,Dataset from public organization such as Denmarks statistics,data wrangling,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,800000,DKK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Turkey,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,1 to 2 years,"Data Analyst,Data Miner,Operations Research Practitioner,Researcher",University courses,30,30,0,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Other,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,,,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Russia,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform 
advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,38,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,,"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Other",Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Other,Less than a year,"DBA/Database Engineer,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,25,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Online courses,Textbook",Very useful,,,,,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Researcher",University courses,5,15,10,65,5,0,Computer Vision,Neural Networks - CNNs,High school,Academic,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Never,10GB,"CNNs,Neural Networks","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,5,83,0,2,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Unavailability of/difficult access to data,Other",,,,,,,,,Sometimes,,,,,,,,,,,,Often,Often,10-25% of projects,More external than internal,Standalone Team,DIV2k,Resources availibility,Other,I don't typically share data,,"Bitbucket,Git",Rarely,45000,PKR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,0,0,0,100,0,0,"Natural Language Processing,Speech Recognition",Bayesian Techniques,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Monte Carlo Methods,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites,Other","Blogs,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Engineer,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,"1,000 to 4,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SAS Enterprise Miner,Other",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,,Often,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Sometimes,,,,,,Sometimes,Most of the time,Most of the time,,Sometimes,,,,,Often,,,Most of the time,Sometimes,,Sometimes,Sometimes,,,,50,20,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,,,Sometimes,,,,,,Often,,,Often,,Often,,Sometimes,Often,,,26-50% of projects,More internal than external,Standalone Team,Credit score agencies,Dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,2475000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,,,,Somewhat useful,,,,,Very useful,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,,Nice to have,Nice to have,,Necessary,,,Necessary,Necessary,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Management information systems,Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,Logistic Regression,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,20,20,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Retail,"1,000 to 4,999 employees",Increased 
significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines","Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,,,,,,Often,,Often,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,Most of the time,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Recommender Systems,Segmentation,Text Analytics",Often,,,,Sometimes,Most of the time,Most of the time,,Often,,,Most of the time,,,,Often,,,,Often,,,,Most of the time,,Often,,,Rarely,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Regression,Python,Other,"Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's 
degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,20,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Italy,58,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Julia,Neural 
Nets,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Personal Projects,Tutoring/mentoring,Other",,Somewhat useful,,Very useful,,,,,,,,Very useful,,,,,Somewhat useful,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Software Developer/Software Engineer,Other",Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Other","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Python,R,Spark / MLlib",,,,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,Often,Most of the time,Most of the time,,,,,Sometimes,,,,,,Often,Often,Most of the time,Sometimes,,,,,,,Often,,,,50,20,25,1,4,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible 
expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,Often,,,Most of the time,,,,,,Often,,,Less than 10% of projects,More internal than external,IT Department,Weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"90,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Belarus,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Recommendation Engines",,Primary/elementary school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,,Work,20,0,80,0,0,0,,,High school,Academic,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Image data,Text data",Never,1GB,Ensemble Methods,"C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods",,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,35,20,5,20,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,32,Employed full-time,,,Yes,,Predictive Modeler,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",Spark / MLlib,Deep learning,R,"Government website,University/Non-profit research group websites","Conferences,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",15,15,0,70,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Rarely,1TB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,IBM Cognos,Jupyter notebooks,Python,R,SAS Base,SQL",,Rarely,,,,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics",Sometimes,,,,,Often,Most of the time,,,,,,,Often,,Most of the time,,,,,Most of the time,Most of the 
time,Sometimes,,,,,Sometimes,Often,,,,,30,10,0,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,Most of the time,,Often,,,Most of the time,,,10-25% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,52000,EUR,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Spain,52,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Other,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,edX,Traditional Workstation,,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,39,"Not employed, but looking for work",,,,,,,,DataRobot,Time Series Analysis,Python,GitHub,"Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer,Other",University courses,10,30,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",I 
prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,20,10,30,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,Sometimes,Sometimes,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted 
Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Often,,Often,,Often,,Often,,Often,,Often,,Sometimes,Most of the time,,Often,,,,Often,Often,,Sometimes,,,,40,10,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",Rarely,,,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,Dirty Data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Male,Portugal,33,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Other,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,60,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important +Male,Brazil,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,25,0,5,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,,,C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,No,Yes,Data Analyst,,Employed by professional services/consulting firm,Amazon Web services,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,University courses,25,25,0,10,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important 
+Male,Other,40,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts",,Very useful,,,,,Very useful,,Very useful,,Very useful,,Very useful,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,55,10,0,15,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning",Logistic Regression,I prefer not to answer,Telecommunications,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1TB,"Decision Trees,Random Forests","Java,Jupyter notebooks,R,SQL,Tableau",,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Sometimes,Often,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,Often,Sometimes,,,Most of the time,,,,Most of the time,,,,60,15,3,15,7,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,Sometimes,Most of the 
time,,Sometimes,,,Often,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,Business Department,None,The data sets are huge and take time to process,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",,110000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,Google Search,"Blogs,Newsletters,Non-Kaggle online communities,Online courses",,Very useful,,,,,,Somewhat useful,Very useful,,Very useful,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Retail,20 to 99 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,Other,"C/C++,Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,kNN and Other Clustering,Recommender Systems,Segmentation",Often,Often,,,Often,,,,,,,,,Often,,,,,,,,,,Often,,Often,,,,,,,,40,45,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and 
decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,Often,,,,,,Most of the time,,Most of the time,,,Often,,Most of the time,,,,Often,,,None,Entirely internal,Other,,Not enough feedback collected from customers and random values entered in several fields.,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,Never,650000,INR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Spain,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,GitHub,"College/University,Friends network,Kaggle,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,Data Scientist,Work,0,0,100,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Technology,Fewer than 10 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random 
Forests",,,,,,Often,Most of the time,Most of the time,,,,,,,,Rarely,,,Sometimes,,Most of the time,,Most of the time,,,,,,,,,,,40,10,20,30,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,Most of the time,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),,19000,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Other,49,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,KNIME (commercial version),Deep learning,Python,"GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,0,0,60,40,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Speech Recognition,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,500 to 999 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,Java,KNIME (commercial version),Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,Often,Often,,,Often,Often,,,,,Often,Often,,,Often,,,,Often,Often,,,,,,,,Often,,Often,,,,,,,,Often,Often,,,Often,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Text Analytics,Time Series Analysis",Often,,Often,Often,,,Often,Often,,,,Often,,,,Often,Often,,Often,Often,,,Often,,,Often,,,Often,Often,,,,20,20,30,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super 
efficient,"Explaining data science to others,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,,,,,,,,Often,,,,Often,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,86000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Business Analyst,Self-taught,75,0,15,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Naive Bayes,SVMs,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,65,10,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Statistica (Quest/Dell-formerly Statsoft),Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,30,0,5,5,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",I prefer not to 
answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Pakistan,20,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,0,40,0,60,0,0,"Recommendation Engines,Unsupervised Learning",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Mexico,27,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Survival Analysis,Python,University/Non-profit research group websites,"College/University,Conferences,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,70,0,0,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build 
prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Rarely,1GB,"Ensemble Methods,Evolutionary Approaches,Random Forests","Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Naive Bayes,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,,,30,10,10,10,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,Often,,,,,,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,,Feature engineering,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,156000,MXN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,24,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,C/C++,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,25,0,0,75,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,"Programmer,Researcher",Self-taught,80,0,10,0,10,0,"Computer Vision,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Rarely,100GB,"CNNs,Random Forests","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,Rarely,,Most of the time,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,Simulation",,,,Often,,Often,Sometimes,Rarely,,,,Sometimes,,Rarely,,Sometimes,,,,,Sometimes,,Sometimes,,,Often,Often,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,26,Employed full-time,,,Yes,,Operations Research Practitioner,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,I haven't started working yet,University courses,NA,0,30,70,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Miner,Data Scientist,Predictive Modeler,Programmer,Statistician",University courses,50,20,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (commercial version),KNIME (free version),MATLAB/Octave,NoSQL,Python,R,SQL",,Sometimes,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,Often,Sometimes,Often,,Rarely,,,,,,Rarely,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,Sometimes,Often,Rarely,,Sometimes,Rarely,Often,,Most of the time,,,Sometimes,Sometimes,Often,Sometimes,Often,,,,30,15,15,15,25,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Most of the time,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,60000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,15,0,5,75,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Random Forests,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Time Series Analysis",Often,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,55,35,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,,,,,,,,Often,,,,,,,,,,,,Rarely,,10-25% of projects,Do not know,Other,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),"I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,52000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,Employed full-time,,,Yes,,Data Analyst,Fine,Self-employed,IBM SPSS Modeler,Regression,Other,Other,Other,,,,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,1 to 2 years,Data Analyst,University courses,10,10,20,60,0,0,Time Series,Logistic Regression,,Other,I don't know,,,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,,,,"IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,20,10,0,60,10,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,26-50% of projects,Do not know,Other,,,Graph (e.g. 
GraphBase/Neo4j),Email,,Other,,,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Switzerland,31,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,College/University,Conferences,Friends network,Textbook",Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,0,70,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,,100MB,"Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Tableau,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Sometimes,,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing",,,,,,Often,Often,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Often,,76-99% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,40000,,Other,7,,,,,,,,,,,,,,,,,, +Male,Greece,27,"Not employed, but looking for work",,,,,,,,Cloudera,Deep learning,Python,University/Non-profit research group websites,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,30,10,0,50,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,India,17,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher",University courses,30,30,0,40,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),A doctoral degree,Government,"1,000 to 4,999 employees",,Less than one year,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Never,,"CNNs,SVMs","C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Segmentation,SVMs",,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,Most of the time,,Often,,,,,,0,40,0,50,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,,,,,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,25000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Singapore,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,PhD,Yes,Master's degree,Computer Science,,"Researcher,Software Developer/Software Engineer,Other",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",Self-taught,80,0,20,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural 
Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",65,35,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10GB,Gradient Boosted Machines,"Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,Rarely,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Time Series Analysis",,,,,,Most of the time,Most of the time,,Often,,,Often,,Sometimes,Often,,,,,,,,,,,,,,,Sometimes,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,,Often,Sometimes,,76-99% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,KDnuggets Blog",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",33,34,0,0,33,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,Deep 
learning,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,More than 10 years,"Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Workstation + Cloud service","Image data,Text data",Most of the time,10GB,"CNNs,Neural Networks","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow",,Sometimes,,Sometimes,,,,Often,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation",,,,Often,,Often,Often,,Often,,,,,Often,,Often,,,,Often,,,,,,Often,,,,,,,,60,10,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Often,,,,,Most of the time,,51-75% of projects,More external than internal,Other,Publicly available medical imaging data,Getting it and then preprocessing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,Java,I collect my own data (e.g. web-scraping),"College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Analyst,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,1 to 2 years,"Data Scientist,Machine Learning Engineer",Kaggle 
competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,Self-employed,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,Less than a year,"Engineer,Other",Self-taught,35,50,0,0,15,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Telecommunications,,,,,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Often,Rarely,,Often,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,Often,,,Most of the time,Most of the 
time,Often,Sometimes,Rarely,,,,Often,Often,Most of the time,,Often,Sometimes,Sometimes,Often,,,Rarely,,,,,,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,,,,,,,,,,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau",,,,,Often,,Often,,Often,,,,,Often,,,Often,,Rarely,,,,,Often,,,Often,,,,Often,,Often,,Rarely,,,,,,Often,Most of 
the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics",Often,,,,,Often,Often,Often,Often,,,,,,,Often,,,Sometimes,,Often,Sometimes,Often,,,,,,Sometimes,,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,100% of projects,More internal than external,IT Department,airlines data; students data; sales data; datasets from kaggle; uci,Lack of good datasets,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,"Git,Other",Sometimes,960000,PKR,Other,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Time Series Analysis,R,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Official documentation,Personal Projects,Textbook",Very useful,,,,,,,,,Very useful,,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,Researcher,Work,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,Often,,Rarely,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,,,Often,,,,Sometimes,,,Most of the time,,,,Sometimes,,,Most of the time,,,,15,50,0,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources",Often,Often,,,,,,,,Most of the time,,,,,,,,,,,,,100% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,62,Retired,,,Yes,,Data Miner,Fine,,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Operations Research Practitioner,Statistician",Self-taught,80,0,0,20,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,"Not employed, but looking for work",,,,,,,,Python,Deep learning,,,Tutoring/mentoring,,,,,,,,,,,,,,,,,Not Useful,,,< 1 year,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,Less than a year,"Other,I haven't started working yet",Other,0,100,0,0,0,0,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed full-time,,,No,Yes,Programmer,Fine,Self-employed,NoSQL,Proprietary Algorithms,Python,GitHub,Podcasts,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,"Online courses (coursera, udemy, edx, 
etc.)",10,50,10,10,10,10,Machine Translation,Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,Python,I collect my own data (e.g. web-scraping),Company internal community,,,,Very useful,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Computer Scientist,Engineer,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Hidden Markov Models HMMs,Markov Logic Networks",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,The Data Skeptic Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Online Courses and Certifications,Yes,Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,"Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Male,Switzerland,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,44,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by professional services/consulting firm,SAS Base,Regression,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,Very useful,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Data Analyst,Researcher",Self-taught,100,0,0,0,0,0,Survival Analysis,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,27,"Not employed, but looking for work",,,,,,,,,,,,"Blogs,College/University,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,Very useful,,Very useful,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Data 
Analyst",Self-taught,10,0,20,40,30,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Switzerland,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Google Search,"Arxiv,Conferences,Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,30,0,33,32,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs",,Technology,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Other",Always,100GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks","C/C++,Java,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Random Forests,Time Series Analysis",,,Most of the time,Sometimes,,Most of the time,Most of the time,,,Sometimes,,,,Most of the time,,,,Most of the time,,,,,Often,,,,,,,Most of the time,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,,,,Often,,,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",Sometimes,100000,CHF,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,,,,Somewhat useful,,Somewhat useful,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,98,1,1,0,0,0,"Time Series,Unsupervised Learning",Neural Networks - RNNs,A doctoral degree,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,Rarely,100MB,"Neural Networks,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,,,,Most of the time,,,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,,Most of the time,,,,,Most of the time,,,,20,50,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,Sometimes,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Most of the time,18000,EUR,I am not currently employed,9,,,,,,,,,,,,,,,,,, +Male,Germany,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,Other,University courses,50,40,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Text data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (commercial version),KNIME (free version),NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Other",,,,,,,,,Most of the time,,,,,,Most of the time,,Often,Most of the time,Most of the time,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,Often,Sometimes,,,,Most of the time,,,Most of the time,,,"Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,Often,Often,Often,,,Sometimes,Often,Often,,Sometimes,,,Often,,Often,,Often,Often,Often,,,Often,,Often,,,Often,Often,Sometimes,,,,50,34,1,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Privacy issues",Often,Often,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,26-50% of projects,More external than internal,IT Department,Social Media,Crawling social media,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Other,Rarely,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Japan,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer,Researcher",Self-taught,50,0,0,50,0,0,Computer Vision,,High school,Mix of fields,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Important,Other,Traditional Workstation,"Image data,Video data",Rarely,1GB,Other,"C/C++,Perl",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,Employed part-time,,,No,Yes,Computer Scientist,Fine,Employed by college or university,IBM Watson / Waton Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Online courses,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,Very useful,,,,,,,Very useful,O'Reilly Data Newsletter,< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,University courses,10,40,10,40,0,0,Recommendation Engines,"Logistic Regression,Support Vector Machines (SVMs)",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,,Somewhat important,Not important,Very Important +Male,India,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,R,Random Forests,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,,Necessary,Necessary,Necessary,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Other,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Scientist,Machine Learning Engineer",Other,70,0,20,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Very Important,,Very Important,Very Important,,Very Important,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,Neural Nets,,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A master's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,,,Image data,,,,Mathematica,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,0,80,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,76-99% 
of projects,,,,,,,,,,,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,DataRobot,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,5,20,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,,,,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Most of the time,1TB,"Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,Most of the time,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Markov Logic Networks,Naive Bayes,Recommender Systems,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,,,,,Often,Often,Often,,,,,,Most of the time,,,Most of the time,,,Most of 
the time,,,,40,30,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,,,Most of the time,,,,,,Often,,,Sometimes,Most of the time,,,26-50% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Git,Most of the time,700000,TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,South Korea,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Support Vector Machines (SVM),Python,Google Search,"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Hadoop/Hive/Pig,Bayesian Methods,Python,GitHub,"Arxiv,Blogs",Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,50,0,50,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,1GB,"CNNs,Neural Networks","C/C++,Mathematica,Perl,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks,Simulation",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,,,,40,40,0,0,20,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Sometimes,67000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,Python,Deep learning,R,,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Most of the time,1GB,Random Forests,Microsoft Azure Machine Learning,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,40,40,10,10,0,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Often,,,Often,,,,,,Sometimes,,,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Sometimes,,EUR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,28,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Basic laptop (Macbook),40+,Github Portfolio,Yes,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Reinforcement learning,"Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Australia,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TIBCO Spotfire,Deep learning,Python,Government website,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Podcasts,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Operations Research Practitioner,Programmer,Researcher,Statistician",University courses,60,0,40,0,0,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",A doctoral degree,Internet-based,20 to 99 employees,Increased 
slightly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",Often,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Often,Often,,,,10,25,20,20,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,Sometimes,75000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Cloudera,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Operations Research Practitioner,Other",Self-taught,10,35,0,50,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Relational data,Other",Never,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,IBM Watson / Waton Analytics,KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,R,RapidMiner (free version),SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau,TIBCO Spotfire",,,,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,Often,,,Sometimes,,,,,Often,,Often,,Sometimes,,,,,,,Often,,Rarely,Often,,Sometimes,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",Rarely,,Rarely,,,,Often,,,,,,,,,Often,,,,,Sometimes,Often,,,,,,,,Sometimes,,,,40,25,10,15,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,I prefer not to say,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Often,Sometimes,,,,,Most of 
the time,,Most of the time,,,,,,,Often,Most of the time,,,,,,10-25% of projects,More external than internal,Other,"Kaggle, Data.gov, data.gov.in, etc.",Dates,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Git,Sometimes,650000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,30,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer",Self-taught,70,20,10,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Technology,500 to 999 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Always,100MB,Bayesian Techniques,"Jupyter notebooks,Microsoft Azure Machine Learning,Python,QlikView,R",,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Naive Bayes,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,10,40,20,10,20,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,Text Data,Text Data format,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Bitbucket,,12000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,College/University,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,University courses,20,15,10,55,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,Never,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),,,Git,,,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"GitHub,Google Search","Company internal community,Conferences,Friends network,Kaggle,Newsletters",,,,Somewhat useful,Somewhat useful,Not Useful,Not Useful,Not Useful,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,20,30,15,15,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks","Hadoop/Hive/Pig,MATLAB/Octave,TensorFlow",,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Neural Networks",,,Often,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,20,50,10,15,5,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools",,Often,,,Sometimes,,,,,Often,Often,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Subversion",Sometimes,,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Unix shell / awk,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Conferences,Newsletters,YouTube Videos",,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Newsletters,Podcasts",,Very useful,,,Very useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,"Data Machina Newsletter,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,,Self-taught,70,20,NA,10,0,0,,,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,MATLAB/Octave,Other,Scala,I collect my own data (e.g. 
web-scraping),Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Data Scientist,Engineer,Researcher,Other,I haven't started working yet",University courses,50,0,30,20,0,0,Computer Vision,Neural Networks - CNNs,,Academic,10 to 19 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,100MB,Other,"Amazon Machine Learning,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining",Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Text Analytics,Other",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,30,30,30,10,0,0,Enough to run the code / standard library,"Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,10-25% of projects,Entirely external,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Subversion,Sometimes,100,ALL,Has decreased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,SQL,Bayesian Methods,SQL,Google Search,"Blogs,Friends network,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Master's degree,Yes,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Researcher",University courses,10,10,20,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,49,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,"Data Stories Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,50,10,20,20,0,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Markov Logic Networks,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Denmark,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Julia,Deep learning,Matlab,"Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,30,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",,100MB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Julia,Mathematica,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Most of the time,,,,,,,,,,,Rarely,Rarely,,,,Rarely,Most of the time,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,"CNNs,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,,Most of the time,,,,,,,Sometimes,,Often,,,,Most of the time,Sometimes,,,,,Often,,Sometimes,,Most of the time,,,,30,30,5,5,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,Most of the time,Most of the time,,Often,,,Most of the time,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,MNIST,"Low volumes of curated data for supervised learning, image obtained from suboptimal experiment conditions","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,34000,DKK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Google Search,"Arxiv,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,50,30,20,0,0,Computer Vision,Support Vector Machines (SVMs),A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Video data,Most of the time,10GB,CNNs,"C/C++,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,Neural 
Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,20,50,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Rarely,,JPY,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"FlowingData Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,35,25,0,30,10,0,"Computer Vision,Reinforcement learning,Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Academic,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,10GB,"CNNs,Ensemble Methods,Neural Networks,RNNs","C/C++,MATLAB/Octave,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Neural Networks,PCA 
and Dimensionality Reduction,RNNs,SVMs",,,,Most of the time,,Most of the time,Often,,Sometimes,,,,,,,Sometimes,,,,Often,Sometimes,,,,Often,,,Often,,,,,,20,30,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,Often,Most of the time,,,,,,,,Most of the time,Sometimes,,10-25% of projects,Approximately half internal and half external,Standalone Team,DISFA;UNBC_PAIN;FER2013;NVIE;ImageNet;PASCAL_VOC,sometimes the annotations are missing and noisy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Share Drive/SharePoint,Other",store on a shared machine,"Bitbucket,Git",Most of the time,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Pakistan,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Text Mining,R,"GitHub,Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,40,10,0,10,10,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,,Basic laptop (Macbook),"Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Tableau,Other,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the 
time,Most of the time,,"Cross-Validation,Data Visualization,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,Often,Sometimes,,,,20,10,30,40,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Git,,30000,PKR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Germany,39,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Conferences,Kaggle",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,"Data Machina Newsletter,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,University/Non-profit research group websites,"Company internal community,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,Very useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Researcher,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,Often,,Often,Often,,,,Sometimes,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,Sometimes,,,Often,,,,Most of the time,,Most of the time,,Rarely,,,Often,,,Often,Most of the time,,,,Rarely,Most of the time,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Most of the time,Sometimes,Often,Often,Often,,,Most of the time,Most of the time,,Most of the 
time,Rarely,Sometimes,Sometimes,,Often,Often,,Often,Sometimes,Often,,Sometimes,,,,70,10,10,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,Most of the time,,,,,,,Rarely,,Most of the time,,Often,,,,Most of the time,,76-99% of projects,More internal than external,Business Department,Twitter;Facebook;RSS,The format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Most of the time,44000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",50,30,0,10,10,0,Recommendation Engines,Decision Trees - Gradient Boosted Machines,High school,Telecommunications,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,Often,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,,Most of the time,,Often,Sometimes,,,,Often,,Rarely,,,,0,50,0,10,40,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Official documentation,YouTube Videos",,,,,,,,,,Somewhat useful,,,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,Engineer,University courses,40,5,15,40,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Evolutionary Approaches,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased significantly,Don't know,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Don't know,100GB,RNNs,"Java,MATLAB/Octave,Python",,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,5,70,0,5,20,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Often,,,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,None,Text mining,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,"110,000",PLN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Turkey,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Hong Kong,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,Other,Self-taught,70,0,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Gradient Boosted Machines,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,Very useful,Not Useful,,Not Useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Predictive Modeler,Researcher",Self-taught,70,5,20,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Technology,"10,000 or more employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,RapidMiner (free version),Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,Rarely,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,Sometimes,,Often,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,,10,80,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,Most of the time,Often,Sometimes,,Most of the time,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,Insufficient data for automation. Data collection just sufficient for manual work.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,1000000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,South 
Africa,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,20,0,30,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Often,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",Often,Often,,,,Often,Often,Often,,,,Often,,,,,,,Sometimes,,Often,,,,,Often,,,,Sometimes,,,,50,10,0,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,,Often,,,,,,Often,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,1300000,ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,46,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Scientist,Programmer,Researcher",University courses,0,0,45,45,10,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data",Don't know,,"Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,,,,Often,Sometimes,Sometimes,,Sometimes,,Sometimes,Most of the time,Often,Sometimes,,,,,,,Often,Often,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Often,76-99% of projects,Entirely internal,Other,,Funding,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,480000,SEK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,I don't plan on learning a new ML/DS method,Python,Government website,"College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,10,20,10,50,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Don't know,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,RNNs","Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,QlikView,R,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,Often,Sometimes,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Rarely,,,,,Sometimes,Often,,,,,,,Often,,,,,Rarely,,Rarely,,Sometimes,Often,,,,,,,,,,60,5,5,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Sometimes,,Often,Rarely,,,Sometimes,,,,,,Often,,,,Most of the time,Sometimes,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,34000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,Yes,,Programmer,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics,Employed by government",TensorFlow,Deep learning,C/C++/C#,GitHub,"College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,Somewhat useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Sometimes,1TB,"CNNs,Neural Networks,RNNs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,GANs,Neural Networks,RNNs",,,,Often,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,40,20,25,5,10,0,Enough to run the code / standard library,"Dirty data,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"800,000",RUB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Denmark,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Data Analyst,Other",University courses,10,10,40,30,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",Rarely,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics",Often,,,,,Often,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Often,Often,,,Sometimes,,,Often,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process",Often,Sometimes,,,Most of the time,,,Often,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,"Data often not collected in a way that make sense, End to End tests of software produce test data in production Databases, without it being removed, No input validation when e.g. sales persons put in sales numbers","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,660000,DKK,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Israel,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Other,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,Engineer,Other,0,0,100,0,0,NA,,,A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,,Other,"Perl,Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,None,Do not know,Other,,,Other,Other,,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Belgium,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Friends network,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,,University courses,80,20,0,0,0,0,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Relational data,Other",Sometimes,1GB,"Decision Trees,Random Forests","Java,Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Sometimes,,,Most of the time,,,,,,,,,,Often,Often,,,100% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Programmer",Work,30,0,70,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Perl,Decision 
Trees,R,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Programmer",University courses,10,20,20,40,10,0,Machine Translation,Decision Trees - Random Forests,A master's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Sometimes,1TB,Decision Trees,"C/C++,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Perl,R,SQL",,,,Often,,,,,,,,Sometimes,,,,,,,,,,,Often,,Often,,Sometimes,,,Most of the time,,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Decision Trees,Neural Networks",Sometimes,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,,78000,SGD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,56,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,,Not Useful,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),40+,Master's degree,Yes,Master's degree,Computer Science,Less than a year,Other,University courses,30,0,0,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Vietnam,31,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",I don't write code to analyze data,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX",,0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,5,15,0,80,0,0,Speech Recognition,"Bayesian Techniques,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,Sweden,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Link Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,60,10,0,15,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Other,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,Often,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,25,15,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,,,,Sometimes,,,Sometimes,,,Sometimes,,Often,,,,10-25% of projects,More internal than external,Business Department,,Understanding what it means,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,500000,SEK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,Self-taught,80,10,0,10,0,0,,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,,Python,GitHub,Textbook,,,,,,,,,,,,,,,Not Useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,Survival Analysis,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Simulation",,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,49,Employed full-time,,,Yes,,Other,Fine,Employed by government,Spark / MLlib,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,Other",,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A professional degree,Other,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,,Regression/Logistic Regression,"IBM Cognos,Microsoft Azure Machine Learning,Python,R,SQL,Tableau",,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,PCA and Dimensionality Reduction,Time Series Analysis",,,Sometimes,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,65,5,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Often,,,,,,,,,,,Often,Often,,76-99% 
of projects,More external than internal,Other,Aus BoM weather data; rail ticketing data,Accessing sufficient data with appropriate authorities,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Rarely,,AUD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by company that makes advanced analytic software,Python,Deep learning,Python,GitHub,"Arxiv,Blogs,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,Researcher,University courses,30,40,10,20,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Most of the time,10GB,"CNNs,Neural Networks,RNNs","Amazon Machine Learning,Java,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Most of the 
time,,,,"CNNs,Cross-Validation,Data Visualization,GANs,HMMs,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,Most of the time,,Often,Often,,,,Often,,Often,,,Often,,,Most of the time,Most of the time,Often,,,,Most of the time,,,Often,Most of the time,,,,,20,40,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,Often,,Often,Most of the time,,Often,,Often,Often,,Often,Often,,Most of the time,Often,Most of the time,,100% of projects,Entirely internal,IT Department,,preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Subversion,Sometimes,55000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Software Developer/Software Engineer,Self-taught,10,20,60,0,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",,1TB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Web services,IBM SPSS Modeler,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,TensorFlow,Other",,Sometimes,,,,,,,,,Rarely,,Rarely,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,30,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,Researcher,Other,50,0,30,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",University courses,0,20,20,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,NoSQL,Python,R,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Simulation,Text Analytics,Time Series Analysis",,,,,,,Often,Often,Often,,,,,Often,,,,,,,,,,,,,Most of the time,,Often,Most of the time,,,,20,40,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Privacy issues,Unavailability of/difficult access to data",Most of the time,Often,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,Weather data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Never,2200000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Amazon Web services,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,,Self-taught,90,5,5,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,100MB,Decision Trees,"Jupyter notebooks,Microsoft Excel Data Mining,Python,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,Often,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data 
science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Often,,,Often,,,,,Often,,,,,Sometimes,Most of the time,,,51-75% of projects,Entirely internal,Other,,Justification around collecting more data due to cost,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,75000,GBP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,30,15,20,30,5,0,"Computer Vision,Speech Recognition,Time Series","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Always,10TB,"CNNs,Neural Networks,RNNs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,RNNs,Segmentation",,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Often,Most of the time,,,,,,,,20,60,10,5,5,0,Enough to refine and innovate on the 
algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,Data Miner,University courses,20,40,10,30,0,0,Natural Language Processing,"Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines",Logistic Regression,A master's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,50,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,26,Employed full-time,,,No,Yes,Programmer,Poorly,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects",,,,,,,Very useful,,,Somewhat useful,,Very useful,,,,,,,,< 1 year,,,,,,,,,,,,,,,Workstation + Cloud service,,Kaggle Competitions,No,I prefer not to answer,Mathematics or statistics,Less than a year,"Engineer,Programmer",Self-taught,100,0,0,0,0,0,Computer Vision,"Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,80,10,5,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High 
school,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Image data,Never,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,0,50,0,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Evolutionary Approaches,Neural Networks","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,Sometimes,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,,Often,,,,,,,,Often,,Most of the time,Most of the time,Most of the time,,,,,,,,,Often,,,,,65,15,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,I prefer not to say,Organization is small and cannot afford a data science team",,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,16,"Not employed, and not looking for work",No,"Yes, but 
data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,51,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Programmer,Researcher,Software Developer/Software Engineer",Work,50,20,30,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,31,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Online courses",,,Very useful,Very useful,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician,Other",University courses,25,25,5,45,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAS Base,TensorFlow",,,,,,,,Often,,,,,Rarely,,,,Often,,,,,Sometimes,Most of the time,,,,,,,,Often,,Often,,,,,Most of the time,,,,,,,,Rarely,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Recommender Systems,Segmentation,Text Analytics",Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,,60,15,5,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,,,,Often,,Often,,,,Often,,Often,,,Often,Often,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,65000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,DBA/Database Engineer,Self-taught,30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Most of the time,,Regression/Logistic Regression,"C/C++,Java,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,,,"Data Visualization,Logistic Regression,Naive Bayes,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,Sometimes,,,,20,30,30,15,5,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Hospitality/Entertainment/Sports,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Other,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",Rarely,,,,,Often,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,Often,Sometimes,,,Rarely,,,,,,Often,Most of the time,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Bitbucket,Git",,36000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,C/C++/C#,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Kaggle,Tutoring/mentoring",Very useful,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,Very useful,,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Other,3 to 5 years,Researcher,University courses,60,0,0,40,0,0,Recommendation Engines,Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Switzerland,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,University 
courses,40,0,20,30,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,20,10,40,0,0,30,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,,,,,Important,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,Self-taught,50,30,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,50,20,0,30,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Operations Research Practitioner,Researcher,Software Developer/Software Engineer,Statistician",University courses,15,0,85,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression,Other","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Stan",,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,Sometimes,,,,,,,Rarely,,Most of the time,,,,,,,,Sometimes,Often,Often,,,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,PCA and Dimensionality Reduction,Time Series Analysis",,,Often,,,Most of the time,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,5,20,10,10,10,45,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint,Other",,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,C/C++/C#,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,Very useful,Very useful,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Most of the time,100GB,"CNNs,SVMs","C/C++,Python,TensorFlow,Other",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,"CNNs,SVMs",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,50,20,10,0,0,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Sometimes,Often,,,,,,Often,,Most of the time,Most of the time,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,JPY,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,Very useful,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst,Other",Work,50,15,30,0,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,500 to 999 employees,Increased significantly,Don't know,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","DataRobot,Jupyter notebooks,Python,R,SQL",,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,Rarely,Often,Rarely,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,50,5,0,15,30,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,Often,,Often,,,,Often,,51-75% of projects,More internal than external,Business Department,Public DMP,"Getting permission to touch client data. Coordinating with client IT team","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,6000000,JPY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Israel,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,20,30,25,20,5,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A doctoral degree,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,25,15,5,10,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Sometimes,1GB,"CNNs,Neural Networks","C/C++,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"CNNs,Data Visualization,Neural Networks",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,30,10,20,10,0,Enough to refine and innovate on the algorithm,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,fddb;activitynet;mscoco;voc2007;voc2012,category imbalance; bbox annotation,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,500000,CNY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Personal Projects",,,,Somewhat useful,,,Somewhat useful,,,,,Very useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,Statistician,Self-taught,50,0,40,10,0,0,Other (please specify; separate by semi-colon),"Logistic Regression,Other (please specify; separate by semi-colon)",I prefer not to answer,Financial,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Relational data,Other",Sometimes,100MB,Regression/Logistic Regression,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,180000,ALL,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Spain,45,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Other",Self-taught,50,10,0,0,40,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",No education,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R",,Often,,,,,,,,,,,,,,,Most of the time,,Often,,,Sometimes,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series 
Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,Most of the time,Most of the time,,,,,Often,,Most of the time,,,,,,,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Sometimes,Most of the time,Often,Most of the time,Often,,,Most of the time,,,,,,Often,Most of the time,,,,,Most of the time,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Other,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,DBA/Database Engineer",Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,,Other,"Impala,Microsoft Excel Data Mining,SQL,Tableau",,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,50,0,0,50,0,0,,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",Often,,,,Often,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,100% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,,,15000,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"GitHub,Google Search,Government website","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Work,50,0,50,0,0,0,,,Primary/elementary school,Mix of fields,100 to 499 employees,Increased significantly,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,1GB,,"IBM SPSS Statistics,Java,NoSQL,Perl,R",,,,,,,,,,,,Rarely,,,Often,,,,,,,,,,,,Often,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Simulation",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,25,25,20,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Often,,,,,,,,Often,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,IT Department,"Open Street Map, Government Open Data Sets",Relating different datasets to produce new 
once,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email",,"Git,Subversion",Sometimes,58000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Switzerland,43,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by non-profit or NGO,TensorFlow,Regression,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Unsupervised Learning,Bayesian Techniques,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,FastML Blog",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,40,0,10,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,"Some college/university 
study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Kenya,29,Employed part-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",University courses,10,15,15,60,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",,Financial,Fewer than 10 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Bayesian Techniques,CNNs,Markov Logic Networks","Hadoop/Hive/Pig,IBM SPSS Statistics,NoSQL,R",,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Neural Networks,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,10,25,15,20,30,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Often,,,,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that 
doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,22,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,C/C++/C#,Google Search,"College/University,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,15,25,30,30,0,0,Computer Vision,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Norway,28,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Operations Research Practitioner,University courses,10,40,0,50,0,0,Reinforcement learning,Evolutionary Approaches,A doctoral degree,Other,Fewer than 10 employees,Increased slightly,3-5 years,I visited the 
company's Web site and found a job listing there,Important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Prescriptive Modeling",,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,5,45,45,5,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Other",University courses,10,20,20,50,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A master's degree,Mix of fields,10 to 19 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100GB,"Bayesian Techniques,RNNs","IBM SPSS Statistics,NoSQL,Python,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees",Most of the time,,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Spain,34,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Regression,C/C++/C#,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Data Analyst,University courses,10,30,60,0,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Other,Traditional Workstation,"Text data,Relational data",Most of the time,1TB,Decision Trees,"NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods",,,,,,,Often,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,40,10,0,40,10,0,Enough to tune the parameters properly,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,More external than internal,IT Department,,to clean it and present it properly,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Other,hard disk,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"26,000",EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Germany,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,A humanities discipline,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Natural Language Processing,Unsupervised Learning",Neural Networks - CNNs,A bachelor's degree,Financial,I prefer not to answer,Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,Python,GitHub,Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,10,10,20,60,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased significantly,1-2 years,A tech-specific job board,Very important,,Laptop or Workstation and private datacenters,,Rarely,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,SVMs","Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,Often,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,I prefer not to say,Lack of data science talent 
in the organization,Scaling data science solution up to full database",Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,Less than 10% of projects,,IT Department,,,,,,,,26000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Russia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,,Other,0,30,30,0,0,40,"Machine Translation,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,6 to 10 years,"Data Analyst,Researcher,Statistician",University courses,30,10,20,40,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Logistic Regression,,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,SQL",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,kNN and Other Clustering,Logistic Regression",Often,Often,,,,,Often,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belarus,20,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Not Useful,Very useful,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,30,30,20,0,0,20,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",India,25,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,"DataCamp,edX,Udacity",GPU accelerated Workstation,40+,Github Portfolio,No,Bachelor's degree,Electrical Engineering,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Sweden,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Bayesian Methods,Python,"I collect my own data (e.g. 
web-scraping),Other","Kaggle,Newsletters,Stack Overflow Q&A",,,,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,"FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Other",University courses,0,0,20,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100GB,"Regression/Logistic Regression,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,Python,R,Spark / MLlib,SQL",,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"A/B Testing,Logistic Regression,SVMs",Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,40,0,0,20,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Most of the time,,Most of the time,Sometimes,,,,,,,,,,,,,Often,Sometimes,,,100% of projects,Entirely internal,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,70000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Random Forests,C/C++/C#,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,A bachelor's degree,Insurance,10 to 19 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,<1MB,,"Amazon Web services,R,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,0,30,40,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Never,80000,AUD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Germany,34,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Other (please specify; separate by semi-colon)",High school,Technology,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,,I haven't started working yet,Work,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,No,Yes,Computer Scientist,,,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,,,,Very useful,Very useful,Very useful,,,Somewhat useful,,,Not Useful,"FastML Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,Self-taught,80,10,0,10,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,Egypt,26,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,I don't write code to analyze 
data,Programmer,Other,80,10,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A bachelor's degree,Academic,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Never,,CNNs,"Java,Microsoft Excel Data Mining",,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Naive Bayes,SVMs",,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Matlab,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Conferences,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,Very useful,,Not Useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",University courses,10,10,0,80,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,Malaysia,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Computer Scientist,Self-taught,80,10,5,5,0,0,Natural Language Processing,"Bayesian Techniques,Neural Networks - CNNs",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,Yes,Business Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A 
health science,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,47,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,Survival Analysis,Decision Trees - Random Forests,No education,Government,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Other,Workstation + Cloud service,"Text data,Relational data",Rarely,100MB,Decision Trees,"R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods",,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Other,Python,GitHub,"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Very useful,Not Useful,,< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Master's degree,Computer Science,I don't write code to analyze data,Other,Other,40,60,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Computer Scientist,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other",University courses,20,30,15,35,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Non-profit,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,,Basic laptop (Macbook),Other,Rarely,,,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",Rarely,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,25,30,10,30,5,0,Enough to tune the parameters properly,"Data Science results not used by business 
decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,Often,,,,,,,,,,,Often,,,76-99% of projects,More external than internal,Standalone Team,,,,,,,,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,MATLAB/Octave,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Kaggle,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician",Work,0,10,60,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",High school,Technology,100 to 499 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Other",,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,Often,,,Sometimes,Often,,,,,,,Sometimes,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift 
Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,Rarely,,Often,Sometimes,Sometimes,,,Sometimes,,Often,Often,Often,,,Sometimes,Rarely,Often,,Rarely,Rarely,,Often,,,Sometimes,Often,,,,45,10,10,10,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,Sometimes,,,,,,Sometimes,Often,,Often,,Often,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,R,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,25,25,25,0,25,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,Rarely,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Sometimes,,,Sometimes,,,Sometimes,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Other,Never,2300000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Spain,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Other,Decision Trees,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,Other,Traditional Workstation,0 - 1 hour,Other,No,Professional degree,,Less than a year,"Data Miner,Programmer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,India,19,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,NA,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Japan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Stan,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Textbook",Somewhat useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,37,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,10,90,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Association Rules,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Text Analytics",,Sometimes,,,,,Sometimes,Often,,,,,,,,,,,Most of the time,,,,Often,,,,,,Most of the time,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,Often,,,Often,,,,,,,,,,,Often,,,Less than 10% of projects,More external than internal,Standalone Team,Na,Na,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Subversion,Rarely,2000000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Egypt,42,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,Very useful,,Somewhat useful,Very useful,,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,No,Doctoral degree,Computer Science,3 to 5 years,"Business Analyst,Operations Research Practitioner,Software Developer/Software Engineer",Self-taught,30,30,0,20,20,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,,,,,,,,,,,,,, +Male,United States,34,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Online courses,Personal Projects",,Very useful,,,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Necessary,Nice to have,,Nice to have,Necessary,Nice to have,,,Unnecessary,Necessary,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Sweden,26,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Other,2 - 10 hours,Kaggle Competitions,No,Master's degree,A health science,1 to 2 years,Researcher,Self-taught,60,0,0,10,30,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,29,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,Python,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,Very 
useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,20,20,5,5,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Other,Most of the time,1GB,"HMMs,Neural Networks,RNNs,Other","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,Natural Language Processing,Neural Networks,RNNs",,,,,,Sometimes,Sometimes,Sometimes,,,,,Most of the time,,,,,,Often,Often,,,,,Sometimes,,,,,,,,,40,40,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Often,Most of the time,,,,Often,,Often,Often,,,Sometimes,Most of the time,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,linux machine + rsync,Git,Most of the time,84000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,18,"Not employed, but looking for work",,,,,,,,Amazon Web services,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),FastML Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,"edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Italy,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,1 to 2 years,Computer Scientist,Self-taught,70,20,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,Some other way,Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python,R",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Sometimes,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Often,,,Often,,,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,Often,,Often,,,,,,20,60,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of 
China,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Jupyter notebooks,Text Mining,R,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,Primary/elementary school,Retail,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,"Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Random Forests,Time Series Analysis",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,40,20,20,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,,,,Sometimes,,51-75% of projects,Approximately half internal and half external,IT Department,yahoo finance; world bank,The granularity is not always appropriate .,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"300,000",CNY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Italy,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Other,No,Bachelor's degree,A health science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular 
Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Orange,Python,R,RapidMiner (commercial version),SQL",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,Often,,Most of the time,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,Most of the time,,,,,,Often,,,Often,Often,Often,,,,Most of the time,,,,Often,Often,Often,Often,,,,70,12,12,2,4,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,Sometimes,Most of the time,,,,,Often,,,,Often,,,Sometimes,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Female,India,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,FastML Blog,< 1 year,Necessary,Necessary,Necessary,,,Necessary,,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Master's degree,Computer Science,,"Data Analyst,Researcher,I haven't started working yet",University courses,10,10,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,Not Useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,,University courses,30,30,0,30,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Malaysia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,SQL,Google Search,"Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer",University courses,50,10,10,30,0,0,"Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,,,Other,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,30,0,10,40,10,10,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Do not know,IT Department,Enterprise Data,Build Predictive Model 
using Structured data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,RON,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Italy,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Data Stories Podcast,FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,University courses,50,20,0,20,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data",Most of the time,10MB,"CNNs,Neural Networks,RNNs","Amazon Web services,C/C++,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,NoSQL,Python,QlikView,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,Sometimes,,,Most of the time,Often,,,,,Sometimes,,,Most of the 
time,,Sometimes,Rarely,Rarely,,,,,,Often,,,,Most of the time,Sometimes,Sometimes,,Rarely,,,,,,Often,Often,,,Rarely,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",Sometimes,Often,Often,Most of the time,,Most of the time,Most of the time,Often,Often,,,Often,,,,,,Often,Most of the time,Most of the time,Most of the time,,Often,,Most of the time,Most of the time,,Sometimes,Most of the time,,,,,30,20,20,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database",,,,,,,,,Most of the time,,,,Often,,,,,Most of the time,,,,,51-75% of projects,More internal than external,Standalone Team,Imagenet ,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Rarely,55000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Indonesia,24,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,0,20,0,80,0,0,Reinforcement learning,Logistic Regression,,CRM/Marketing,10 to 19 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,,Basic laptop (Macbook),Text data,Rarely,1GB,Neural Networks,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,,Nice to have,Necessary,,Necessary,Nice to have,Necessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,30,30,0,0,40,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's 
degree,Electrical Engineering,I don't write code to analyze data,Data Analyst,Self-taught,50,50,0,0,0,0,Natural Language Processing,Decision Trees - Gradient Boosted Machines,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,France,43,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Newsletters,Online courses",,,,,,,,Somewhat useful,,,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests",,,,,,Often,Often,Sometimes,Sometimes,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,35,20,15,10,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Most of the time,Often,Often,,,Most of the time,,Most of the time,,,,,,,Often,,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,,"Data cleaning, finding insights by removing noise from data, missing values imputation","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Git,Don't know,2100000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,Decision Trees,"MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,Necessary,,,,,Other,11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Other,30,NA,0,0,30,40,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Other,46,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,59,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Mix of fields,500 to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,RNNs,Other","C/C++,Cloudera,DataRobot,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Julia,KNIME (free version),Microsoft Azure Machine Learning,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,Stan,Tableau,TensorFlow",,,,Sometimes,Sometimes,Sometimes,,,,,Rarely,,Sometimes,,,Rarely,,,Often,,,Often,,,,,,,,,Rarely,,Most of the time,,,,Sometimes,Sometimes,Rarely,,,,Sometimes,,Sometimes,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,Rarely,Often,Often,,Often,Most of the time,Often,Often,Sometimes,,Often,Sometimes,Often,,Often,,Sometimes,,,Often,Often,Often,Rarely,Often,Often,Often,,Sometimes,Often,,,,10,20,20,20,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,Most of the time,,,,,,,,Often,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,"Data from government agencies (E.g., Census Data), Financial Data, Social Network Data",To select appropriate variables and to obtain an optimal representation of data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,120000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Israel,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Textbook",,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Operations Research Practitioner",Kaggle competitions,25,15,50,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Internet-based,500 to 999 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Sometimes,Most of the time,,,,,Often,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Often,Often,,,Often,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,,,,,,,,25,25,35,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of 
management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Most of the time,Most of the time,,,,Often,,,Sometimes,,,,,,Most of the time,Often,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Rarely,90000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Czech Republic,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,80,15,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Financial,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,100MB,Regression/Logistic Regression,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Segmentation",,,,,,Sometimes,Often,,,,,,,,,Often,,,,,,,Often,,,Often,,,,,,,,60,20,0,20,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed 
by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,YouTube Videos",,Very useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,25,40,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,NoSQL,Python,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,,Often,Often,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,Rarely,,Sometimes,,,,Most of the time,,,,55,15,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT",Sometimes,,,Often,Often,,,,,,,,,,Often,,,,,,,,51-75% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,80,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Mexico,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Online courses,Textbook,Trade book,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Retail,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,,Often,,,,,,Often,,Often,,Often,,Often,Sometimes,,Often,,,,,Sometimes,,,,,,50,10,10,10,20,0,Enough to run the code / standard library,"Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,100000,MXN,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Australia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Software Developer/Software Engineer,Other",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,3 to 5 years,Researcher,Kaggle competitions,30,10,20,10,30,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Never,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Sometimes,,Often,,Sometimes,,Sometimes,Often,Most of the time,Often,,,,Often,Often,,Often,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Sometimes,,100% of projects,Approximately half internal and half external,Other,maps; weather; stock prices; many more,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Other",dropbox,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,50000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,Not Useful,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),,Other,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,0,20,Time Series,Other (please specify; separate by semi-colon),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Engineer,Researcher",Self-taught,30,0,50,5,15,0,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Speech Recognition,Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Military/Security,20 to 99 
employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data,Relational data",,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Statistics,Java,Julia,Mathematica,MATLAB/Octave,Orange,Perl,SQL,Stan,Unix shell / awk",,Most of the time,,Often,,,,,,,,Most of the time,,,Often,Sometimes,,,,Often,Often,,,,,,,,Often,Often,,,,,,,,,,,,Often,Often,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Ensemble Methods,Evolutionary Approaches,GANs,Random Forests,Recommender Systems,Segmentation,Simulation",Often,Often,Often,Often,Often,,,,Often,Often,Often,,,,,,,,,,,,Often,Often,,Often,Often,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,25,0,15,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Rarely,,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,Sometimes,,,Most of the time,,Often,Most of the time,Most of the time,,,Often,,Often,,Most of the time,,,Often,,,Often,,,,,10,5,10,5,20,50,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Haskell,I collect my own data (e.g. 
web-scraping),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst",University courses,0,20,0,70,10,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,Mathematica,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Rarely,,Rarely,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,Often,,,,"Cross-Validation,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,Sometimes,,,Often,,,Often,,,,,0,60,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Rarely,528000,RUB,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,Official documentation,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,30,10,20,20,20,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Most of the time,1TB,"CNNs,Ensemble Methods,Neural Networks,Random Forests",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,,,,,Most of the time,Most of the time,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues",,,Often,,Often,Often,,,,Often,,,,,,,Often,,,,,,10-25% of projects,More internal than external,Standalone Team,LIDC-IDRI,"Not big enough, sometimes label is not correct","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,70000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Ireland,48,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,Amazon Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,Data Scientist,Self-taught,50,30,0,0,20,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs","Microsoft Azure Machine Learning,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,20,30,0,40,10,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process",,,Sometimes,Often,,,,Often,,,,,,,,,,,,,,,100% of projects,Entirely external,Standalone 
Team,Twittee,acquisition cost,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Always,50000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,10,25,0,50,15,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important,Somewhat important +Female,Kenya,32,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,University/Non-profit research group websites,"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very 
Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,Israel,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Other,Self-taught,40,0,50,0,10,0,Other (please specify; separate by semi-colon),,A doctoral degree,Academic,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,100MB,Neural Networks,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,5,5,10,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,More external than internal,Standalone Team,none,finding good data for learning to properly represent the full data set,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,70000,ILS,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,South Africa,29,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Uplift Modeling,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,Less than a year,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Natural Language Processing,,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Poorly,Self-employed,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Udacity,,,Kaggle Competitions,No,I did not complete any formal education past high school,,Less than a year,"Programmer,Other,I haven't started working yet",Self-taught,10,75,10,0,5,0,Computer Vision,"Bayesian Techniques,Gradient Boosting",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,Very useful,,,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,,,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important 
+Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,Often,Often,Sometimes,,Sometimes,,Sometimes,,,Often,Often,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert 
input,Unavailability of/difficult access to data",,Often,,,,,,,Rarely,,Often,,,,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,1150000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,New Zealand,34,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",University courses,50,0,0,30,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,Less than one year,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Cloudera,Neural Nets,SQL,Google Search,"Conferences,Friends network,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Very 
useful,Very useful,,,,,Very useful,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,Work,30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Gradient Boosting",No education,CRM/Marketing,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines",Java,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes",,Often,,,,Sometimes,,Most of the time,Often,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,,,,,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,Often,,,Often,,,Sometimes,,,,,,,,,,,,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Greece,41,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Perfectly,"Employed by a company that doesn't perform advanced analytics,Self-employed",Microsoft Excel Data Mining,Monte Carlo Methods,Matlab,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,< 1 year,Nice to have,,Necessary,Necessary,,,Necessary,Nice to have,,,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,90,0,0,10,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,South Africa,23,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,YouTube Videos",,Somewhat useful,,,,,Very useful,Very useful,,,,,,,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,10,0,0,60,30,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Africa,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,,,,,,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,,,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,45,20,0,0,35,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,26,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Official documentation,Online courses,YouTube Videos",Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,,,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,PhD,Sort of (Explain more),Master's degree,I never declared a major,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very 
Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Not important +Male,United Kingdom,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,Less than a year,Data Analyst,University courses,40,10,0,40,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,Python",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,Often,,Sometimes,,Sometimes,,,,,,Rarely,Sometimes,,Often,Rarely,,,,Most of the time,Rarely,Sometimes,,,,30,20,10,15,25,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,24,"Independent contractor, freelancer, or 
self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,28,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,,,Not Useful,Somewhat useful,,,Very useful,Somewhat useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Computer Scientist,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and 
Dimensionality Reduction,Random Forests,SVMs",,Rarely,Rarely,,,Often,Sometimes,Sometimes,Rarely,,,Sometimes,,,,Rarely,,Sometimes,,,Rarely,,Sometimes,,,,,Sometimes,,,,,,40,5,45,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Rarely,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,Most of the time,51-75% of projects,Entirely internal,Standalone Team,foursquare;google places;openstreetmaps,version control; feature extraction from Python objects; unsupervised learning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,s3 URLs,Bitbucket,Sometimes,0,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Data Analyst,Predictive Modeler",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,75,5,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,6-10 years,A 
general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,,,"Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,,0 - 1 hour,,No,Bachelor's degree,Physics,Less than a year,Business Analyst,Other,10,90,0,0,0,0,Other (please specify; separate by semi-colon),Neural Networks - RNNs,A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Online courses,Personal Projects",,,Not Useful,,,,,,,,Very useful,Very useful,,,,,,,,3-5 years,Necessary,Necessary,Necessary,,Necessary,,,,Nice to have,Necessary,,,,"Coursera,edX,Udacity,Other",GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,46,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,1 to 2 years,Other,Self-taught,99,1,0,0,0,0,,,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by professional services/consulting firm,Python,Deep learning,Python,,Stack 
Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,University courses,20,10,5,40,10,15,Outlier detection (e.g. Fraud detection),Markov Logic Networks,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Israel,31,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,"DataTau News Aggregator,FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler",Self-taught,30,15,55,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python",Rarely,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics",Sometimes,Rarely,,,Sometimes,Often,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,Often,,Sometimes,,Sometimes,Often,,,Often,,Most of the time,,,,,25,20,20,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art 
in machine learning,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,,Most of the time,Often,,,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,26-50% of projects,More internal than external,IT Department,"Crunch base, Google's API's",Normalizing it,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,85000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,30,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Google Search,Personal 
Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,50,10,0,0,20,20,Recommendation Engines,,Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Somewhat important,Very Important +Female,India,29,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,Very useful,Very useful,R Bloggers Blog Aggregator,1-2 years,,,,,,Necessary,,,,,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,,I don't write code to analyze data,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,50,10,20,20,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,47,"Not employed, but looking for work",,,,,,,,NoSQL,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Newsletters,Online courses,Personal Projects",,Very useful,,,,,,Very useful,,,Very useful,Very useful,,,,,,,Data Elixir Newsletter,1-2 years,Nice to have,Nice to have,,,Nice to have,,,,Nice to have,,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,0,50,Natural Language Processing,"Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,Very useful,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,20 to 99 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Important,Other,Laptop or Workstation and private datacenters,Other,Most of the time,1GB,Regression/Logistic Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,60,5,25,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,,,,,Most of the time,Sometimes,,,,Sometimes,,Most of the time,,,,Most of the time,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,None,Security restrictions,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,65000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Switzerland,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,30,10,10,50,0,0,Time Series,"Bayesian Techniques,Logistic Regression",High school,Financial,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,20 to 99 
employees,,Don't know,A general-purpose job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"CNNs,Ensemble Methods,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,1500000,RUB,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,R,SQL,Tableau,TIBCO Spotfire",Sometimes,Often,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,Often,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Often,,,Most of the time,,Often,Sometimes,Sometimes,,,,,Sometimes,Often,Often,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,,,,,,Often,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Podcasts,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Denmark,34,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,C/C++,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,<1MB,Other,"C/C++,MATLAB/Octave,Python,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Rarely,,,Rarely,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,15,5,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others",,Most of the time,Most of the time,,Often,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,450000,DKK,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses",,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",50,20,5,0,25,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Spain,40,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,,C/C++/C#,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,Github Portfolio,No,Doctoral degree,Mathematics or statistics,I don't write code to analyze data,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",40,20,0,40,0,0,Time Series,,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed 
by a company that performs advanced analytics,Python,Deep learning,F#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Podcasts,Textbook",,Somewhat useful,,,,,Very useful,,,,,,Somewhat useful,,Very useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,20,10,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",Primary/elementary school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Denmark,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,Other,University courses,50,NA,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer",Self-taught,30,60,10,0,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,23,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,3 to 5 years,DBA/Database Engineer,Self-taught,50,0,40,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",High school,Technology,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,Often,,,,"Collaborative Filtering,Data Visualization,Prescriptive Modeling",,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,20,50,4,25,1,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues",Sometimes,,,,Most of the time,,,,Sometimes,,,,Sometimes,,Most of the time,,Often,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,25000,GBP,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Friends network,Online courses",Somewhat useful,Very useful,Very useful,,,Very useful,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Time Series,"Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Other",Most of the time,100MB,"Bayesian Techniques,Markov Logic Networks,SVMs","Amazon Web services,C/C++,MATLAB/Octave,Python,TensorFlow",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,HMMs,Naive Bayes,Simulation,SVMs",,,,,,Sometimes,,,,,,,Often,,,,,Sometimes,,,,,,,,,Sometimes,Often,,,,,,60,20,0,20,0,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,10-25% of projects,Entirely internal,Standalone Team,,,Other,,,,,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,20,20,10,30,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests",Sometimes,,,,Often,Sometimes,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,20,10,10,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,,,,,,,Git,,40000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Russia,45,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,1 to 2 years,"Data Analyst,Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,30,0,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Financial,"5,000 to 9,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,Rarely,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,"Map data from Google, Yandex, Openstreetmap",Size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Never,30000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,South Korea,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Statistician",University courses,10,NA,50,0,10,30,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Financial,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Traditional Workstation",Relational data,Sometimes,,"Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,IBM SPSS Modeler,Oracle Data Mining/ Oracle R Enterprise,Orange,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,DBA/Database 
Engineer,Work,10,60,20,10,0,0,Natural Language Processing,Markov Logic Networks,,Internet-based,"5,000 to 9,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Sometimes,100GB,CNNs,"Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Time Series Analysis",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,10,20,20,20,10,20,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,10-25% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Other,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,40,30,0,30,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,I don't know,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,,"Text data,Relational data",Rarely,100MB,"Decision Trees,SVMs","C/C++,Java,Microsoft SQL Server Data Mining,Python,SQL",,,,Often,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Natural Language Processing,SVMs",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,30,25,15,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,0,70,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,20,30,50,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I 
prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,"I work in a social network, I have everything I need",,Key-value store (e.g. Redis/Riak),Company Developed Platform,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,NoSQL,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation",,,,,,Often,Most of the time,Often,,,,,,,,Sometimes,,,,,,,Most of the time,,,,Most of the time,,,,,,,60,30,4,5,1,0,Enough to run the code / standard library,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,120000,HUF,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,R,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,Natural Language Processing,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst",Self-taught,25,0,50,0,25,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,,,Often,,,,,,,,Often,,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,,Somewhat useful,,,Not Useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Other",Self-taught,100,0,0,0,0,0,Survival Analysis,Logistic Regression,High school,Insurance,"10,000 or more employees",Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100MB,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,,,,,,,,"Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",Most of the time,Often,,,Most of the time,Often,,,Often,,,,Often,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,Data quality ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,90000,,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Other,38,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Cluster Analysis,Stata,"Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,5-10 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Nice to have,,Nice to have,Nice to have,,,edX,Laptop or Workstation and local IT supported servers,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Data Analyst,University courses,5,30,40,20,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,61,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Researcher,Statistician",University courses,50,0,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,Rarely,,,,,Sometimes,,Sometimes,Sometimes,,,,,Often,,,,Rarely,,,,,,Rarely,,,,Most of the time,,Often,,,,,Sometimes,Sometimes,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Rarely,,,Often,,Sometimes,,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Often,Sometimes,,,,,,Rarely,,Sometimes,Often,,,,50,40,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,Most of the time,,Sometimes,,Most of the time,,,,,,,Most of the time,Sometimes,Often,,Often,,26-50% of projects,Entirely internal,Standalone Team,Fiscal data; oficial statistics data,Privacy,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,72000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,Python,Other,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,33,34,0,0,33,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,Regression/Logistic Regression,"Amazon Web services,Jupyter 
notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,Sometimes,,,,,,Most of the time,Often,Often,,,,,,,,,Most of the time,,,Often,Often,,,,,,"A/B Testing,Association Rules,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Often,Rarely,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,Most of the time,,,Sometimes,Often,,,,40,30,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Other",,,,,Sometimes,,,,,Often,,,,,,,,,,,,Most of the time,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Other,,44000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Vietnam,23,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,Recommendation Engines,"Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,Markov Logic 
Networks,"C/C++,Java,Python,R",,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"GANs,Naive Bayes",,,,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Impala,Link Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,20,0,5,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)",,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Malaysia,29,Employed part-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software 
Engineer",Self-taught,60,0,NA,20,20,0,Time Series,"Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,Fewer than 10 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Relational data",Never,1GB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,Retired,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,More than 10 years,"Data Analyst,DBA/Database Engineer",Work,50,0,50,0,0,0,Time Series,Decision Trees - Random Forests,I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,35,15,20,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time 
Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1TB,Ensemble Methods,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Spark / MLlib",,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,"Data Visualization,kNN and Other Clustering",,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,0,15,40,5,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter 
notebooks,MATLAB/Octave,Python,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs",,,,Sometimes,,Often,Most of the time,,,,,,,,,Sometimes,,Sometimes,Often,Most of the time,,,Often,,Often,,,Sometimes,,,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,69,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Predictive Modeler,Statistician,Other",Self-taught,80,0,20,0,0,0,Time Series,"Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Friends network,Kaggle,Personal Projects",Very useful,,,,,Somewhat useful,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,50,5,10,5,30,0,"Reinforcement learning,Unsupervised Learning","Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,R,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,,,,,,,,,,,,,,,Most of the time,,,Often,,,,Often,,,,,,Most of the time,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,,70000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,34,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Deep learning,Python,GitHub,"Kaggle,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Yes,Doctoral degree,Computer Science,,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Portugal,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or 
university,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Image data,Rarely,1GB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,Sometimes,,,,30,20,20,20,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,More internal than external,Other,E.g. OpenFMRI,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Work,30,15,50,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Python,R,SAP BusinessObjects Predictive Analytics,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,Most of the time,,,,,,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Prescriptive Modeling,Recommender Systems,Segmentation,Text Analytics,Time Series 
Analysis",,Sometimes,Rarely,,,Rarely,Often,Rarely,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Often,,Often,,Often,,100% of projects,Do not know,Central Insights Team,,Privacy regulation ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,New Zealand,55,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),40+,Experience from work in a company related to ML,Yes,Master's degree,Biology,,"Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Spain,43,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Python,Bayesian Methods,,Google Search,"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,Researcher,Self-taught,70,30,0,0,0,0,Survival Analysis,Logistic Regression,A professional degree,Academic,"10,000 or more employees",,Don't know,Some other way,Somewhat important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,,"Evolutionary Approaches,Regression/Logistic Regression,SVMs","Perl,R,SAS Base,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,Sometimes,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,Sometimes,Often,,,Rarely,,,,,,Often,,,,,Often,,,,,,,Sometimes,,,,,,20,20,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others",,,,,Sometimes,Often,,,,,,,,,,,,,,,,,51-75% of 
projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Sometimes,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Norway,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,Self-taught,70,10,0,20,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,RNNs,"Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Often,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Time Series 
Analysis",,,,,Often,Often,,Sometimes,,,,,,Sometimes,,Rarely,,Rarely,Most of the time,Often,Most of the time,,,Most of the time,,,,Sometimes,,Sometimes,,,,65,20,2,8,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Limitations of tools,Privacy issues",,,,,Most of the time,,,,,,Sometimes,,Rarely,,,,Sometimes,,,,,,Less than 10% of projects,More external than internal,IT Department,Twitter;small company data,clean data and data sparsity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Rarely,"250,000",NOK,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Nigeria,31,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,Germany,25,Employed 
full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,Most of the time,,,,,,,Often,,,,,,,Often,,,,5,50,15,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,Often,,Sometimes,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,Commercial Data Platform,,Git,,,,I do not want to share information about my 
salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Conferences,Online courses,YouTube Videos",Very useful,,,,Very useful,,,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,"edX,Udacity",Other,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Deep learning,Python,University/Non-profit research group 
websites,"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,,,,,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Other,Self-taught,80,15,5,0,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Other","Image data,Text data,Relational data",Rarely,100GB,"Bayesian Techniques,CNNs,HMMs,Neural Networks","Amazon Web services,Jupyter notebooks,Mathematica,Perl,Python,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,Often,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,Rarely,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,Text Analytics",,,Often,Often,,,Most of the time,,,,,,Often,Often,,Sometimes,,,,Often,,,,,,,,,Rarely,,,,,20,30,20,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Often,,,,,,Most of the time,,,,Sometimes,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,"95,700",,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Norway,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,0,20,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","Python,R,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Sometimes,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,70,0,2,20,8,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Most of the time,,,,,,Most of the time,,,,,,,Often,,,,,Sometimes,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,730000,NOK,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Association Rules,R,Government website,"Official documentation,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Stories Podcast",1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Management information systems,,Data Miner,Kaggle competitions,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Chile,58,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,,Somewhat important,Somewhat important,,,Somewhat important +Male,Russia,26,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,"Programmer,Researcher,Statistician,I haven't started working yet",Other,15,10,0,5,20,50,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important +Female,France,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,44,"Not employed, but looking for work",,,,,,,,Amazon Web services,Anomaly Detection,Python,Government website,"Conferences,Official documentation,Online 
courses,YouTube Videos",,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Yes,Professional degree,,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,10,50,20,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,27,Employed part-time,,,Yes,,Computer Scientist,,Employed by college or university,Python,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,0,0,20,80,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,,"CNNs,GANs,Neural Networks,RNNs,SVMs","C/C++,Perl,Python,R,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,GANs,HMMs,Logistic Regression,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Most of the time,,,,,,,Often,,Most of the time,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,40,60,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,300000,SAR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United Kingdom,60,Employed full-time,,,Yes,,Other,Perfectly,"Employed by college or university,Employed by non-profit or NGO",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,More than 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,24,20,3,3,10,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Google Cloud Compute,KNIME (free version),Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow,Other",,Rarely,,,Rarely,,,Rarely,,,,,,,,,,,Sometimes,,,,Most of the time,Rarely,Sometimes,,Rarely,,,,Sometimes,,Often,,Rarely,,,,,,,Often,,,Most of the time,Sometimes,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text 
Analytics,Time Series Analysis",,,Sometimes,,,,Most of the time,Often,,,,,,,Sometimes,Often,,,,,,,Often,Often,,Sometimes,,,Often,Often,,,,40,10,5,20,15,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Sometimes,,Sometimes,Often,,,,Sometimes,,Most of the time,,Sometimes,,,,Sometimes,,100% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Data warehouse,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,55000,GBP,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Software Developer/Software Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,45,0,0,5,0,Recommendation Engines,Neural Networks - CNNs,Primary/elementary school,Retail,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Never,1GB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,Sometimes,Most of the time,Most of the time,Most of the time,Often,,,,,,Often,,Often,,Often,,Often,Sometimes,,Often,Often,,,,Often,,,,,,50,15,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a 
clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,Often,Often,Most of the time,,,Often,,,Often,,,,,Often,Sometimes,,,Most of the time,Often,,10-25% of projects,Do not know,Business Department,Government ,Get insights ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Hong Kong,28,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer",Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Machina Newsletter,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,25,0,0,25,"Recommendation Engines,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Italy,54,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Cloudera,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,Self-taught,80,20,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Bayesian Techniques,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,"Not employed, but looking for work",,,,,,,,TensorFlow,,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Physics,3 to 5 years,"Engineer,Programmer,Researcher",Self-taught,70,0,0,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,,,,,,,,,,,,, +Male,Other,44,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,Other,Anomaly Detection,Python,Other,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",High school,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Flume,Hadoop/Hive/Pig,Java,Python,Spark / MLlib,Tableau,Other,Other",,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Sometimes,,,,,Most of the time,Most of the time,"Data Visualization,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,Often,,,,Often,,Often,,Most of the time,,,Most of the time,Often,,,,,Most of the time,Most of the time,,,,65,15,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,Often,Often,,,,Sometimes,,,Often,Often,,76-99% of projects,Entirely internal,Other,telecom data; retail customer data,"find an a method that will answer the proposed question, and then validate it","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Other",Sometimes,180000,HRK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,France,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Official documentation,Online courses,Textbook",Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Researcher",University courses,10,10,20,60,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,,,Rarely,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Often,Often,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,Most of the 
time,,,,,Often,,Often,,,Often,,Often,,Most of the time,,,,30,40,0,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,Often,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Often,,,,,Often,,Often,,10-25% of projects,More internal than external,,flight stats,Legal aspects associated to it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Most of the time,53000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,94,Retired,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Official documentation,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,5,5,40,40,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",,Technology,I prefer not to answer,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,Often,,,,,,Often,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian 
Techniques,Collaborative Filtering,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Text Analytics",Often,Sometimes,Sometimes,,Often,,Most of the time,,Sometimes,,,,,Sometimes,,Most of the time,,Rarely,,,Most of the time,,,,,,,,Most of the time,,,,,20,30,30,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Often,,Often,Often,Often,Often,,,Often,,,,,Often,,,Often,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,Git,Never,,,,5,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,,University courses,30,20,10,10,10,20,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Stayed the same,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,28,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,R,Google Search,"College/University,Online courses,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,0,40,0,60,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Logistic Regression,A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,0,50,0,0,50,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Academic,20 to 99 employees,Stayed the same,Less than one year,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Minitab,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Text Analytics",,Most of the time,,,,,Most of the time,Often,,,,,,,,,,,,,,Often,Often,,,,,,Most of the time,,,,,20,25,20,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,Sometimes,,,Often,,,Often,,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,kaggle,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,480000,INR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,5,5,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,28,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Statistician",University courses,25,5,25,40,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",High school,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,Jupyter notebooks,Python,R,Tableau",,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,,,,,,Most of the time,Sometimes,,,,,,Rarely,,,Often,Rarely,,,Rarely,,Sometimes,,,Sometimes,Sometimes,,,Most of the time,,,,55,20,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data 
science projects",,Sometimes,,,Often,,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,loyalty programs; credit scores; census data; shapefiles; crm data,It comes from many different sources in many different formats. Difficulty in merging it together to automate/standardize processes.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,135000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,42,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"GitHub,University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Stack Overflow Q&A",,,,Somewhat useful,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Analyst,Predictive Modeler",Work,25,30,30,15,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs",High school,Technology,"10,000 or more employees",Decreased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Neural Networks,Regression/Logistic Regression,Other","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL",,,,,Rarely,,,,Often,,,,,,,,Often,,,,Often,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and 
Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,Often,,Often,,,,Often,Most of the time,,,,,,Often,,,Sometimes,,,,20,20,10,30,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,IT developments to make the platform available to users. ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Git,Subversion",Sometimes,64000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Taiwan,33,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,,Nice to have,Necessary,,Nice to have,,,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",,Master's degree,No,Master's degree,Mathematics or statistics,,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Business Analyst,,,Python,Neural Nets,R,,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Business Analyst,Researcher",Work,50,0,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,QlikView,R,RapidMiner (commercial version)",,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,Often,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Neural Networks",Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,10,30,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Sometimes,,,Sometimes,Sometimes,,,Often,,,,,,,Sometimes,,,,,,,76-99% of projects,More external than internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Rarely,18000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,45,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,Association Rules,F#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Predictive Modeler,Statistician",University courses,10,10,40,20,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Traditional Workstation","Image data,Text data,Relational data",Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Java,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Rarely,,,Often,,,,,,,,,,Often,,,,,,Most of the time,Often,,,,Sometimes,,Rarely,Often,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Rarely,Sometimes,,,Sometimes,Rarely,,,Sometimes,,,Rarely,,,,Rarely,,,,,Sometimes,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Turkey,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",Somewhat useful,Not Useful,,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer",University courses,15,5,0,80,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data,Other",Rarely,100GB,,"Amazon Machine Learning,Amazon Web services,Flume,Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,QlikView,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,Statistica (Quest/Dell-formerly Statsoft),Tableau",Rarely,Rarely,,,,,Rarely,,Rarely,,,Rarely,,,,,Rarely,,,,,,,,,,Rarely,,,,Rarely,Rarely,,,Rarely,,Sometimes,,,,,,,Rarely,Rarely,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Rarely,Rarely,Rarely,,,Rarely,Most of the 
time,Rarely,,,,,,Rarely,,,,Rarely,,,Rarely,,Rarely,Rarely,,,,,,,,,,5,5,20,50,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",,,,Often,Often,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,4450,TRY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Sweden,38,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Self-taught,40,30,20,10,0,0,"Computer Vision,Recommendation Engines","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Rarely,1GB,"CNNs,Neural Networks,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Neural Networks,Recommender Systems,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,I 
prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Monte Carlo Methods,R,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Other,,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important +Female,Spain,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,SQL,Government website,"Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,30,40,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"1,000 to 4,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Minitab,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,Rarely,Often,,Rarely,,Sometimes,,,,,Most of the time,,,,,Rarely,Rarely,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation",Sometimes,Sometimes,Sometimes,,,Sometimes,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,,,,,,,20,10,10,30,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,"Opendata, facebook, twitter",,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,40000,EUR,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,Very useful,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +A different identity,Other,99,Employed full-time,,,Yes,,Other,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",MATLAB/Octave,Deep learning,SQL,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",0,80,10,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Bayesian Techniques,,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,1GB,"Bayesian Techniques,Random Forests","Hadoop/Hive/Pig,Java,Python,R,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Naive Bayes,Random Forests,Recommender Systems",,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,Often,Often,,,,,,,,,,70,10,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",Most of the time,,,,Often,,,Often,,,,,,,Most of the time,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,15000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TIBCO Spotfire,I don't plan on learning a new ML/DS method,Other,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,"10,000 or more employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Other,Other,Other,Never,,Other,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,60,20,0,15,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,Most of the time,,,,Most of the time,Sometimes,Most of the time,,Most of the time,,,,,,,Most of the time,,Most of the time,26-50% of projects,Entirely internal,Business Department,n/a,formatting,Other,Share Drive/SharePoint,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Israel,27,Employed part-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,NA,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,GitHub,Conferences,,,,,Very useful,,,,,,,,,,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Laptop or Workstation and local IT supported servers,,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,0,10,0,40,Computer Vision,Gradient Boosting,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Spain,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,Less than a year,Researcher,Self-taught,15,0,70,0,15,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Bayesian Techniques,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,SVMs",,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,10,20,50,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,Sometimes,,,26-50% of projects,More internal than external,IT Department,,Inconsistent semantics,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,"Periscope, python notebooks",Git,Sometimes,45000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Israel,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,,1 to 2 years,I haven't started working yet,Self-taught,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat 
important,Very Important,Not important,Somewhat important,Very Important +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,Engineer,University courses,20,30,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,Often,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,Often,,Often,,Often,,,,Often,Often,,,Often,,,Often,Often,,,,20,30,20,20,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",,Sometimes,,,,Most of the time,,,Often,,Most of the time,,,,,,,,,,,,76-99% of 
projects,,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"1,050,000",INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,R,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,50,20,30,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important 
+Male,Turkey,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,75,20,5,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10MB,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Naive Bayes",,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,0,10,10,40,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,IT 
Department,openstreetmap,Initial understanding of data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Never,54000,TRY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Netherlands,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Conferences,Friends network,Kaggle,Official documentation,Podcasts,YouTube Videos",,,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,0,0,40,20,0,"Natural Language Processing,Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,Python,Other,Other,,,,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Analyst,Work,0,0,100,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,Often,,Rarely,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Logistic Regression,Text Analytics",Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,15,5,12.5,12.5,25,30,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Often,,,Often,,,Often,Often,,Often,Often,,Sometimes,,,Rarely,,Sometimes,Sometimes,,,10-25% of projects,More external than 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,67000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,31,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important +Male,Netherlands,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer",University courses,30,0,20,30,20,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning 
(Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,France,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,Text Mining,R,University/Non-profit research group websites,"Blogs,College/University,Online courses,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,0,0,80,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Never,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests",,,,,,,,Often,Often,,,Often,,,,,,,,,,,Often,,,,,,,,,,,0,100,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Often,,,Often,,,,Often,,,,Often,,Less than 10% of projects,Do not 
know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Canada,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,Other",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,25,45,30,0,0,0,Supervised Machine Learning (Tabular Data),,"Some college/university study, no bachelor's degree",Academic,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,,"Amazon Web services,Mathematica,MATLAB/Octave,NoSQL,Python,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,,,,"Data Visualization,Logistic Regression,SVMs",,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations of tools",,Often,,,Often,,,,,,,,Often,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Association Rules,Haskell,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,10,29,1,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,,,,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Always,,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests","C/C++,Java,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,Segmentation,Simulation",,,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,,,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,Limitations of tools,Privacy issues",,,,,Sometimes,,,,,,Sometimes,,Often,,,,Most of the time,,,,,,26-50% of projects,Do not 
know,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",,,EUR,,8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer",Kaggle competitions,40,40,10,10,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Decreased slightly,1-2 years,A tech-specific job board,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,,,"Amazon Machine Learning,NoSQL,Python,R,TensorFlow",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Logistic Regression,Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,Often,,Sometimes,Often,,,,,,,,,,,,,,,30,20,10,20,20,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,More external than internal,,,,,,,,,480000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Denmark,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 
years,"Computer Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,0,0,90,10,0,0,"Time Series,Unsupervised Learning",Hidden Markov Models HMMs,A bachelor's degree,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,Sometimes,100MB,HMMs,"C/C++,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"HMMs,Time Series Analysis",,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,5,80,10,5,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Most of the time,,,,,,,Often,,,Less than 10% of projects,Do not know,Other,,,Other,I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,120000,DKK,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",,Telecommunications,I don't know,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Video data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,< 1 year,,,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Turkey,25,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't 
perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,KDnuggets Blog,1-2 years,Nice to have,,Nice to have,,Necessary,Necessary,,Nice to have,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,University courses,40,0,0,60,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,Pakistan,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,15,20,20,40,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",High school,Technology,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,TensorFlow",Sometimes,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,,,,,,Often,,,,,,Sometimes,,,Often,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,600000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Poland,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,40,10,30,15,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Ensemble Methods,Neural Networks,Random Forests","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Data Visualization,PCA and Dimensionality Reduction,Recommender Systems",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,40,10,10,20,20,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,Sometimes,,,,,Sometimes,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Mercurial",Sometimes,,PLN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Researcher,Statistician",Self-taught,50,0,30,20,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Government,20 to 99 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,,,"C/C++,IBM SPSS Statistics,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Logistic Regression,Time Series Analysis",,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Sometimes,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,No,Yes,Data 
Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Newsletters,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,40,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Hungary,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Kaggle,Online courses,Podcasts",Very useful,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,Very useful,,,,,,"Linear Digressions Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",40+,Online Courses and Certifications,Sort of (Explain more),Master's degree,,,Other,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,R,Monte Carlo Methods,R,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,10,15,0,75,0,0,Computer Vision,"Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,25,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Bayesian Methods,SQL,GitHub,"College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",< 1 
year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,Less than a year,"Programmer,Software Developer/Software Engineer",Work,10,40,0,50,0,NA,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Sweden,53,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, 
etc.)",20,50,20,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,Sometimes,,Often,,,,"A/B Testing,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,20,0,70,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Markov Logic Networks,Support Vector Machines (SVMs)",Primary/elementary school,Government,100 to 499 employees,Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,47,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,50,15,10,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,100 to 499 employees,Decreased slightly,1-2 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,,,"Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Monte Carlo Methods,,GitHub,"Friends network,Textbook",,,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,0,0,40,0,60,0,"Natural Language 
Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - RNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100GB,"Ensemble Methods,GANs","Amazon Machine Learning,C/C++",Often,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Prescriptive Modeling",,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,0,0,100,0,0,0,,"Explaining data science to others,Lack of significant domain expert input",,,,,,Often,,,,,Often,,,,,,,,,,,,10-25% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Other,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Stack Overflow Q&A,Other",Very useful,,,,Very useful,,,,,,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,60,20,0,20,0,0,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation","Image data,Other",Rarely,10GB,"CNNs,Neural Networks,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau,TensorFlow,Other,Other",,Sometimes,,Sometimes,,,,Rarely,,,,,,,Rarely,Rarely,Often,,,,Sometimes,,,,Rarely,,Often,,,,Most of the time,,Rarely,,,,,,,,,Often,,,Most of the time,Often,,,Sometimes,Most of the time,,"CNNs,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Most of the time,,,,,Sometimes,,,,,Often,,,,,,Most of the time,Sometimes,,Often,,,Most of the time,,Often,,,,,,10,30,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,,,,Often,,Often,,,,,,Most of the time,,,,10-25% of projects,Entirely internal,Standalone Team,UAVSAR; LandSAR; Sentinel,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,Other,Rarely,336000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,New Zealand,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,,University courses,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,Google Search,"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,20,10,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Telecommunications,500 to 999 employees,Increased slightly,1-2 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,100TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests","NoSQL,Oracle Data Mining/ Oracle R 
Enterprise,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Often,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,Sometimes,,Sometimes,Often,,Sometimes,,,,,,,,,,,0,10,10,40,40,0,Enough to refine and innovate on the algorithm,"Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,More external than internal,IT Department,,,Key-value store (e.g. Redis/Riak),Commercial Data Platform,,Subversion,Rarely,7500,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Spain,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,,,,Somewhat useful,,Somewhat useful,,Not Useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,,,,,,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Female,Belarus,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Programmer,Self-taught,50,30,10,10,0,0,Computer Vision,"Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector 
Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by government,R,Deep learning,SQL,GitHub,"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,39,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,60,0,30,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Rarely,,,,90,5,0,3,2,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,110000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,51,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,Python,Other,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Podcasts,Stack Overflow Q&A,Trade book",,Very useful,,,,,Very useful,,,,,,Somewhat useful,Very useful,,Very useful,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Workstation + Cloud service,0 - 1 hour,Kaggle Competitions,Yes,Doctoral degree,,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Adversarial Learning,"Bayesian Techniques,Logistic Regression",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,New Zealand,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,MATLAB/Octave,Time Series Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,10,30,0,0,Time Series,"Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,1GB,"HMMs,Neural Networks,RNNs","C/C++,Java,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,40,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations of tools,Privacy issues",Often,Sometimes,,,Often,,,,,,Most of the time,,Often,,,,Often,,,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,25000,NZD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,France,24,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,1 to 2 years,"Data Miner,Programmer,Researcher",University courses,60,10,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Personal Projects",,Very useful,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,University courses,30,0,0,60,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Mix of fields,500 to 999 employees,Increased slightly,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Relational data,Other",Sometimes,1GB,"Gradient Boosted Machines,Neural Networks","C/C++,Java,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",Often,,,,Rarely,,,,Sometimes,,,Often,,,,Often,,,,Often,Rarely,,,,,,,,,,,,,20,50,30,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,merge different sources,"Column-oriented relational (e.g. 
KDB/MariaDB),Other",I don't typically share data,,Git,Sometimes,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Other,22,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,20,50,0,0,30,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,Rarely,,,,,,Often,,,,"Cross-Validation,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,,,Often,,,,,,,,,,Often,,Sometimes,,,Often,,Often,,,Rarely,Sometimes,Sometimes,,,,,,30,30,10,20,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Need to 
coordinate with IT",,,,Often,Often,,,,,,,,,,Sometimes,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Sometimes,45000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,,,Necessary,,Necessary,Necessary,,Necessary,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Portugal,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,,,High school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Other,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Conferences,Friends network,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Miner,Data Scientist,Engineer,Programmer",Self-taught,80,0,15,0,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,Rarely,Often,,,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics",Often,Sometimes,Often,Often,Often,Sometimes,Often,Often,Sometimes,,,Often,,Often,,Often,,,Often,Often,Sometimes,Rarely,Sometimes,Often,Often,,Often,Sometimes,Sometimes,,,,,30,25,20,10,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,Most of the time,,,Most of the time,Often,,,,Often,Most of the time,,,Sometimes,,,Rarely,,10-25% of projects,More internal than external,IT 
Department,,not enough labeled data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,36,JOD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,37,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,,"Business Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses",Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,,,,,,,,"FastML Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Programmer,University courses,80,15,0,0,5,0,Adversarial Learning,"Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,16-20,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,Very useful,,,,,,,,Somewhat useful,,Very useful,Very useful,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Female,Hungary,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,"Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,Researcher,University courses,10,30,50,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Other,"1,000 to 4,999 employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Sometimes,Most of the time,Sometimes,,,,,,Often,,Sometimes,,,,,Often,,Sometimes,,,,Often,,,,,,,50,10,10,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT",Most of the time,,,,Rarely,,,,,,,,,,Often,,,,,,,,100% of projects,Entirely internal,Standalone Team,TCGA; COSMIC;,"n<<p, unlabelled data",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Other",SFTP,Git,Most of the time,30000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Netherlands,46,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Germany,61,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,53,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,GitHub,"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,I never declared a major,I don't write code to analyze data,Other,University courses,20,10,0,50,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Not important +Male,Switzerland,66,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,16,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Personal Projects,Textbook",,Very useful,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Programmer",Work,30,10,60,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),NoSQL,Python,SQL,Tableau,Other",,,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,Often,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text Analytics",,,,,Often,Often,Most of the time,Most of the time,,,,,,Often,,Most of the time,,Most of the time,Often,Sometimes,Often,,Most of the time,,Sometimes,Often,,,Often,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,,,,,Sometimes,,,,,Often,,,100% of projects,More internal than external,Business Department,"Swiss Government (Geo, Health, Statistics)",Data Quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,CHF,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,10,0,0,10,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,500 to 999 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,RapidMiner (free version),SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,Most of the time,,,,Sometimes,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,SVMs,Text Analytics",Sometimes,Most of the time,,,,Sometimes,Often,,,,,,,Often,,Often,,Sometimes,,,Sometimes,Sometimes,,Sometimes,,,,Sometimes,Often,,,,,20,20,20,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Sometimes,Often,Sometimes,,,,,Sometimes,Often,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,Business Department,ebusiness;communication,privacy,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,200000,CNY,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Stata,Government website,"Online courses,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,Data Analyst,Self-taught,40,0,10,50,0,0,Time Series,Logistic Regression,A master's degree,Insurance,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Other,Traditional Workstation,"Text data,Relational data",Rarely,,Regression/Logistic Regression,"Perl,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,,,,75,5,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,,Often,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Most of the time,90000,USD,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Italy,22,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,,University courses,0,20,40,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Git,Other",,10000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,24,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,20,70,0,5,5,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,10MB,Neural Networks,"NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"CNNs,Data Visualization,Prescriptive Modeling,RNNs,Text Analytics",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,,20,25,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,Sometimes,,,,Often,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,Other,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,Not Useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Not Useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,75,0,20,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches",High school,Mix of fields,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,Rarely,,,,,,Sometimes,Rarely,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"Data Visualization,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,20,20,0,40,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's 
decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,Often,,,Most of the time,Often,,Often,Sometimes,Most of the time,Often,,,,Sometimes,Sometimes,,,,Sometimes,Most of the time,Most of the time,51-75% of projects,Do not know,Other,"data.gov.au, data.vic.gov.au - multiple datasets, Aurin. Australian Bureau of Statistics, scrapes from yellow pages",Time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,100000,AUD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Survival Analysis,Python,GitHub,"College/University,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or 
system administration",3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,10,5,25,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10TB,"CNNs,Decision Trees,Regression/Logistic Regression","Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Tableau,TensorFlow",,,,,,,,Often,Most of the time,,,,,,,,Most of the time,,,,,Often,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,,,,Often,Most of the time,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,Time Series Analysis",,,Most of the time,Most of the time,,,,Often,,,,Most of the time,,,,Often,,,Sometimes,Most of the time,,,,Most of the time,,,,,,Often,,,,25,35,30,5,5,0,Enough to tune the parameters properly,"Explaining data science to others,Privacy issues,Scaling data science solution up to full database",,,,,,Most of the time,,,,,,,,,,,Sometimes,Often,,,,,10-25% of projects,More internal than external,Other,Kaggle,Implementation for Production ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,852000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,26,"Not employed, but looking for work",,,,,,,,,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer",University courses,0,0,0,100,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Other,54,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Work,80,0,10,10,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",,,100 to 499 employees,Stayed the same,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Other,"Text data,Relational data,Other",Most of the time,100MB,"Bayesian Techniques,Decision 
Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests","C/C++,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),Spark / MLlib,TensorFlow",,,,Often,,,,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,,,,Often,,Often,,Sometimes,,,,,,Sometimes,,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Random Forests,Segmentation,Time Series Analysis",,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,Most of the time,,,,20,50,10,15,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,Most of the time,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher",Self-taught,50,30,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,Less than one year,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Relational data,Sometimes,,"Ensemble Methods,Neural Networks,Random Forests","Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,,,,,,,,,,,,Sometimes,Sometimes,Often,,Most of the time,,,,,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,Kaggle; Cities open data,Every city have a different format for its data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Never,47000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Hungary,27,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",University courses,20,10,0,50,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,,"Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,75,10,15,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Telecommunications,"5,000 to 9,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,,"Hadoop/Hive/Pig,Java,Perl,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,Sometimes,,,,,,,,Rarely,Most of the time,,,,Rarely,,Often,,,,"Logistic Regression,Natural Language Processing,Time Series Analysis",,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,,,,,,,,Most of the time,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,Often,,Sometimes,Most of the time,,Sometimes,Sometimes,Sometimes,,,Often,Most of the time,Often,,,Sometimes,Most of the time,,Most of the time,,76-99% of projects,More internal than external,Other,Weather data; public transport,Acquisition; adequat processing environment other than a relational database,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,"Bitbucket,Other",Sometimes,68000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Researcher,Other",Self-taught,30,20,50,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series",,"Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Operations Research Practitioner,Researcher,Other",University courses,32,0,30,33,5,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,,Relational data,Don't know,,,"Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Mathematica,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,30,20,0,0,0,"Machine Translation,Natural Language Processing,Reinforcement learning,Speech Recognition,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,62,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Russia,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,20,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Sometimes,,"Decision Trees,Ensemble Methods,Random Forests,SVMs,Other","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Rarely,Sometimes,,Most of the time,Often,Often,,,,Often,,Sometimes,,Sometimes,,,,,Often,,Often,,,,,Often,,Often,,,,60,20,10,10,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Amazon Machine Learning,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,60,15,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,Often,Rarely,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Often,Rarely,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Simulation,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Rarely,,Rarely,,,,,Rarely,,,,Sometimes,,,Sometimes,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Often,,,,Often,Often,,,Often,,,,,,Most of the time,,Often,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,17,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,RapidMiner (free version),Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,R,RapidMiner (free 
version),SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,Rarely,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Prescriptive Modeling,Recommender Systems,Segmentation,Time Series Analysis",Often,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Rarely,,Often,,,,Rarely,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,Most of the time,,Often,,,,,Most of the time,,Most of the time,,Rarely,,,Less than 10% of projects,Entirely internal,Central Insights Team,Opendata,Acquiring the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Most of the time,40000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Cluster Analysis,Other,"Google Search,University/Non-profit research group websites","College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,,R,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Data Machina Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,,No,Bachelor's degree,Computer Science,,"Data Analyst,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine 
Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",15,15,15,20,15,20,Machine Translation,Decision Trees - Gradient Boosted Machines,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,37,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Proprietary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Predictive Modeler,Programmer,Statistician",Work,15,5,60,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Retail,10 to 19 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","SAS Base,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",Sometimes,Often,,,,,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,15,20,5,30,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,Census,None,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,600000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,Other",Very useful,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,10,10,60,10,0,10,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Julia,Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,Python,R,SQL,Unix shell / awk,Other,Other",,,,Rarely,,,,,,,Sometimes,Sometimes,,,,Sometimes,Most of the time,,,Sometimes,,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Rarely,,,,,,Rarely,Sometimes,Sometimes,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series 
Analysis",,Often,Sometimes,,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,Often,Sometimes,Most of the time,,Most of the time,,,Most of the time,,Often,Sometimes,Often,,,,20,35,10,15,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Other",Most of the time,,,,Often,Often,,,Often,,,Often,Most of the time,,Most of the time,,,,,,,Most of the time,10-25% of projects,More internal than external,Other,None,"Can't use any API, Cloud based tools or storages and limited open tool use previleges ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"90,000",,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",University courses,20,0,0,80,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A master's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Rarely,10MB,"Bayesian Techniques,Decision Trees","Amazon Web services,C/C++,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Minitab,NoSQL,QlikView,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,Often,,,,,Sometimes,,,,,,,,,,,,,,Often,,,Sometimes,Rarely,,,,,Rarely,,,,,,Sometimes,Often,,,Often,,,Often,,,,,,,"A/B Testing,Association Rules,Data Visualization,HMMs,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Often,Often,,,,,Often,,,,,,Rarely,,Rarely,Often,,Often,,,,,Rarely,Often,,,,,Most of the time,Most of the time,,,,20,40,15,15,10,0,,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input",,,,Often,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,44,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,20,10,20,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Academic,500 to 999 employees,Increased significantly,Less than one year,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods","KNIME (free 
version),R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",,Sometimes,,,,,,Often,Often,,,,,Often,,,,Often,,,Often,,Often,Sometimes,,,,,,Sometimes,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Other,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,,Self-taught,60,0,20,0,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"SVMs,Other","Amazon Web services,Jupyter notebooks,Python,SQL",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,,,,,Most of the time,,,,,,,Sometimes,,,,,Often,,Most of the time,,,Often,,,,,Often,,,,,20,10,40,10,20,0,"Enough to code it again 
from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,Most of the time,,,,,,,,Often,,,,,,,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,80000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,Very useful,"FastML Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,40,5,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the 
time,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,TensorFlow,Unix shell / awk",,Rarely,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Sometimes,,,,Often,Often,,Most of the time,,,Often,,Most of the time,Sometimes,Often,,,,20,10,30,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,"Zillow, US Census, US Labor, US Demographic, Moody's Analytics Data",Non availability of updated data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,600000,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,Kaggle competitions,20,20,0,10,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Russia,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,Amazon Web services,Survival Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Programmer,Researcher,Statistician",Work,50,0,20,20,10,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A professional degree,Retail,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Most of the time,10GB,Regression/Logistic Regression,"Cloudera,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,SAS Base,SQL,Other",,,,,Rarely,,,,,Rarely,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,,,,,,,,Most of the time,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,20,30,25,10,15,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,Often,Most of the time,,,,,,,,,Most of the time,,,10-25% of projects,More internal than external,IT Department,WorldBank,Convert it into an appropriate form for further analysis,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,5000,USD,I am not currently employed,5,,,,,,,,,,,,,,,,,, +Male,Japan,20,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Business Analyst,Other,30,20,0,0,20,30,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Financial,20 to 99 employees,Decreased significantly,Don't know,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100MB,Regression/Logistic Regression,"Python,QlikView,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,Sometimes,,,,,Sometimes,,,,Often,,,,,,,,,,"Logistic Regression,Time Series 
Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,7,1,1,90,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Sometimes,Sometimes,Most of the time,,,,Most of the time,,Most of the time,Most of the time,Often,,Most of the time,Most of the time,,,,Often,,,100% of projects,More internal than external,Central Insights Team,"TransUnion, Experian",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,,Never,400000,INR,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,India,19,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,Somewhat useful,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,30,60,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Video data,Always,10GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,SVMs,Time Series Analysis",,,Sometimes,,,Often,,,,,,,,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,Often,,Often,,,,20,30,30,15,5,0,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,Data parsed live from games,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,50000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,South Korea,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,University courses,45,0,20,30,5,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Manufacturing,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Relational data,Other",Always,1GB,"Bayesian Techniques,Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression,RNNs,SVMs","MATLAB/Octave,Python,R,SAS Enterprise Miner,Tableau,TIBCO Spotfire,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,Sometimes,,,,,,Often,,Rarely,Often,Often,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,Sometimes,Often,Most of the time,Often,Often,,,,Sometimes,,,Often,,,,,Often,,Often,,Sometimes,,,Often,,Often,,,,30,25,5,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,,,,,,,,,,,,Most of the time,,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Rarely,100000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,31,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Data Elixir Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,,,,,Nice to have,,,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,"Data Analyst,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing","Bayesian Techniques,Ensemble Methods",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Kenya,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,20,20,20,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Business Analyst,Self-taught,50,40,10,0,0,0,Outlier detection (e.g. 
Fraud detection),,A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,Other,"R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Mathematica,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring",Very useful,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Internet-based,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Image data,Most of the time,1GB,"GANs,Neural Networks","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"GANs,Neural Networks,PCA and 
Dimensionality Reduction,Segmentation",,,,,,,,,,,Often,,,,,,,,,Most of the time,Rarely,,,,,Most of the time,,,,,,,,5,80,10,0,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning",,,,,,,,,Sometimes,Rarely,,Rarely,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,CelebA,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,SyncThing,Git,Always,1200000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,19,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",45,50,0,0,5,0,"Computer Vision,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Company internal community,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Very useful,,,,,,,Very useful,Very useful,,,,,Very useful,Very useful,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist",Self-taught,20,20,50,10,0,0,"Recommendation Engines,Speech Recognition","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Ukraine,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,Programmer,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,78,0,1,1,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Internet-based,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,44,Employed part-time,,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,0,10,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Don't know,100GB,"Decision Trees,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Text Analytics",Often,,,,,Often,Often,Often,,,,,,Often,,Often,,,,,,,,,,,,,Often,,,,,70,15,10,5,0,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others",,,,,Most of the time,Often,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,,,,Git,Most of the time,35000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,40,0,20,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Mix of fields,"10,000 or more employees",,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression",Often,,,,,,Most of the time,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process",Often,Most of the time,,,,Sometimes,,Often,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,,Ownership,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,2110000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Poland,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,,Self-taught,70,10,0,10,10,0,Other (please specify; separate by semi-colon),Neural Networks - CNNs,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Random Forests,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,Udacity","Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important +Male,Italy,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,Less than a year,Researcher,Self-taught,20,20,20,20,0,20,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Image data,Sometimes,1GB,,"Amazon Web services,KNIME (free version),Microsoft Azure Machine Learning,Tableau,TIBCO Spotfire",,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,HMMs,Neural Networks,Prescriptive Modeling,SVMs,Time Series Analysis",Rarely,,Sometimes,,Often,Most of the time,,,,,,,Most of the time,,,,,,,Often,,Sometimes,,,,,,Often,,Sometimes,,,,10,20,20,1,49,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Rarely,Sometimes,Sometimes,,Often,,,,,Sometimes,,,Most of the time,Often,Most of the time,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Google Cloud Compute,Jupyter notebooks,Python,Tableau",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,40,10,10,30,10,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Other,-,-,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,30000,USD,Other,7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,0,10,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,R,,,,"Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Other,Self-taught,49,0,50,0,1,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,10 to 19 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Other",,,,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Perl,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,Often,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,Rarely,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Often,,,,48,2,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,Most of the time,,100% of projects,Entirely internal,Other,Data from Riverbed tools,,,Share Drive/SharePoint,,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Spain,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer",University courses,50,25,0,25,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs",Primary/elementary school,CRM/Marketing,,,,,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression","Julia,Jupyter notebooks,NoSQL,Python,R,Spark / 
MLlib,Tableau",,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Evolutionary Approaches,Gradient Boosted Machines,Lift Analysis,Recommender Systems,Simulation",Often,,Often,,Often,Sometimes,,,,Sometimes,,Rarely,,,Often,,,,,,,,,Often,,,Often,,,,,,,30,30,20,20,0,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,40,30,10,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,22,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,Less than a year,"Data Scientist,Other",Self-taught,80,10,5,0,5,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning",Neural Networks - CNNs,A master's degree,CRM/Marketing,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Often,,Often,,,,Often,Most of the time,,,Often,,,,,,Most of the time,,,,50,30,10,7,3,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",,70,0,30,0,0,0,"Natural Language Processing,Speech Recognition",Logistic Regression,,Technology,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,,"Text data,Relational data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Statistician,University courses,20,0,0,80,0,0,,Logistic Regression,A master's degree,Academic,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Never,<1MB,Bayesian 
Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Programmer,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Online courses,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,20,30,0,0,0,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,75,0,20,5,0,0,"Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Very useful,Very useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Programmer",Self-taught,50,45,0,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,6 to 10 years,Data Analyst,Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Microsoft Excel Data Mining,R,SQL",,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,,Most of the time,,Sometimes,,,Often,,Sometimes,,Most of the time,,,,Most of the time,,Sometimes,Most of the time,,,Rarely,Sometimes,,,Sometimes,,,,50,15,15,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in 
deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",Most of the time,Sometimes,Often,Rarely,Most of the time,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,36000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,United States,29,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Not Useful,,Not Useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Not Useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Other,Yes,Master's degree,Physics,1 to 2 years,"Data Analyst,Engineer,Programmer,Researcher",Self-taught,50,50,0,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not 
important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important +Male,United Kingdom,49,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,50,10,20,20,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,100TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,DataRobot,Flume,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (commercial version),SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Tableau,TensorFlow,Other",Often,Often,,,Often,Sometimes,Often,Often,Often,,Sometimes,Sometimes,Sometimes,Often,,,,,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Often,,,,Most of the time,,Most of the time,Often,,,Sometimes,Often,Often,Sometimes,Most of the time,Most of the time,,,Sometimes,Most of the time,,,Most of the time,,,"A/B 
Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,50,30,5,10,0,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",Often,,,,Most of the time,Often,,,Most of the time,,,,,,,,Often,,,,,,26-50% of projects,More external than internal,Standalone Team,N/A,"Privacy Issues. Very sensitive data. Findings can impact large groups of people negatively. ","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Subversion",Most of the time,275000,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,India,27,"Not employed, but looking for work",,,,,,,,QlikView,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",10,53,10,2,20,5,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Female,India,37,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,10,0,40,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +"Non-binary, genderqueer, or gender non-conforming",India,24,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,"Google Search,University/Non-profit research group websites","Conferences,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Somewhat 
useful,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Predictive Modeler",Self-taught,10,20,70,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,Most of the time,,Rarely,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Sometimes,Often,Most of the time,,,,Often,Often,,Most of the time,,,,,,,Often,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,publicly available statistics data,cleaning the data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,97000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Singapore,31,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,Scala,GitHub,"Blogs,Conferences,Kaggle,Textbook",,Very useful,,,Very useful,,Very useful,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",15,15,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",,Technology,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,,"Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Random Forests,Text Analytics",,Often,,,Most of the time,,Most of the time,Sometimes,,,,,,Often,,Often,Rarely,,Often,,,,Rarely,,,,,,Often,,,,,35,10,15,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to 
go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,55200,SGD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Denmark,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Self-employed",Microsoft Azure Machine Learning,Neural Nets,Python,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Business Analyst,Computer Scientist","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SQL",,,,,,,,,,,Rarely,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,Rarely,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Decision Trees,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,Most of the time,,,,,Most of the time,,,,,,,,,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in 
the organization",Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,,,,,5,,,,,,,,,,,,,,,,,, +,,NA,Employed part-time,,,Yes,,Computer Scientist,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Fine arts or performing arts,More than 10 years,Statistician,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,47,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,,Very useful,,Very useful,,KDnuggets Blog,3-5 years,Necessary,,,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important, +Male,Ireland,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,Microsoft SQL Server Data Mining,Time Series Analysis,SQL,I collect my own data (e.g. web-scraping),"Podcasts,Textbook",,,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",High school,Government,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Always,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks","IBM SPSS Statistics,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Neural Networks,Simulation",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,25,10,10,10,45,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,Sometimes,,,Most of the time,,Often,,,,,Often,,,Often,,,,51-75% of projects,Entirely internal,Central 
Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,57000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,R,Bayesian Methods,Python,Google Search,"College/University,Friends network,Online courses,Personal Projects,Textbook",,,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,10,60,0,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,Russia,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Company internal community,Kaggle,Trade book",,,,Somewhat useful,,,Somewhat useful,,,,,,,,,Somewhat useful,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",70,10,10,5,5,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important +Male,Other,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"GitHub,I collect my own data (e.g. 
web-scraping)",Blogs,,Very useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,,,,,,,,Necessary,,,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,Other,80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Machine Translation,Recommendation Engines,Reinforcement learning","Decision Trees - Random Forests,Evolutionary Approaches","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,Python,Cluster Analysis,R,GitHub,Company internal community,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,University courses,25,0,25,45,0,5,,Bayesian Techniques,A bachelor's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Rarely,1MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization",,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,15,15,15,55,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,47000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Engineer,University courses,5,10,30,50,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs",,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines","Cloudera,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,SAS Enterprise Miner,Spark / MLlib,TensorFlow,TIBCO Spotfire",,,,,Often,,,,,,,Often,,,Sometimes,,,,,,,,Rarely,,Rarely,,,,,,Most of the time,Most of the time,,,,,,,Rarely,,Often,,,,,Sometimes,Most of the time,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Often,Most of the time,,,,,,,Sometimes,,,Often,,Most of the time,,,,,,,,Often,Most of the time,,,,20,20,20,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,C/C++,Time Series Analysis,Python,GitHub,"Arxiv,College/University,Conferences,Kaggle,Stack Overflow Q&A",Very useful,,Very useful,,Very useful,,Very useful,,,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,15,0,15,35,35,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,Rarely,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,Often,Often,,,Sometimes,Sometimes,Sometimes,,Most of the time,,,,,Rarely,Sometimes,,,,,70,15,0,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Central Insights Team,,,,,,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),R,"Google Search,Government website","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Not Useful,,,,Somewhat useful,,,Very useful,Becoming a Data Scientist 
Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,"Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,40,20,NA,10,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Decision Trees,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Google Search,"Friends network,Official documentation,Online courses,Stack Overflow Q&A",,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Mathematics or statistics,Less than a year,"Business Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",I prefer not to answer,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests","Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,Rarely,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Segmentation,Time Series Analysis",,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team",,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,RUB,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,Less than a year,Other,Other,0,15,0,0,5,80,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Insurance,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Rarely,,Decision Trees,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Spark / MLlib,SQL",Rarely,Rarely,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,20,70,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Often,,,,,,Most of the time,,,,,,,,100% of projects,Approximately half internal and half external,Other,,,,,,,,45000,GBP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed part-time,,,Yes,,Computer Scientist,Poorly,"Employed by college or university,Employed by government",Python,Neural Nets,Python,University/Non-profit research group websites,"College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without 
earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Programmer,Researcher",University courses,35,35,20,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,I prefer not to answer,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,,,,,,,"kNN and Other Clustering,Markov Logic Networks,Segmentation",,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,20,40,25,15,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,Gathering data with specific equipment (eye tracking tasks),Document-oriented (e.g. 
MongoDB/Elasticsearch),,,,Don't know,256,RUB,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,Natural Language Processing,"Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",,10GB,HMMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher",University courses,0,0,100,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Ensemble Methods",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,NoSQL,Survival Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",30,15,25,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,5,20,5,20,50,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Business Department,Geographic Data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,320000,BRL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Germany,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,,,,,,Very useful,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Scientist,Researcher",Self-taught,30,30,40,0,0,0,,,A bachelor's degree,Government,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,10TB,"Ensemble Methods,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Impala,Jupyter notebooks,Python,SAS Base,Spark / MLlib,Tableau,Unix shell / awk",,Rarely,,,Often,,,,,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,Rarely,,,Often,,,,Sometimes,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,40,30,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,,,,,Most of the time,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational 
(e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Most of the time,40000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,59,Employed part-time,,,No,Yes,Other,Fine,Employed by government,I don't plan on learning a new tool/technology,,,,"Blogs,Online courses,Podcasts",,Very useful,,,,,,,,,Very useful,,Very useful,,,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Other,Self-taught,20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,21,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,R,"Google Search,University/Non-profit research group websites","Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Programmer,Statistician",University courses,20,10,5,55,10,0,"Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic 
Regression,Markov Logic Networks",Primary/elementary school,Other,500 to 999 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1MB,Regression/Logistic Regression,"C/C++,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,Python,R,SAS Base",,,,Rarely,,,,,,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Text Analytics",,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,30,30,20,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,120000,KES,Other,5,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,50,25,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Stayed the same,More than 10 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service","Image data,Text data",Rarely,10MB,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,Time Series,Logistic Regression,,Financial,100 to 499 employees,Stayed the same,Less 
than one year,An external recruiter or headhunter,Very important,,Laptop or Workstation and private datacenters,Text data,Rarely,100GB,"Bayesian Techniques,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Naive Bayes,Prescriptive Modeling",,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,41,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,R,Decision Trees,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,,,Very Important,Very Important,Very Important +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,Self-taught,60,0,40,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Text Analytics",,,,,,Often,Most of the time,Often,,,,Sometimes,,Often,,Often,,Sometimes,Sometimes,,Often,,,,,Often,Sometimes,Sometimes,Sometimes,,,,,30,35,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,Often,,,,,,,Sometimes,,100% of projects,Do not know,Other,,Not having a good dataset and right kind of problem to solve.,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,750000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed part-time,,,Yes,,Business Analyst,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,30,0,0,0,20,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Mix of fields,,,,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,,,"Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Portugal,22,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,70,25,5,0,0,0,,,High school,Mix of fields,"5,000 to 9,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Personal Projects,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,,Very useful,,,Very useful,,Somewhat useful,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,20,0,0,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Germany,33,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 
years,Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,38,"Not employed, but looking for work",,,,,,,,DataRobot,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"DataCamp,Other","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,25,0,0,25,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Denmark,35,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Unsupervised Learning,"Bayesian Techniques,Logistic Regression,Markov Logic Networks",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,SAS Base,Deep learning,Python,Google Search,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Computer Vision,Time Series","Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,,100MB,"Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Java,Python,R",,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,,Most of the 
time,,Often,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Logistic Regression,SVMs,Time Series Analysis",,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,Often,,,,20,50,0,10,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,20,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Online courses,Textbook",Very useful,,,,Very useful,,Very useful,,,,Very useful,,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important +Male,Ukraine,21,Employed part-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Kaggle,Official documentation,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Very Important,Not important +Male,United Kingdom,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,30,15,25,30,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased 
significantly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Image data,Sometimes,10GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Machine Learning,Jupyter notebooks,Python,TensorFlow,Other",Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,"CNNs,Data Visualization,GANs,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,,Most of the time,,,Most of the time,,,,Often,,,,,,,,,Most of the time,Often,,,,Most of the time,Most of the time,,Most of the time,,,,,,65,20,10,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,,,,Sometimes,Sometimes,,,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,0,GBP,Has increased 20% or more,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,24,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Coursera,,2 - 10 hours,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Python,Deep learning,Python,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Doctoral degree,Mathematics or statistics,Less than a year,Other,Self-taught,30,70,0,0,0,0,Adversarial Learning,Logistic Regression,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Japan,41,Employed full-time,,,Yes,,Computer 
Scientist,,,DataRobot,,C/C++/C#,Google Search,"Arxiv,Kaggle,Trade book",Very useful,,,,,,Very useful,,,,,,,,,Very useful,,,"Data Machina Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,Programmer,Self-taught,80,10,10,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Decreased slightly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Image data,Rarely,10MB,Bayesian Techniques,"C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,Evolutionary Approaches,GANs,Gradient Boosted Machines,RNNs,SVMs,Time Series Analysis",Often,,Often,Often,Often,,Often,,,Often,Often,Most of the time,,,,,,,,,,,,,Often,,,Most of the time,,Often,,,,20,20,20,10,30,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,Often,Most of the time,,,,,,Often,,,,Most of the time,,,,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Rarely,"75,000",USD,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,I never declared a major,3 to 5 years,"Data Scientist,Researcher,Statistician",Work,50,10,40,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - RNNs,Other (please specify; separate by semi-colon)",High school,Academic,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,,"Decision Trees,Neural Networks,Random Forests,SVMs","Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Often,,,,Often,,Sometimes,Rarely,Rarely,,,,,,Sometimes,,,,Rarely,Rarely,,,,,,"Association Rules,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language 
Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,Rarely,,,,Sometimes,Most of the time,,,,,,,Often,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,30,40,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,,,,,,,,,Sometimes,,Often,,,,100% of projects,Entirely external,Standalone Team,None,"Too less data, cleansing kills around 50% of the data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Female,Republic of China,24,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,College/University,Kaggle,Online courses,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data",Sometimes,1GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Jupyter notebooks,MATLAB/Octave,Python,Stan,TensorFlow,Unix shell / awk",,,,,Often,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,SVMs,Time Series Analysis",,,Sometimes,,,Often,Sometimes,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Often,Often,,,,Sometimes,Often,,,Often,,Often,,,,10,40,40,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science 
team",Sometimes,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",,2,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,India,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Software Developer/Software Engineer,University courses,5,5,20,65,5,0,Adversarial Learning,Decision Trees - Random Forests,A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,1GB,Decision Trees,"C/C++,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Decision Trees,Random Forests",,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,40,20,15,5,20,0,Enough to run the code / standard library,"Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,Often,,,,,,,Sometimes,,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,,,,3,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,Mathematica,Genetic & Evolutionary Algorithms,R,GitHub,"Blogs,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,20,10,0,Reinforcement learning,Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,24,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,R,Genetic & Evolutionary Algorithms,SQL,Google Search,"Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 
years,"Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",40,25,20,10,5,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Orange,R,SQL,Other",,,,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,Often,,,,Often,Most of the time,Most of the time,Rarely,,,,,Often,,Often,,,,,,Sometimes,Most of the time,,,Often,Most of the time,,,Often,,,,60,25,5,8,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Sometimes,,,Sometimes,,,,,,Sometimes,,,,,,Often,,51-75% of projects,More internal than external,Other,We rarely use third party datasets only for training purposes. ,The biggest challenge is where you need to reestablish current business process to obtain good data. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,500000,UAH,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,28,"Not employed, but looking for work",,,,,,,,Microsoft R Server (Formerly Revolution Analytics),Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,Yes,Master's degree,,Less than a year,"Data Analyst,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,30,60,0,10,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,20+,Very Important,Very Important,Somewhat important,,Very Important,,,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,France,25,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Decision Trees,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Online courses,Podcasts,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,20,10,0,"Adversarial Learning,Machine Translation,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Retail,20 to 99 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",,,,"C/C++,Java,Python,QlikView,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,Rarely,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,0,40,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,Often,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,Often,,,,,,26-50% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Never,36000,EUR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,47,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Ensemble Methods,,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,Very useful,,,,,,Very useful,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,Researcher,University courses,20,0,40,40,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs",A doctoral degree,Academic,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,Other","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Sometimes,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,30,50,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,,,,,26-50% of projects,More internal than external,Other,xeno-canto audio data,noisy data; annotation,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,1800000,KES,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Belgium,63,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,55,10,0,0,1,34,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Internet-based,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,,"C/C++,Java,Oracle Data Mining/ Oracle R Enterprise,Python,SQL",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees,Simulation,Text Analytics",,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,10,0,10,15,0,65,Enough to refine and innovate on the algorithm,"Explaining data science to others,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,Often,,,,,,,Sometimes,,,,,,Often,,Often,,Less than 10% of projects,,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",C/C++,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,20,20,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Insurance,,,,,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Always,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,DataRobot,Hadoop/Hive/Pig,Python,R,SQL,TensorFlow",,Most of the time,,,,Rarely,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Rarely,,,,Most of the time,,,Most of the time,,,Rarely,Rarely,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the 
time,,,,,,,,,,,Often,Often,Often,Often,,,,76-99% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Always,84000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Vietnam,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Friends network,Kaggle,Online courses,Textbook",,,,,,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer",University courses,20,20,30,20,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,10 to 19 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Python,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,Text Analytics",,,,,,Often,Often,Often,Most of the time,,,,,,,Often,,Often,,,,,Most of the time,,,,,,Often,,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Often,Most of the time,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Entirely external,Standalone Team,imdb;facebook;twitter,cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Never,25000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Israel,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,6 to 10 years,,University courses,30,0,40,30,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Java,Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines",Bayesian Techniques,High school,Internet-based,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Sometimes,100MB,Bayesian Techniques,"Amazon Machine Learning,Amazon Web services,Java",Most of the time,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,10,20,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Often,,,Often,,,Often,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A health science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Orange,Python,R,RapidMiner (commercial version),RapidMiner (free version),Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,,"Data Scientist,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Finland,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other",Self-taught,85,0,10,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,,Other,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Other",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,30,5,5,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,Often,,,,Most of the time,,51-75% of projects,More internal than external,Other,SNL; WoodMacKenzie; Other consultant data,Getting access to data that is relevant and raw.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,50000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,65,Retired,,,No,Yes,Programmer,Fine,Self-employed,Python,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Anomaly Detection,Python,Other,"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Very useful,,,,,Somewhat useful,Very useful,,,Very useful,,,Very useful,Somewhat useful,,Very useful,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Other,50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased 
significantly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests",,,,,,,Often,Sometimes,Sometimes,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,10,20,30,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,,,Sometimes,,,,,,,,,Often,Often,,,,,,,Often,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,50000,GBP,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Kaggle competitions,25,25,0,0,50,0,"Computer Vision,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Markov Logic Networks",A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Work,80,0,20,0,0,0,Natural Language Processing,,A doctoral degree,CRM/Marketing,10 to 19 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE 
...)",Text data,Most of the time,100GB,Neural Networks,"Java,SQL,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,50,10,40,0,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Genetic & Evolutionary Algorithms,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,35,15,0,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Academic,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Most of the time,<1MB,Decision Trees,"C/C++,IBM SPSS Statistics,Java,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,Rarely,,,Often,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Text Analytics",,,,,,Sometimes,Often,Often,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,5,50,10,35,0,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Always,3000,PLN,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",Somewhat useful,,,,Very useful,,Very useful,,,,Very useful,Very useful,,,,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,Tableau",,,,,Often,,,,Often,,,,,,,,Most of the time,,,,,Sometimes,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Often,,,,,,,"A/B 
Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Recommender Systems,Segmentation,Simulation,Text Analytics",Often,,,,,,Often,Often,,,,,,Often,,Rarely,,,,,,,,Sometimes,,Often,Often,,Sometimes,,,,,35,25,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,Often,Often,,,,,,,Sometimes,,,Often,,Often,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,110000,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Financial,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Other,Sometimes,1GB,"Bayesian Techniques,Gradient 
Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,MATLAB/Octave,Python,R,SQL",,Rarely,,,,,,,Rarely,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Sometimes,,Often,Sometimes,,Often,Sometimes,Sometimes,,,,,,Sometimes,,,,Often,,,51-75% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,Bitbucket,Rarely,,,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Portugal,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,60,5,15,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher",Self-taught,35,5,50,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Pharmaceutical,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,30,0,0,70,0,Computer Vision,Neural Networks - CNNs,High school,Technology,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat 
important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Rarely,1GB,Regression/Logistic Regression,"NoSQL,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Text Analytics",,,,,,,Rarely,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,More than 10 years,Data Analyst,Work,10,10,80,0,0,0,Adversarial Learning,Decision Trees - Gradient Boosted Machines,,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Perl,Python,R,SAS Base,SQL",,Most of the time,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,Often,Often,,Often,,,,,Often,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Often,,Often,Often,,Most of the 
time,,,Sometimes,Often,,,,20,10,0,50,20,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Most of the time,,,,,,51-75% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,33,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,RapidMiner (commercial version),Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,Very useful,,Very useful,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Outlier detection (e.g. 
Fraud detection),Decision Trees - Random Forests,A bachelor's degree,Technology,Fewer than 10 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Decision Trees,"IBM Cognos,Python,QlikView,R,RapidMiner (commercial version),Tableau",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,Sometimes,,,,,,,,,,,Rarely,,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks",,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,,,,,40,20,5,20,15,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Most of the time,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,40000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,29,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,44,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Podcasts",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,Not Useful,,,,,,FlowingData Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,,"Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important 
+Female,Singapore,31,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,Less than a year,Other,Other,20,0,0,0,0,80,,,A bachelor's degree,Other,500 to 999 employees,Stayed the same,Less than one year,Some other way,Not at all important,Other,Laptop or Workstation and private datacenters,Text data,Never,100MB,Random Forests,"Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,70,0,0,25,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,,,,,,,,,Often,,,,,,Often,,,,,,,76-99% of projects,More external than internal,Other,"Singstat Data.gov ",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Rarely,35000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Italy,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,,,Very useful,Very useful,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,50,5,40,0,5,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Decreased slightly,Don't know,Some other way,Not very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Relational data",Never,1MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs",,,,Often,Sometimes,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,,,Most of the time,Often,Most of the time,Sometimes,Sometimes,,Rarely,Sometimes,Often,,,,,,10,20,0,20,50,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Often,,,,,,,,,,Sometimes,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not 
in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,40000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Other",University courses,0,30,20,50,0,0,,,A doctoral degree,Academic,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Traditional Workstation,Text data,,,,"NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,70,0,0,30,0,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Rarely,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,,Always,,,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,Jupyter notebooks,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Other",,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,Natural Language Processing,Support Vector Machines (SVMs),,Telecommunications,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,SVMs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,20,0,0,30,50,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),"Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,7000000,JPY,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Turkey,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Work,0,20,70,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Other,Other",,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,,,,,Most of the time,Most of the time,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Often,Often,,,,,,,,,,,Sometimes,,,,Often,,,,,,Often,Most of the time,,,,70,20,5,4,1,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Scaling data science solution up to full database",,,,,Most of the time,Often,,,,,,,,,,,,Sometimes,,,,,Less than 10% of projects,Entirely internal,IT Department,,Cleaning and normalising the data and handling missing values.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,144000,TRY,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,University courses,50,0,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Non-Kaggle online communities,Online courses,YouTube Videos",Very useful,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,"Computer Vision,Natural Language Processing","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Most of the time,10GB,"CNNs,Neural Networks,Regression/Logistic Regression","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,,,,,,,Sometimes,,Sometimes,,,Often,Often,Sometimes,,Sometimes,,,,,Sometimes,Often,,,,,30,40,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Often,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Rarely,960000,RUB,Other,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,,"Computer Scientist,Data Analyst,Programmer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,24,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Podcasts,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,0,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Other,25,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,,,"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,,< 1 year,Nice to have,Necessary,Nice to have,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,30,20,0,40,10,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic 
Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Other,"Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,Not Useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",49,0,0,0,2,49,Recommendation Engines,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,Colombia,22,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Time Series Analysis,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Online courses,Personal Projects",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,,,"Data Stories Podcast,Partially Derivative Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Physics,Less than a year,Engineer,Self-taught,50,30,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Physics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Ukraine,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Other,Self-taught,60,5,20,0,15,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,,,,,,Rarely,,Most of the time,,,,,,,,,Rarely,,,,Rarely,,,,,,"Association Rules,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,Rarely,,,,,Most of the time,,,,,,,,,Rarely,,Rarely,Often,,,,Rarely,,,,,,Often,Rarely,,,,30,20,10,40,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Sometimes,,,,,,,Most of the time,,,Sometimes,,,,76-99% of projects,Approximately half internal and half external,Other,Sociology data,Work time,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,300000,UAH,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,Very useful,,,,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,,Somewhat important,Not important,Somewhat important,Somewhat important,,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important +Male,India,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,University/Non-profit research group websites,Friends network,,,,,,Very useful,,,,,,,,,,,,,,3-5 years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or 
statistics,1 to 2 years,"Engineer,Programmer,Researcher",Self-taught,70,0,0,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Monte Carlo Methods,Python,I collect my own data (e.g. web-scraping),"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,Less than a year,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,30,60,0,0,0,"Natural Language Processing,Recommendation Engines",Logistic Regression,A bachelor's degree,Technology,10 to 19 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,NoSQL,R,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Text Analytics",Sometimes,,,,,Often,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,Often,,,,,70,20,5,0,5,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Privacy 
issues",,,Often,Sometimes,,Often,,,Most of the time,,,,,,,,Sometimes,,,,,,10-25% of projects,More external than internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Bitbucket,,30000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Programmer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,95,5,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,Not at all important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Rarely,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,26,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",Work,40,20,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,Don't know,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"Data Machina Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher",Self-taught,45,10,45,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Hidden Markov Models HMMs,A bachelor's degree,Other,"1,000 to 4,999 employees",Increased significantly,Less than one year,Some other way,Very important,Analyze and 
understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,SQL,Tableau,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,Most of the time,,,"Cross-Validation,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,,,,,,,,,,Sometimes,,,Most of the time,,,,Often,,,,,,Most of the time,Sometimes,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,,Often,,,,,Often,,,,,,Often,Sometimes,,51-75% of projects,More internal than external,IT Department,Acxiom,Lack of expertise,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Sometimes,48000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,45,5,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,10MB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"kNN and Other Clustering,Neural Networks,Random Forests,RNNs,Text Analytics",,,,,,,,,,,,,,Rarely,,,,,,Often,,,Often,,Often,,,,Most of the time,,,,,30,20,20,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,We don't use third party datasets,"correcting human mistakes like typos and that the data isn't stored centrally, I have to gather it from a lot of 
sources","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,GitHub repos contain them,Git,Most of the time,96200,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Microsoft Azure Machine Learning,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle",,,Very useful,,Somewhat useful,,Very useful,,,,,,,,,,,,"O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Researcher,Statistician",University courses,10,20,30,30,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Military/Security,Fewer than 10 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,100MB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","C/C++,Mathematica,Python,SAS Base,SAS Enterprise Miner,SQL",,,,Sometimes,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,,Sometimes,Most of the time,Often,,,,,,Often,,,,,Most of the time,Most of the time,Sometimes,,Often,,,,,Most of the time,Most of the time,Sometimes,,,,30,10,30,20,10,0,Enough to code it from scratch and it 
will run blazingly fast and be super efficient,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Sometimes,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Most of the time,60000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Republic of China,35,Employed full-time,,,Yes,,Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by college or university",Python,Social Network Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Personal Projects",,,,,Somewhat useful,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,Researcher,Self-taught,90,5,5,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,<1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Java,KNIME (free version),MATLAB/Octave,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SQL",,,,,,,,,,,,Sometimes,,,Often,,,,Often,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,Sometimes,Rarely,,,,Sometimes,,,,,,,,,,"Decision Trees,Ensemble Methods,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,SVMs",,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,Most of the time,,Often,Most of the time,,Often,,,Most of the time,,,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",,,Sometimes,,,,,,Often,Most of the time,,,,,,Most of the time,Most of the time,,,,,,51-75% of projects,More external than internal,Business Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Mercurial,Sometimes,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,France,26,Employed full-time,,,Yes,,Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,Less than a year,"Business Analyst,Data Analyst,Programmer,Researcher,Statistician",Self-taught,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Academic,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Impala,Java,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Random Forests",Sometimes,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,20,20,20,20,20,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues",,,,Sometimes,Most of the time,,,,,Sometimes,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Work,70,0,20,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,27,Employed part-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Google Search,"Arxiv,College/University",Very useful,,Very useful,,,,,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,,,,Nice to have,Nice to have,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,10,0,70,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,28,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Official documentation,YouTube Videos",,,Somewhat useful,,Very useful,Very useful,Very useful,,,Somewhat useful,,,,,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,PhD,No,Doctoral degree,Computer Science,,"Computer Scientist,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Other (please specify; separate by semi-colon)","Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,United States,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Blogs,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,O'Reilly Data Newsletter,< 1 year,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,20,0,0,75,5,0,Time Series,,"Some 
college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,India,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Self-employed,Spark / MLlib,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Researcher",University courses,10,5,30,35,15,5,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Often,Often,,,,,,,,,,,,Most of the time,,,Often,,,,,,Most of the time,Most of the time,,,,45,20,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Dirty data",Often,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,700000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Stack Overflow Q&A,Other",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Miner,Predictive Modeler",University courses,0,40,20,20,20,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Always,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,KNIME (commercial version),MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,TensorFlow",,Often,,,,,,Rarely,Sometimes,,,,,,,,,Sometimes,,,Most of the time,Rarely,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Association Rules,Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Time Series Analysis",,Often,,,Sometimes,Most of the time,Often,Often,,,,,,,,Often,,,,Often,Often,,Most of the time,Often,Often,Often,,Often,,Sometimes,,,,60,10,30,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,,,,,,,,,,,Often,Most of the time,,,,,,Less than 10% of projects,More internal than external,IT Department,INE,Find data :/,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,16000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,32,Employed full-time,,,Yes,,Other,Fine,Self-employed,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Other,Other",,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,15,5,0,80,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,10 to 19 employees,Stayed the same,Don't know,Some other way,Not at all important,Other,Traditional Workstation,Text data,Never,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,0,0,100,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,10,30,40,20,0,0,Unsupervised Learning,Decision Trees - Random Forests,,CRM/Marketing,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,Decision Trees,"Amazon Web services,Java,SQL",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,10,10,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,RapidMiner (free version),Deep learning,Python,Government website,College/University,,,Very useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,30,5,0,0,5,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Neural Networks","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,NoSQL,Orange,Python,R,RapidMiner (free version),SQL,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Often,,,,,,,Often,,,,Most of the time,,Often,,,,"Association Rules,Logistic Regression,Naive Bayes,Natural 
Language Processing,Neural Networks",,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,Most of the time,Often,,,,,,,,,,,,,,80,10,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools",,Often,,,Often,,,,Sometimes,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,india.gov.in;,Managing Data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,"70,000",INR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Vietnam,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,40,0,10,0,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Sometimes,Most of the time,,,,,,"CNNs,Data Visualization,Natural Language Processing,Neural Networks,RNNs,SVMs,Text 
Analytics",,,,Often,,,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,Sometimes,Most of the time,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Most of the time,,,,Often,Often,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,"google word2vec, fastText wikipedia",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Subversion",Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,28,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,Researcher,Work,0,70,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,26,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Stack Overflow Q&A",,,,,Very useful,,Very useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",University courses,15,0,20,60,0,5,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",I don't know/not sure,Insurance,100 to 499 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,Most of the time,,,,Often,,,Rarely,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,Text Analytics",,Sometimes,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Often,,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Often,Sometimes,,51-75% of projects,Entirely external,Standalone Team,forum; web sites contain,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,43000,EUR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,5,0,15,80,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Never,<1MB,"Bayesian Techniques,CNNs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Often,,,,,,"Bayesian Techniques,CNNs,kNN and Other Clustering,Naive Bayes,Neural Networks",,,Often,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,40,50,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,openflight ; cooking websites,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,Very useful,,,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,,Self-taught,95,5,0,0,0,0,,,A professional degree,Academic,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,Not at all important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation",Other,Always,1GB,Neural Networks,"C/C++,MATLAB/Octave,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,0,50,20,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,,Often,,,,,Often,,,,,,,76-99% of projects,Do not know,,,,,Share Drive/SharePoint,,Git,Sometimes,50000,INR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Other,35,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Deep learning,R,Government website,"College/University,Online courses,Stack Overflow Q&A,Textbook,Other",,,Very useful,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers 
with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Fine arts or performing arts,Less than a year,"Researcher,Other",University courses,20,40,0,40,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Academic,500 to 999 employees,Increased significantly,Less than one year,Some other way,Not at all important,Other,Laptop or Workstation and local IT supported servers,Other,Rarely,100MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,SQL,Tableau",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Rarely,,,Often,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling",,,,,,Sometimes,Often,,,,,,,Rarely,,Often,,,,,,Often,,,,,,,,,,,,70,5,10,7,8,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,Often,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,100% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,12400,EUR,Other,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,34,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,No,Professional degree,,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Online courses,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,,"Engineer,Machine Learning Engineer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,31,"Independent 
contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,80,10,10,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",Primary/elementary school,Internet-based,,,,,Important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,100GB,"CNNs,Evolutionary Approaches,GANs,Neural Networks,RNNs","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,Often,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,GANs,kNN and Other Clustering,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Text 
Analytics",,,,Often,,Sometimes,Sometimes,,,Often,,,,Often,,,,,Sometimes,Often,,,,Often,Often,,,,Sometimes,,,,,10,80,5,4,1,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,Most of the time,Often,,,,Most of the time,,Less than 10% of projects,Entirely external,Other,,Find it; get the proper authorization to use it,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform",,Git,,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,South Africa,45,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Other",Self-taught,50,15,25,0,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series",,High school,Telecommunications,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,100MB,"Bayesian Techniques,Neural Networks","Microsoft SQL Server Data Mining,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Segmentation,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,60,15,15,10,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Online courses,Personal Projects",,,Very useful,,Very useful,,,,,,Very useful,Very useful,,,,,,,"Data Machina Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher",Self-taught,20,20,30,25,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,,10GB,"Neural Networks,RNNs,SVMs","Java,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,Segmentation,Text Analytics",,,,Sometimes,,Often,,,,,,,,Sometimes,,Sometimes,,,Often,Often,,,,,Often,Sometimes,,,Most of the time,,,,,25,25,25,5,20,0,Enough to explain the algorithm to someone non-technical,Difficulties in deployment/scoring,,,,Sometimes,,,,,,,,,,,,,,,,,,,None,Entirely external,IT Department,,,Key-value store (e.g. Redis/Riak),Commercial Data Platform,,Git,,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,34,"Not employed, but looking for work",,,,,,,,TensorFlow,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities",,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,< 1 year,Necessary,Nice to have,,,,,,,,,,,,,GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,20,0,0,0,70,10,"Computer Vision,Speech Recognition,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,France,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,25,50,15,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Other,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,,10GB,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,25,0,0,50,25,0,Enough to tune the parameters properly,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Never,,,Other,7,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),Other","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,15,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,R,SQL,Stan,TensorFlow",,Sometimes,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Rarely,,Often,Most of the time,Often,Often,Rarely,,Often,,Most of 
the time,,Sometimes,,Sometimes,,Often,Often,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,,,30,15,10,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,Often,,,,,Sometimes,Rarely,,Often,,,,,,Sometimes,Sometimes,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,39000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Other,21,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,5,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,0,10,80,10,0,0,"Computer Vision,Time Series",Bayesian Techniques,High school,Internet-based,500 to 999 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,Other,"Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,0,30,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Key-value store (e.g. Redis/Riak),Email,,Git,,500000,CNY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,Sometimes,,Rarely,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,Sometimes,,Most of the time,Often,Often,,,,Often,,Often,Often,Often,Sometimes,,Often,Sometimes,Most of the time,Often,Often,Often,,Often,Often,Often,Often,Often,,,,40,5,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,Often,Most of the time,Often,,,,,,,,,Sometimes,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,11,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,,,TensorFlow,Neural Nets,Python,Google Search,"Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,,,,,,Very useful,,Very useful,,,,Somewhat useful,Other (Separate different answers 
with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Computer Vision,Unsupervised Learning",Neural Networks - CNNs,A master's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Sometimes,10MB,Neural Networks,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,90,4,4,1,1,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,100% of projects,Entirely external,IT Department,Minecraft textures;Terraria Textures;The Internet,Finding it,Other,I don't typically share data,,Git,Always,0,GBP,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Germany,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,I don't plan on learning a new ML/DS method,Python,"GitHub,Google Search","Blogs,College/University,Personal Projects,Textbook,YouTube Videos",,Very useful,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,54,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Coursera,GPU accelerated Workstation,40+,Github Portfolio,No,Master's degree,Engineering (non-computer focused),,"Data Analyst,Machine Learning Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Time Series","Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,34,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",0,70,20,0,5,5,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,Perl,Deep learning,Python,GitHub,"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Very useful,,,,,,,"FastML Blog,Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,50,30,0,20,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Philippines,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering 
(non-computer focused),Less than a year,"Engineer,Predictive Modeler",Work,0,0,70,0,0,30,"Adversarial Learning,Machine Translation,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Never,100MB,Regression/Logistic Regression,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,,,0,0,0,70,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Other",Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Retired,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,DBA/Database Engineer,Other",University courses,30,20,0,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Proprietary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Friends network,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,Somewhat useful,,,,Somewhat useful,Not Useful,,,Very useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A health science,6 to 10 years,"Business Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,0,80,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Always,10MB,Neural Networks,"Amazon Web services,Jupyter notebooks,NoSQL,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,kNN and Other Clustering,Natural Language Processing,Neural Networks,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,,,,,,,,Often,,,,,Most of the time,Often,,,,,,Often,,,Most of the time,Often,,,,35,40,25,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,Often,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Always,113000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,Very useful,Very useful,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",18,80,0,0,2,0,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Oracle Data Mining/ Oracle R Enterprise,Neural Nets,,"GitHub,Google Search",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data 
Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,6-10 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Never,10MB,,"Oracle Data Mining/ Oracle R Enterprise,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,,,kNN and Other Clustering,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,,,,,Share Drive/SharePoint,,Subversion,Rarely,80000,INR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,0,0,0,50,0,Computer Vision,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Social Network Analysis,Python,"GitHub,Google Search","College/University,Friends network,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,Somewhat 
useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,5,10,5,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",,Technology,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Relational data",Always,1GB,"CNNs,Neural Networks,SVMs","Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Logistic Regression,Neural Networks,SVMs",,,,Often,Often,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,Sometimes,,,,,,10,40,45,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),I don't typically share data,,Git,Sometimes,6000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Republic of China,24,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,Very useful,,Very useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer",Self-taught,80,0,10,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,500 to 999 employees,Increased slightly,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Don't know,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs",,,,Rarely,Sometimes,Often,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,Rarely,,,Sometimes,Sometimes,Rarely,,,Sometimes,,,,,,60,30,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,100% of projects,More external than internal,Standalone Team,none at work,to clean it up,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Subversion,Never,150000,CNY,Other,8,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Researcher,University courses,40,0,0,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,"Partially Derivative Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Work,25,5,60,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,Rarely,,,Often,Sometimes,,,Most of the time,,,,50,5,15,10,20,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Often,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Proprietary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Conferences,Kaggle,Textbook,YouTube Videos",,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,20,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important +Male,United Kingdom,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher,Statistician",University courses,30,0,0,40,30,0,"Outlier 
detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,Gradient Boosted Machines,"Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Often,,,,Often,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,,,15,45,25,15,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,51,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,33,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Researcher,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Bayesian Techniques,,Financial,100 to 499 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,100GB,Bayesian Techniques,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,0,0,100,0,0,0,Enough to tune the parameters properly,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,MATLAB/Octave,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Academic,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Image data,Text data",Always,100MB,Regression/Logistic Regression,"MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,Often,,,Most of the time,,,Sometimes,,,,Often,,,,,,,26-50% of projects,More internal than external,IT Department,,,Other,"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"88,000",,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Australia,57,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Company internal community,Kaggle,Personal Projects,Trade book",,,,Very useful,,,Very useful,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Other,Self-taught,40,40,20,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Other,GPU accelerated Workstation,Text data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Mathematica,Microsoft Excel Data Mining,Python,R",,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,SVMs,Text Analytics",,,Often,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,Often,Sometimes,,,,,5,20,10,5,5,55,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,51-75% of projects,More external than internal,Other,"Australian Pharmaceutical Benefits Scheme, Australian Medicare Statistics",Web scrapping to a satisfactory data table,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Romania,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,20,50,0,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,5,25,50,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data",Never,100GB,"Bayesian Techniques,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,Stan,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,Often,,,,,,"Bayesian Techniques,CNNs,Neural 
Networks,RNNs,Simulation",,,Often,Often,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,Sometimes,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,20,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Podcasts",Very useful,Very useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,40+,PhD,No,Master's degree,Computer Science,,"Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,,RapidMiner (commercial version),Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer",Self-taught,50,30,20,0,0,0,Recommendation Engines,Logistic Regression,A bachelor's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10MB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,NoSQL,QlikView,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Sometimes,,,,50,10,30,10,0,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,1400000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Hong Kong,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by college or university,Python,,,,"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Programmer,Researcher",University courses,40,0,0,30,30,0,Reinforcement learning,"Decision Trees - Random Forests,Neural Networks - GANs",,Mix of fields,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",,,"Ensemble Methods,Gradient Boosted Machines,Neural Networks","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"GANs,Gradient Boosted Machines,Neural Networks",,,,,,,,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,20,40,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Vietnam,NA,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,Very useful,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,,Experience from work in a company related to ML,No,Master's degree,Computer Science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Romania,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,R,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,30,30,10,30,0,0,"Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",High school,Military/Security,500 to 999 employees,Increased slightly,6-10 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Rarely,10GB,"CNNs,Neural Networks,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,,,,,,,,,,0,30,0,20,50,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,Often,,,,,,,,,Sometimes,Often,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,,,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Other,Anomaly Detection,SQL,I collect my own data (e.g. web-scraping),"Online courses,Stack Overflow Q&A,Other",,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Other",,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,,Often,,,,,Sometimes,,,,,,60,20,0,0,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Most of the time,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,,Never,45000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Scientist,Other",University courses,20,5,60,10,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Most of the time,Most of the time,,,,,Most of the time,,Often,Often,Often,,,,,Most of the time,,Often,,,Often,,,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,,,,,,Most of the time,,,,,,Often,,,Most of the time,,,Sometimes,,76-99% of projects,More internal than external,Central Insights Team,Weather; Street map,Insufficient hardware,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,38000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,6 to 10 years,Statistician,Self-taught,75,15,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,47,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,"Adversarial Learning,Machine Translation,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs",,Manufacturing,500 to 999 employees,Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Sometimes,,SVMs,"C/C++,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,30,0,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,None,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Mercurial,Sometimes,120000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Personal Projects",Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,South Korea,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,Engineer,University courses,0,0,15,80,5,0,Recommendation Engines,Ensemble 
Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,20,50,0,0,30,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Computer Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,Supervised Machine Learning (Tabular 
Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,I haven't started working yet,Self-taught,15,0,80,0,5,0,Supervised Machine Learning (Tabular Data),"Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Other,20 to 99 employees,Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist",University courses,40,0,0,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Decreased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,40,25,0,25,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,KNIME (free version),Python,R,Spark / MLlib",,Sometimes,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics",,,,,Sometimes,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,Sometimes,Often,,Most of the time,,,Often,Often,,Most of the time,Most of the time,Most of the time,Often,,Most of the time,Often,Most of the time,,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Often,,,,Most of the time,,,,,,Sometimes,,,,Often,,,,26-50% of projects,More internal than external,Standalone Team,publicly available data,data cleaning,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,1500000,INR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Israel,27,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,35,4,60,1,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,I prefer not to answer,Increased slightly,3-5 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Relational data,Other",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",60,20,0,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - 
Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,49,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Support Vector Machines (SVM),Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,Somewhat useful,,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data,Relational data",Rarely,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,Sometimes,,,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Often,,,Often,,,,,,,,,,,,,Often,Often,,,,,Often,,Often,,,,,,60,20,NA,10,10,0,Enough to explain the algorithm 
to someone non-technical,"Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",,Often,,,,,,,,,,,,Often,,Sometimes,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,60000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Online courses,Stack Overflow Q&A,YouTube Videos,Other",Somewhat useful,,Very useful,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Programmer,University courses,2,8,40,45,5,0,Natural Language Processing,"Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Basic laptop (Macbook),Other,Don't know,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,Sometimes,Often,,,,Often,,,,,Often,Sometimes,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,51-75% of projects,Approximately half internal and half external,Other,We use data from edX,Design of proper indicators,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"12,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Programmer,Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Image 
data,Never,,HMMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by professional services/consulting firm,Julia,Bayesian Methods,Julia,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,10MB,"Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Rarely,,,,Rarely,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Time Series Analysis",Rarely,,,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Sometimes,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used 
by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,Most of the time,,,,Sometimes,,Most of the time,,Most of the time,,Often,,,,Most of the time,,,,,Sometimes,,76-99% of projects,Entirely external,Standalone Team,CENSUS dataset,My skills level,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,610000,RUB,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Other,18,Employed part-time,,,No,Yes,Programmer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,India,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Deep learning,R,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Researcher,Other",University courses,30,0,10,60,0,0,"Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A master's degree,Academic,10 to 19 employees,Stayed the same,Less than one year,A tech-specific job board,Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Rarely,10MB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,60,20,5,10,5,0,Enough to refine and innovate on the algorithm,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,Often,,,,,,,,Often,,,Less than 10% of projects,Do not know,Standalone Team,,time bound,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Most of the time,25000,RSD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,Other,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,50,40,5,0,5,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Relational data,Other",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk,Other",Rarely,Most of the time,,Sometimes,,,,Sometimes,Often,,,,,,Often,,,,,,,Sometimes,,,,,Most of the time,,,Sometimes,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Sometimes,Often,,Most of the time,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,Often,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,Most of the time,,Sometimes,Sometimes,Most of the time,,,,10,5,10,5,70,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",Often,Most of the time,Often,,Most of the time,Most of the time,,,Often,Often,Often,,,,,Sometimes,Most of the time,Sometimes,,,,,100% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,"1,000,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,,,,Mix of fields,"1,000 to 4,999 employees",,,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",,,,,"Amazon Web services,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,kNN and Other Clustering",Often,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,60,15,5,15,5,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,Python,Deep learning,Python,"GitHub,Google Search","Blogs,College/University",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,PhD,Yes,Master's degree,A social science,1 to 2 years,Machine Learning Engineer,University courses,0,0,0,100,0,0,Recommendation Engines,Decision Trees - Random Forests,,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,35,20,10,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,36,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,"Jack's Import AI Newsletter,Partially Derivative Podcast",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,PhD,No,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,40,10,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud 
Compute,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",Sometimes,,Often,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,Often,Often,Often,,,Often,,,,,,Often,,,,,40,20,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Sometimes,,,,Often,,Often,,,,,,,,Sometimes,Most of the time,,,Rarely,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Segmentation,SVMs,Text Analytics",Sometimes,Often,Sometimes,,,Often,Often,Often,,,,,,Sometimes,,Rarely,,Rarely,Often,Sometimes,,,,,,Sometimes,,Sometimes,Often,,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in 
deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,Often,,Often,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,0,HKD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Cluster Analysis,Python,Google Search,"College/University,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,30,"Not employed, but looking for work",,,,,,,,Java,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites,Other",Other,,,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,Emergent/Future Newsletter (Algorithmia),Talking Machines Podcast",3-5 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher,Other",Self-taught,80,20,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Male,Switzerland,30,"Not employed, but looking for 
work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,edX,GPU accelerated Workstation,0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",28,70,0,0,2,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Other,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Very Important +Female,Spain,24,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,R,Association Rules,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Friends network,Non-Kaggle online communities,Online courses",,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,Other,6 to 10 years,Other,University courses,10,5,85,0,0,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression",High school,Other,10 to 19 employees,Increased slightly,More than 10 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Logistic Regression,PCA and Dimensionality Reduction",,Often,Often,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,70,10,2,3,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",,,,,,Often,,,Often,,,,Most of the time,,,,,,,,,,,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Mercurial",Rarely,25000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,R,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Hungary,41,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,20,10,10,0,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow",Sometimes,Most of the time,,,Sometimes,,,Sometimes,Often,Rarely,Rarely,Rarely,Sometimes,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,Rarely,Rarely,,Sometimes,Most of the time,,,Most of the time,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics",,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,Sometimes,Sometimes,,,,,30,20,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Often,,,,,,,,,,,,,,,,Often,,,,,,26-50% of projects,Entirely internal,IT 
Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,120000,AUD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,46,Employed full-time,,,Yes,,Engineer,Perfectly,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data,Relational data",Sometimes,10MB,"CNNs,Decision Trees,Neural Networks,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,RapidMiner (commercial version),SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Often,Often,,,,,,,,Often,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,Often,Sometimes,,Most of the time,Most of the time,Often,,,,,,Often,,,,,,Often,Often,,,,,,,,,Most of the time,,,,35,10,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,Often,,Most of the time,Often,,,Often,Most of the time,,,,,Often,Most of the time,,Sometimes,,,,Sometimes,,76-99% of projects,More external than internal,Standalone Team,Google maps; open data Amsterdam; open data netherlands ,Accessibility,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",Google drive,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,75000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,36,Employed full-time,,,Yes,,Scientist/Researcher,,,Google Cloud Compute,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,More than 10 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text 
data,Relational data,Other",Sometimes,100MB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,Often,,,,,,,,,Often,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Most of the time,,,Often,Most of the time,,,,,,,Sometimes,,Often,,Sometimes,,Often,Often,,Sometimes,,,,Often,Sometimes,Sometimes,Often,,,,10,40,25,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Sometimes,,Most of the time,Often,,,Often,,Rarely,Often,Often,,,,Often,,Often,Sometimes,Often,,100% of projects,Entirely external,Standalone Team,,Getting the data in the first place...,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Email,Other",Dropbox,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,325000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Hong Kong,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Official documentation,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Statistician",Self-taught,50,0,0,0,50,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Insurance,I don't know,Increased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","KNIME (commercial version),Orange,Python,R,SAS Base,SAS Enterprise Miner,TensorFlow",,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,Often,,,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Most of the time,Often,Sometimes,Sometimes,,,,,,,Often,,,,Often,Sometimes,,Often,,,Often,,Sometimes,Often,,,,,40,40,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Australia,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Rarely,10GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,,,Often,,Sometimes,,Often,,,Often,,,,,,,,,,,30,10,20,10,30,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Often,,,,,,,Most of the time,,Less than 10% of projects,Do not know,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Email,,Git,Always,65000,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,University courses,10,10,40,10,10,20,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,30,0,20,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,28,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,Blogs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Text 
Mining,Python,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,1-2 years,,,,,Nice to have,,,Necessary,,,,,,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,0,10,20,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,,,,Somewhat important,Somewhat important,,,,,,Very Important,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other,Other,Other",,,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,,Self-taught,25,25,5,45,NA,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,,,,Most of the time,Most of the time,Often,Most of the time,,,Often,,Often,Often,Often,,Often,Often,,Most of the time,,Most of the time,,,Most of the time,,Often,Most of the time,Most of the time,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Rarely,100000,,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,C/C++,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Trade book,YouTube Videos,Other,Other",Very useful,Very useful,,,,,Very useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,University courses,10,70,0,0,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",Primary/elementary school,Retail,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Relational data,Most of the time,10GB,"CNNs,Ensemble Methods,Markov Logic Networks","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,Often,Often,,,Sometimes,,,,,,Most of the 
time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Ensemble Methods,Markov Logic Networks,Naive Bayes,Neural Networks",,,Often,Most of the time,,,,,Often,,,,,,,,Most of the time,Often,,Often,,,,,,,,,,,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,Often,,,,,Often,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,IT Department,Various internet,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,75000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Germany,20,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Random Forests,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Personal Projects",,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,10,0,20,70,0,0,,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10GB,,"R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Logistic Regression",Often,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,10,0,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,Most of the time,,,,,Often,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity,Other","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,,"Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,Time Series,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,Other,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Other,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs",,Sometimes,,,,,Often,Often,,,,Often,,Often,,Often,,,Sometimes,,Often,,Often,Sometimes,,Often,,Often,,,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,Often,,,Often,Often,,Most of the time,Most of the time,,,,,Often,,,,,,,Often,,100% of projects,More external than 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,120000,HRK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Spain,49,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,90,5,3,2,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A professional degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Python,RapidMiner (free version),SQL,TensorFlow",,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,Sometimes,,,,Often,,,,,,"Data Visualization,Decision Trees,kNN and Other 
Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,,Most of the time,Often,,,,,,Often,,Often,,,Most of the time,Most of the time,Often,Often,Often,,,,,,Most of the time,,,,,80,10,3,2,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Business Department,,Develop a predictive model to improve the Alerts in the Emergency Service of the Hospital,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,South Korea,44,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Conferences,Kaggle,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,Very useful,"FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,Business Analyst,Self-taught,30,10,60,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Ensemble Methods,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Rarely,100MB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,Often,Most of the time,,,,,,"Collaborative Filtering,Data Visualization,Natural Language Processing,Neural Networks,Recommender Systems,SVMs,Text Analytics",,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,Sometimes,,,,Often,,,,Sometimes,Often,,,,,50,20,0,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Most of the time,,,Often,,,,,,Often,,,10-25% of projects,More external than internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",University courses,10,40,0,20,0,30,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Philippines,32,Employed full-time,,,Yes,,DBA/Database Engineer,,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",R,GitHub,"Kaggle,Online courses,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",High school,Technology,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Most of the time,1TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,70,30,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,Less than 10% of projects,Entirely internal,Other,,"infrastructure, lack of computing resource","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,1000000,PHP,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Sometimes,10MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,Rarely,,Often,,,,,Often,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics",,,Rarely,Sometimes,,Most of the time,,,,,,,,Rarely,,Sometimes,,,Sometimes,Sometimes,Most of the time,,,Sometimes,,,,Sometimes,Sometimes,,,,,30,30,10,10,20,0,"Enough to code it again from scratch, 
albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,Sometimes,,Often,,,,Most of the time,Sometimes,Often,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,No,Yes,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,46,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",Kaggle competitions,40,20,36,0,4,0,"Computer Vision,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,Python,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to 
have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,,Other,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,Germany,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,Researcher,Work,50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Support Vector Machines (SVMs),High school,Academic,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic 
Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Decision Trees - Random Forests,A bachelor's degree,Insurance,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,,1GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Self-employed,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Workstation + Cloud service,40+,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Physics,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,65,10,0,5,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important +Male,United States,30,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Neural Nets,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,10,10,0,60,20,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Belgium,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,10,80,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Most of the time,1GB,,"Amazon Web services,Java,NoSQL,QlikView,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,,Most of the time,,,,,,Often,,,,"Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,Most of the time,,,,,30,0,0,0,10,60,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Often,,,Most of the time,,,,,Often,Most of the time,,,,Most of the time,Often,Sometimes,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,80000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,France,28,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Time Series Analysis,Python,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Talking Machines Podcast,3-5 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,PhD,Yes,Doctoral degree,Computer Science,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - RNNs,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,France,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Text Mining,R,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Basic laptop (Macbook),,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),,Data Scientist,,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing 
page,,,,,,,,,,,,,,,,, +Male,Netherlands,39,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,3 to 5 years,Researcher,University courses,30,20,20,30,0,0,Time Series,"Hidden Markov Models HMMs,Neural Networks - RNNs",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,Researcher,University courses,20,10,30,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,IBM Watson / Waton Analytics,Python,R",,,,,,,,Often,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,Most of the time,,Most of the time,Often,,Most of the time,Often,,,,30,20,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Often,Often,,Most of the time,,,Often,,,,Most of the time,Most of the time,,,,Most of the time,Most of the time,,26-50% of projects,Approximately half internal and half external,IT Department,Labor government ,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,55000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Mathematica,Monte Carlo Methods,Other,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Personal Projects,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,,,,,,Very useful,,,,,,Not Useful,FastML Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,15,20,35,25,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Academic,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",,1TB,"CNNs,GANs,Neural Networks,RNNs","MATLAB/Octave,Python,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Data Visualization,GANs,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,,Most of the time,,,Often,,,,Sometimes,,,,,,,,,Most of the time,Sometimes,,,,Often,,,,,Rarely,,,,50,25,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,10-25% of projects,More internal than 
external,Standalone Team,"Human3.6M, NTU RGBD Dataset, ChaLearn Dataset.",Data Preprocessing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,25000,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Other,Anomaly Detection,Python,Google Search,"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Insurance,,,,,Not at all important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,10GB,"CNNs,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Microsoft Azure Machine Learning,Python,SQL,TensorFlow,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Rarely,,,Most of the time,,,"CNNs,kNN and Other Clustering,Natural Language Processing,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,Often,,,,,Often,Often,,,,,Often,,,,,,,,,10,40,5,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,Often,,,,Sometimes,,Sometimes,Rarely,,,,,,,,,,,,Rarely,,10-25% of projects,Entirely internal,IT Department,MNIST;,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,Employed full-time,,,Yes,,Computer Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Decreased significantly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,New Zealand,NA,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,"Udacity,Other",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Engineering (non-computer focused),Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Technology,100 to 499 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,Arxiv,Very useful,,,,,,,,,,,,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,Computer Scientist,University courses,30,5,0,65,0,0,Reinforcement learning,"Ensemble Methods,Gradient Boosting,Markov Logic Networks",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Ensemble Methods,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process",,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,,,,,,24000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Belgium,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Tableau,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Researcher",Kaggle competitions,40,0,0,0,60,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,Fewer than 10 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Most of the time,1TB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Java,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Random Forests,Segmentation,SVMs",,,,,,Most of the time,Most of the time,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,Rarely,,,,,,20,40,10,15,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,Often,,,,76-99% of projects,Entirely internal,IT Department,geoip,database performance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Subversion",,40000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,10,40,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Amazon Web services,C/C++,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Spain,54,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Government website,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Data Miner,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",15,80,5,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,Primary/elementary school,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Random Forests","Jupyter notebooks,Python,RapidMiner (free version),Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Often,,,,,,,,,,Sometimes,,,,,Sometimes,,,Often,,,Sometimes,Often,,,,70,15,8,5,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,Most of the time,,,,,,,,,,,,Sometimes,Sometimes,,Often,,,26-50% of projects,Entirely internal,Standalone Team,,Missing or incomplete datasets,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,R,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,Very useful,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Researcher",Self-taught,40,30,20,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10MB,"CNNs,Decision Trees,Neural Networks,SVMs","MATLAB/Octave,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Time Series Analysis",,,Sometimes,Often,,Most of the time,,Often,Most of the time,,,,,Often,,Rarely,,Most of the time,,Most of the time,Most of the time,,Sometimes,,Sometimes,,,Most of the time,,Most of the 
time,,,,0,40,40,20,0,0,Enough to explain the algorithm to someone non-technical,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Sometimes,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,,Sometimes,39600,SGD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United Kingdom,44,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Other,Python,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Other,,"Business Analyst,Other",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important +Male,Belgium,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher",Work,60,20,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Other,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,39,Employed full-time,,,Yes,,Other,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,40,15,20,15,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,CNNs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,C/C++,Google Cloud Compute,MATLAB/Octave,R",,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Segmentation,SVMs,Text Analytics",Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Often,,,,,,,,Often,Sometimes,Sometimes,,Often,,,,,,Often,,Often,Often,,,,,20,40,10,0,10,20,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,70,20,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Often,Often,Often,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,10,30,0,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Privacy issues,Unavailability of/difficult access to data",,Often,,,,,,,,,,,,,,,Often,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,UCI Rep.; MEKA Rep,Size (streams),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,30,30,20,10,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk,Other",,Sometimes,,,,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,Often,Rarely,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,Sometimes,,,,Often,Most of 
the time,Often,Sometimes,,,Sometimes,,Often,,,,,,,Sometimes,Rarely,Sometimes,Rarely,,Sometimes,Sometimes,,,Sometimes,,,,20,10,20,20,10,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Sometimes,,,,,,,Sometimes,Often,,,,Often,,Most of the time,,,Often,,,100% of projects,More internal than external,Standalone Team,weather,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,scp,Git,Sometimes,,EUR,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,South Africa,26,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Anomaly Detection,R,Google Search,"Blogs,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,,,,,,,,,Very useful,,,Somewhat useful,,"Emergent/Future Newsletter (Algorithmia),Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Markov Logic Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,6 to 10 years,"Business 
Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Self-taught,30,10,60,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Video data,Relational data",Always,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),SAS Enterprise Miner,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,Rarely,Sometimes,,Most of the time,Often,Often,,,Most of the time,Most of the time,Most of the time,Rarely,,,,,,Most of the time,,Most of the time,,Rarely,,,,Rarely,,,Most of the time,,,Sometimes,Often,,Sometimes,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Most of the time,,,Often,Most of the time,Sometimes,Most of the time,Often,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,Most of the time,,,,60,10,10,10,10,0,Enough to explain the 
algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,62,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),KDnuggets Blog",< 1 year,Necessary,,Necessary,,Necessary,Necessary,,,,,,,,,Other,2 - 10 hours,Github Portfolio,,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,Very Important,Very Important,,, +Male,Russia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,,,,,,"Arxiv,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,"5,000 to 9,999 employees",,,I was contacted directly by someone at the 
company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Other",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Recommender Systems,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Time Series Analysis,Python,"GitHub,Google Search","Arxiv,Blogs,Kaggle,Personal Projects",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,,Other,20,0,50,20,10,0,"Computer Vision,Speech Recognition","Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Sometimes,10GB,"Ensemble Methods,GANs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,Segmentation",,,,Most of the time,,Sometimes,,Sometimes,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,30,20,40,10,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning",,,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,Other,,,Git,Always,"73,000",RUB,,8,,,,,,,,,,,,,,,,,, +Female,Russia,50,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,IBM SPSS Modeler,Support Vector Machines (SVM),Python,"GitHub,Google Search","Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,,Very useful,,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Other,Work,5,5,90,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM Cognos,IBM SPSS Modeler,Java,Python,QlikView,R,SQL,Tableau",,,,,,,,,,Often,Often,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,Rarely,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,,Often,Often,Often,,,,,,,,Often,,,,,Often,Often,Often,Often,,Often,,Sometimes,,Sometimes,,,,45,40,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,Public,Time,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,2500000,RUB,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Denmark,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Data Scientist,Self-taught,30,30,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Academic,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Often,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Simulation",,,Often,,,,Often,Rarely,,,,,,Rarely,,Often,,,,,,,,,,,Most of the time,,,,,,,50,20,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,500000,DKK,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,10,10,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,TIBCO Spotfire",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Often,,,,,Rarely,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision 
Trees,Gradient Boosted Machines,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Often,,,,Often,,,,,,,,,,Sometimes,Often,,,,,,Often,Sometimes,,,,30,5,15,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,,Sometimes,Most of the time,,,,,Often,,,,,,,,Often,,,,Often,,76-99% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,42000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +A different identity,Other,50,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by college or university,MATLAB/Octave,Cluster Analysis,R,Google Search,"Personal Projects,Trade book",,,,,,,,,,,,Somewhat useful,,,,Very useful,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Researcher",Self-taught,40,0,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Neural Networks - RNNs",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Work,70,5,20,5,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests","Amazon Machine Learning,Java,Jupyter notebooks,Python,R,Unix shell / awk",Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,,,Often,Often,Most of the time,Rarely,Most of the time,,,,,,,,,,Often,,,,Most of the time,Often,,,,,Often,Often,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,,,Often,,,,,Sometimes,Often,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Internal server,"Bitbucket,Other",Sometimes,20400,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Egypt,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,University/Non-profit research group websites,"College/University,Personal Projects",,,Very useful,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Engineer,Machine Learning Engineer",University courses,30,20,10,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,29,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,80,20,0,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Female,United Kingdom,20,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Operations Research Practitioner,University courses,20,20,20,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,47,"Not employed, but looking for work",,,,,,,,SQL,,R,Government website,"Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat 
useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",1-2 years,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Belarus,43,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Unsupervised Learning",Bayesian Techniques,A bachelor's degree,Government,I prefer not to answer,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,0,20,0,20,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Romania,29,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Personal Projects",,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Data Analyst,Other",University courses,20,70,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,1GB,"Bayesian Techniques,Regression/Logistic Regression","QlikView,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,Often,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Segmentation",Sometimes,Sometimes,,,,Often,Most of the time,,,,,,,,,Often,,Sometimes,,,,,,,,Often,,,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,,,,,Often,,,,Often,,Most of the time,,,,,,Sometimes,,51-75% of projects,More internal than external,Business Department,Lista Firme (Info about Romanian companies),Data integration from multiple sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Netherlands,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Unix shell / awk,"Ensemble Methods (e.g. boosting, bagging)",R,University/Non-profit research group websites,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,Other,University courses,30,50,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Traditional Workstation,Relational data,,10MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,,,,Often,Sometimes,,,,,,,,Sometimes,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,50000,EUR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Iran,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,University/Non-profit research group websites,"College/University,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer",University courses,50,10,30,10,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data",Sometimes,1GB,"CNNs,Gradient Boosted Machines,Neural Networks,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,SVMs,Text Analytics",,,,Often,,Often,,,,,,Often,,,,,,Often,Often,Often,Often,,,Often,Often,,,Often,Often,,,,,25,15,40,10,10,0,Enough to explain the algorithm to someone non-technical,Limitations in the state of the art in machine learning,,,,,,,,,,,,Rarely,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,50000000,IRR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SAS JMP,Neural Nets,SQL,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Software Developer/Software Engineer",University courses,6,8,5,9,62,10,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - GANs",,Retail,100 to 499 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests","C/C++,Java,Microsoft SQL Server Data Mining,SQL",,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,kNN and Other Clustering,Neural Networks,Simulation",,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,10,10,10,10,10,50,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Most of the time,,,,,,,,,,,,,Often,Most of the time,,,,,,,,None,More internal than external,IT Department,,,Key-value store (e.g. 
Redis/Riak),"Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Germany,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,50,5,5,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,10,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Financial,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic 
Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,SQL,TensorFlow",Rarely,,,,,,,,Most of the time,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Often,,,,Often,,,Often,,,Often,,,,Most of the time,,,,30,20,40,10,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,,,,,,,,,Sometimes,,,,100% of projects,More internal than external,Central Insights Team,Appannie ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Most of the time,50000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,40,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Other",Most of the time,10GB,"CNNs,Neural Networks","C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Python,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,,,Rarely,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Often,,Most of the time,Sometimes,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics",,,,Most of the time,,Most of the time,Most of the time,,,,Sometimes,,,Often,,Sometimes,,,Most of the time,Most of the time,Often,,,,Most of the 
time,Sometimes,,,Often,,,,,40,20,30,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,Sometimes,,Sometimes,Most of the time,Often,Sometimes,Most of the time,,Sometimes,Most of the time,,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,Github; Bitbucket,Analysis,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,20400,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,,Kaggle Competitions,Yes,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,40,"Not employed, but looking for work",,,,,,,,C/C++,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",5-10 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",Self-taught,50,30,20,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Republic of China,26,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Python,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Trade book",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,Very useful,,,"Data Machina Newsletter,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,Researcher,Self-taught,25,5,10,60,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Sometimes,100GB,CNNs,"C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,SVMs",,,,Often,,Often,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,30,30,5,30,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Standalone Team,TCGA,preproccess,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,10000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,29,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by college or university,MATLAB/Octave,Decision Trees,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Kaggle Competitions,Yes,Master's degree,,1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,0,0,0,70,30,0,"Speech Recognition,Time Series",Ensemble Methods,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Female,Germany,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Not Useful,,Very useful,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity,Other",Other,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,70,0,20,0,0,Natural Language Processing,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,India,23,Employed full-time,,,No,Yes,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,Self-taught,80,5,0,0,15,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Spain,45,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Female,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Predictive Modeler,Statistician",University courses,50,0,20,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Financial,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Lift Analysis,Logistic Regression,Segmentation",,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community",Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Engineer,Machine Learning Engineer",University courses,15,0,45,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,100MB,Other,"C/C++,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,,,Sometimes,Often,,,,,40,35,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Privacy issues",,,,Sometimes,Most of the time,,,,,,,,,,,,Rarely,,,,,,10-25% of projects,More internal than external,Standalone Team,wikipedia; news corporas,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,1200000,RUB,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Italy,40,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,3-5 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Other,University courses,5,40,0,55,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,10,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Egypt,36,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,R,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,,Very useful,,,,,Very useful,,"Becoming a Data Scientist Podcast,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,35,0,35,15,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important +Male,France,66,Retired,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,SAP BusinessObjects Predictive Analytics,,,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,Data Analyst,University courses,70,0,NA,30,0,0,Computer Vision,Ensemble Methods,I prefer not to answer,Military/Security,,,,,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Always,10TB,Ensemble Methods,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,10,10,80,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Rarely,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,Key-value store (e.g. 
Redis/Riak),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Researcher,Statistician,Other",Self-taught,60,30,5,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,Technology,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,,Laptop or Workstation and private datacenters,Text data,Rarely,,,"Hadoop/Hive/Pig,Python,R",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,18,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,100 to 499 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Microsoft SQL Server Data Mining,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Online courses,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,,,,,Somewhat useful,Data Elixir Newsletter,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,Other,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,30,50,20,0,0,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Female,India,22,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,DataRobot,Neural Nets,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Engineer,Self-taught,50,5,5,5,10,25,"Recommendation Engines,Speech Recognition,Time Series","Decision Trees - Random Forests,Neural Networks - CNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,47,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Personal Projects,Stack Overflow Q&A,Trade book",,,,,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,Somewhat useful,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Researcher",Work,70,0,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Academic,"5,000 to 9,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Angoss,Cloudera,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,R,SQL",,,Rarely,,Sometimes,,,,Often,,,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,Random Forests",,,Sometimes,,Often,,Most of the time,Sometimes,Often,,,,,,Sometimes,Often,,,,Sometimes,,,Often,,,,,,,,,,,50,15,0,20,15,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources",,Often,,,Sometimes,,,Often,,Most of the time,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,Twitter; Social network data (publicly available),retrieving data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,EUR,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Belarus,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,Very useful,Somewhat useful,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),Linear Digressions Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,5,5,5,25,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Other,29,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Proprietary Algorithms,R,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses",,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",10,50,20,10,0,10,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Minitab,R",,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Logistic Regression,Markov Logic Networks,Prescriptive Modeling,Time Series Analysis",,Most of the time,Often,,,,,,,,,,,,,Most of the time,Sometimes,,,,,Often,,,,,,,,Most of the time,,,,10,20,20,20,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,,,,,10-25% of projects,,IT Department,,Limit access to data ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,485,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very useful,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler",Self-taught,40,50,0,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Manufacturing,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,,Often,,,Rarely,Most of the time,,Often,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Often,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Simulation,SVMs,Time Series Analysis",,,Often,,Often,Often,Often,,,,,,,Often,,Often,,Often,,Often,Often,Often,,Often,,,Often,Often,,Often,,,,30,60,10,0,0,0,Enough to refine and innovate on the 
algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,,Often,,,Often,,,Often,,Often,Often,Often,Often,Often,,Often,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,200000,USD,Has decreased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Germany,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,FastML Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,60,20,0,0,20,0,"Computer Vision,Machine Translation","Neural Networks - CNNs,Neural Networks - RNNs",,Internet-based,20 to 99 employees,Increased significantly,Less than one year,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,1GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,Often,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Natural Language Processing,Neural Networks,RNNs,Segmentation,Text Analytics",,,,Sometimes,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,Often,,,Often,,,,,0,10,60,20,10,0,Enough to explain the algorithm to someone non-technical,"Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,50000,GBP,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,44,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Data Analyst,University courses,30,20,40,10,0,0,Time Series,Neural Networks - CNNs,,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,25,40,0,10,25,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,100 to 499 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,Sometimes,Sometimes,,,,,Sometimes,,,,Often,,Often,,,,,,,,,Often,,,,Rarely,,Sometimes,,,,"Cross-Validation,Logistic Regression,Neural Networks,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,,,,,,Often,,,Often,,,,10,20,50,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,,Sometimes,,,Often,,,,,Sometimes,,,,,Often,Most of the time,Often,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,,Self-taught,70,0,15,15,0,0,Natural Language Processing,Bayesian Techniques,A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Other,1 to 2 years,"Business Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,40,10,50,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,1TB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,Most of the 
time,,Often,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Simulation",Most of the time,,Sometimes,,Often,Often,Often,,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,10,25,35,20,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Email",,Git,Rarely,,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Poland,NA,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Financial,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational 
data,Rarely,1GB,Regression/Logistic Regression,"Impala,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Text Analytics",,,,,,,,,,,,,,,,Often,,Often,,,Sometimes,,,,,,,,Sometimes,,,,,30,30,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,Sometimes,Most of the time,,,Sometimes,,,,,,,,,Most of the time,,Often,Often,,,26-50% of projects,Entirely internal,IT Department,,Getting data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,18000,,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Israel,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Cluster Analysis,Python,Government website,"Blogs,College/University,Company internal community,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,Other,University courses,60,20,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",Primary/elementary school,Other,I don't know,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Rarely,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,,Often,,,"A/B Testing,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems",Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,Rarely,,,,,,,,,,60,20,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Most of the time,,26-50% of projects,More internal than external,Standalone Team,,generating enough of it to model,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Bitbucket,Sometimes,"135,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,50,20,0,20,0,"Time Series,Unsupervised Learning",Logistic Regression,High school,Retail,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,No,Yes,Programmer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,30,50,0,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,23,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,,Other,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,21,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer",Self-taught,80,10,0,0,10,0,Survival Analysis,Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,"Data Analyst,Data Miner,Programmer",Self-taught,65,20,10,0,5,0,"Natural Language Processing,Supervised Machine Learning 
(Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,A professional degree,Other,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10GB,Random Forests,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,Natural Language Processing,Random Forests",,,,,Rarely,,Often,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,60,5,20,10,5,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,NoSQL,Neural Nets,Python,Google Search,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,30,0,0,0,0,70,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,Romania,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,5,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Perl,Python",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,Sometimes,Most of the time,,Sometimes,Often,,,,,Often,,Often,,,Often,Often,Often,,Sometimes,Sometimes,Often,Often,,Sometimes,Often,Sometimes,,,,30,30,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Scaling data science solution up to full database,Unavailability of/difficult access to 
data",,,,Often,,,,,,,,,,,,,,Sometimes,,,Sometimes,,10-25% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,15,5,50,25,5,0,"Computer Vision,Reinforcement learning","Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,10TB,"CNNs,Neural Networks","C/C++,Python,Other,Other",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,Often,,"CNNs,Cross-Validation,Data Visualization,Neural Networks",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,40,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",,,,,Often,,,,,,Most of the time,,,,Often,,Sometimes,,,,,,26-50% of projects,More external than internal,Standalone Team,MICCAI datasets,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,Bitbucket,Rarely,2250000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,,Self-taught,30,30,0,40,0,0,,"Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,31,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Physics,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,0,50,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,,,"Data Stories Podcast,FastML Blog,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat 
important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Spain,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,Data Analyst,Self-taught,80,10,0,0,10,0,,,A doctoral degree,Financial,"10,000 or more employees",,,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,,"Python,R,SAS Base,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,NoSQL,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,,,,Somewhat useful,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,20 to 99 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Cloudera,IBM Cognos,Microsoft Excel Data Mining,R,SQL",,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,Sometimes,,,,Often,Often,,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,30,20,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Often,,,,,,,26-50% of projects,Entirely internal,IT Department,,not too much work on Data Science,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,50000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,50,0,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",35,40,20,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,50,10,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Do not know,Standalone Team,,"Cleaning the dirty data, especially dates","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,"44,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Japan,42,"Not employed, but looking for work",,,,,,,,R,Other,R,Other,Other,,,,,,,,,,,,,,,,,,,Linear Digressions Podcast,< 1 year,,,,,,,,,,,,,,,Other,,Other,No,Bachelor's degree,Other,Less than a year,Other,Self-taught,10,0,0,0,0,90,Reinforcement learning,Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Other,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,75,0,0,15,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Online courses,Textbook",,Somewhat useful,,,Very useful,,,,,,Very useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,Data Analyst,Work,30,15,50,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",High school,Manufacturing,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,Sometimes,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",Sometimes,,,,,Often,Often,Often,,,,,,,,Often,,Often,,Often,,,Often,,,,,,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Often,Sometimes,,,Often,,,,,,,Often,,,,Sometimes,Often,,26-50% of projects,More internal than external,Business Department,Government Open data; Twitter,Japanese Language Treatment,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,650,JPY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,"Machine Translation,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,Decision Trees,"IBM Watson / Waton Analytics,NoSQL,TensorFlow",,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",35,30,15,10,5,5,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Always,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,Rarely,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Sometimes,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,SVMs",,,Sometimes,,,Most of the time,Most of the time,Sometimes,Often,,,Often,,,,Often,,,,Often,,,Often,,,,,Sometimes,,,,,,45,30,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Never,160000,CNY,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Turkey,40,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Textbook",Somewhat useful,,,,,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer",Self-taught,20,10,45,20,5,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Neural Networks","Amazon Web services,C/C++,NoSQL,R,SQL",,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Decision Trees,Neural Networks",Often,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,40,10,10,30,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Never,200000,TRY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,37,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,Google Search,"Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,Very useful,,,,Very useful,,< 1 year,Necessary,,,,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,,Very Important,,,Very Important,,Very Important,Very Important,,,Very Important,,,Very Important, +Male,Netherlands,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,20,10,70,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,Sometimes,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Neural Networks,Prescriptive Modeling,RNNs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,Often,,Often,,,Sometimes,,,,Sometimes,Often,,,,50,15,5,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Often,,,,,,,,,,,,,,,Most of the time,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,"25,000",EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Ireland,52,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Google Search,"Blogs,College/University,Friends network,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,University courses,40,0,0,60,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,28,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by company that makes advanced analytic software,Python,,SQL,,Online courses,,,,,,,,,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,Self-taught,20,20,40,20,0,0,,,A bachelor's degree,Retail,"10,000 or more employees",,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and 
understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,,"Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,35,15,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input",Rarely,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,I don't typically share data",,,Never,10000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Japan,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,"Data Analyst,DBA/Database Engineer,Engineer,Operations Research Practitioner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Data Analyst,Researcher",University courses,20,10,20,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,Telecommunications,I don't know,Stayed the same,1-2 years,I was contacted directly by 
someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Stan,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,25,15,5,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,QlikView,R,SQL,TensorFlow,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,Most of the time,,Sometimes,,Sometimes,,,,,Often,,Most of the time,Sometimes,Sometimes,Sometimes,,,,Sometimes,,,,50,10,5,20,15,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Often,Sometimes,,Sometimes,,,,,Often,Often,Often,,,,Sometimes,Often,Often,,51-75% of projects,Entirely internal,IT Department,kaggle,data quality,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,"Bitbucket,Git",Always,130000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,C/C++,Deep learning,C/C++/C#,University/Non-profit research group websites,"Arxiv,College/University,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,Not Useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Researcher",University courses,10,20,30,40,0,0,Computer Vision,Neural Networks - CNNs,A doctoral degree,Academic,100 to 499 employees,Stayed the same,Don't know,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,1GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Neural Networks",,,,Most of the time,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,40,40,0,10,0,10,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,"Git,Other",,10000,UAH,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Italy,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"GitHub,University/Non-profit research group websites","Non-Kaggle online communities,Official documentation,Textbook,Trade book",,,,,,,,,Very useful,Very useful,,,,,Very useful,Very useful,,,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Data Analyst,University courses,10,0,0,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Female,Turkey,24,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed 
full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,30,Employed full-time,,,Yes,,Computer Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,60,20,0,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,10 to 19 employees,,,,Somewhat important,,,,,,,"NoSQL,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,73,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Scala,Other,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,Logistic Regression,High school,Other,20 to 99 employees,Decreased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported 
servers,"Text data,Relational data",Never,,Regression/Logistic Regression,NoSQL,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,10,10,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,Patterns in datasets,"Document-oriented (e.g. MongoDB/Elasticsearch),Other",Company Developed Platform,,Git,Sometimes,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,People 's Republic of China,30,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,Google Search,"Conferences,Online courses",,,,,Somewhat useful,,,,,,Very useful,,,,,,,,"Data Machina Newsletter,Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Statistician",Self-taught,60,20,10,10,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression",High school,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop 
(Macbook),Other,Sometimes,100MB,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Simulation",,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,50000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Not Useful,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Electrical Engineering,Less than a year,I haven't started working yet,Self-taught,30,10,30,5,10,15,Recommendation Engines,"Bayesian Techniques,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,NA,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Physics,,"Computer Scientist,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Hong Kong,42,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,PhD,No,Master's degree,Mathematics or statistics,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,35,30,5,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,1GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Often,,Sometimes,,Most of the time,Most of the time,,Most of the time,,,,,Often,,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the 
time,,,,Often,,Often,,,,,Sometimes,,Often,Often,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,2000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Netherlands,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,60,10,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,,,,,,,,Text data,,,,"Java,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,SVMs,Text Analytics",,,,,,Most of the time,Often,,Sometimes,,,,,,,Often,,,Most of the time,Most of the time,,,,,,,,Sometimes,Often,,,,,80,10,0,2,8,0,,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More external than internal,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Linear Digressions Podcast",< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Natural Language Processing,Logistic Regression,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Ukraine,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,"Not employed, but looking for work",,,,,,,,Amazon Web services,Survival Analysis,R,Other,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Computer Scientist",Self-taught,40,10,15,10,15,10,"Adversarial Learning,Machine Translation,Survival Analysis","Decision Trees - Random Forests,Neural Networks - 
CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,0,15,15,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,Some other way,Very important,Other,Workstation + Cloud service,"Text data,Relational data",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",25,10,50,0,15,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data",Most of the time,10GB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Often,Most of the time,,Often,,,,,Often,,Sometimes,,Sometimes,Often,Often,Often,,Most of the time,,,,,Often,Often,Sometimes,,,,25,25,25,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the 
potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,Often,,,,,,,Sometimes,,26-50% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Personal Projects",Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,10,0,5,80,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,,Text data,Most of the time,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Java,MATLAB/Octave,Python,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Often,,,"Logistic Regression,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,,30,25,30,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,Bitbucket,Sometimes,12000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Greece,26,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,34,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,42,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,0,10,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Technology,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100GB,"CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Logistic Regression,Naive Bayes,Natural Language Processing,RNNs,SVMs,Text Analytics",,,Sometimes,Often,,,,,,,,,,,,Often,,Sometimes,Often,,,,,,Often,,,Often,Sometimes,,,,,60,5,20,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,29,Employed 
part-time,,,Yes,,Researcher,Fine,Employed by college or university,DataRobot,Decision Trees,C/C++/C#,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,"Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,18,19,19,17,19,8,Reinforcement learning,Neural Networks - CNNs,A bachelor's degree,Academic,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Somewhat important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,,CNNs,"C/C++,MATLAB/Octave,Python,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,19,19,19,19,19,5,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,Often,,,,,Often,Often,,,Often,Often,,,,Often,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,800000,KRW,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Canada,50,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Denmark,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Text Mining,R,Other,"Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,Other,Self-taught,90,10,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Academic,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Random Forests,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,50,10,5,15,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential 
impact of data science projects,Unavailability of/difficult access to data",Most of the time,,,,Often,Often,,,Most of the time,Most of the time,Most of the time,,Often,Most of the time,,,,,,,Often,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"288,000",DKK,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,6 to 10 years,Researcher,Self-taught,20,20,60,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,CRM/Marketing,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,1TB,Gradient Boosted Machines,"Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Sometimes,,,,Sometimes,Sometimes,Often,Sometimes,,,,Most of the time,,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,Most of the time,,,,,,,,80,3,3,7,7,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"50,000",,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Denmark,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer",Work,55,20,20,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Rarely,10GB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Most of the time,,,,,,,Rarely,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision 
Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs",,,,Most of the time,Sometimes,Most of the time,,Often,Often,,,,,Sometimes,,Most of the time,,,,Most of the time,Often,,Most of the time,Sometimes,,Most of the time,,Most of the time,,,,,,40,20,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Often,,Most of the time,,,,Most of the time,Most of the time,Often,,,,,Most of the time,,,,Often,Often,,Less than 10% of projects,More internal than external,IT Department,HarP; Greengenes; Silva," Lack of the format, source, and example usages.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"40,000",USD,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,United States,43,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,15,10,60,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,Other,Regression,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,Very useful,,Very useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Self-taught,50,0,50,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Video data",Don't know,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,"A/B Testing,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,,Often,,,,Sometimes,Often,Often,,,Often,Often,,Sometimes,,,,Often,Often,,Sometimes,,,Sometimes,Often,Sometimes,,Sometimes,,,,20,20,30,30,0,0,Enough to refine and innovate on the algorithm,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Often,,76-99% of projects,Do not know,Other,,,Other,Other,filesystem,Subversion,,,,,9,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Other,Perfectly,Employed by 
professional services/consulting firm,Tableau,Cluster Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,More than 10 years,Other,University courses,10,0,50,40,0,0,,,A doctoral degree,Government,20 to 99 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM Cognos,Microsoft Excel Data Mining",,,,,,,,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Segmentation,Time Series Analysis",,,Often,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,10,40,0,25,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Privacy issues",Sometimes,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,,Often,,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Never,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,South Africa,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",30,35,25,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),High school,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Friends network,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Spain,32,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Tableau,Time Series Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle",,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,Recommendation Engines,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods","Java,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Recommender Systems",,,Rarely,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,5,50,0,5,40,0,"Enough to code it again from scratch, albeit it may 
run slowly","Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Often,,,Most of the time,,10-25% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Sometimes,15000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,0,20,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,Often,,,,,Often,,Often,,,,Most of the time,Often,,Often,,,,,Often,,Most of the time,,,,10,40,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,,,,,,,Often,,,76-99% of projects,Entirely internal,IT 
Department,,Need to understand the domain knowledge of the data source before going forward analyzing it.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,400000,INR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,40,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,3 to 5 years,"Business Analyst,Operations Research Practitioner,Predictive Modeler,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,30,10,3,2,5,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Google Search,"Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 
years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",15,40,40,5,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,QlikView,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,Sometimes,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation",Sometimes,,,,Rarely,Most of the time,,Most of the time,,,,Often,,Often,,Often,,,,,Sometimes,,Sometimes,Rarely,,Sometimes,Sometimes,,,,,,,35,10,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,,Most of the time,,,Rarely,,,,,,,,,Most of the time,,,,,76-99% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,9600,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Spain,51,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,Work,20,10,60,0,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation",,,,Often,,Often,,Often,Often,,,,,Often,,Sometimes,,Sometimes,,,Often,,Often,,Often,Often,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,GitHub,"Arxiv,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,,,,Very useful,"Data Stories Podcast,FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer 
focused),1 to 2 years,I haven't started working yet,Self-taught,60,20,20,0,0,0,Computer Vision,Neural Networks - CNNs,,Academic,I don't know,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,,,Neural Networks,"MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,20,10,0,0,Enough to explain the algorithm to someone non-technical,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",France,51,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,,,,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,University 
courses,10,30,10,40,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,"Google Search,I collect my own data (e.g. web-scraping)",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,50,35,0,10,5,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Pakistan,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",60,30,5,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,Never,10MB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,SAS Enterprise Miner",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,Sometimes,,,,Often,Often,Sometimes,,,,,,Sometimes,,Often,,Sometimes,,,Often,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,40,50,0,4,6,0,Enough to run the code / standard library,"Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,,,,,26-50% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Never,650000,INR,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Researcher,Statistician",Work,40,0,60,0,0,0,Survival Analysis,"Bayesian Techniques,Logistic Regression",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Statistica (Quest/Dell-formerly Statsoft),Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences",,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Electrical Engineering,,"Computer Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - 
GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Ukraine,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Machine Learning Engineer,Researcher",Self-taught,30,0,20,40,10,0,"Computer Vision,Other (please specify; separate by semi-colon)","Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Image data,Sometimes,100GB,"CNNs,GANs,Neural Networks,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Python,SQL,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,HMMs,kNN and Other Clustering,Neural Networks,Segmentation,SVMs",,,,Most of the time,,Most of the time,Often,,Rarely,,Sometimes,,Rarely,Sometimes,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,,,30,60,5,5,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,Most of the time,,,,,,,,,Often,,Less than 10% of projects,More internal than external,Other,"imagenet, retrieval datasets, cifar, ms coco, otb,voc",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Personal Projects",,Very useful,,,,Very useful,,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,0,20,80,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,10 to 19 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data,Text data,Relational data",Most of the time,10GB,"CNNs,GANs,Neural Networks,RNNs","C/C++,Google Cloud Compute,Java,NoSQL,Python,TensorFlow",,,,Often,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Recommender Systems,RNNs,Segmentation,Text Analytics",Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,Most of the time,,,,,60,30,10,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Often,Often,,,,,,Most of the time,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,1600000,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Ukraine,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Programmer","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Reinforcement learning,Unsupervised Learning","Gradient Boosting,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Mix of fields,,,,,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Video data,Other",Sometimes,10GB,"CNNs,Markov Logic Networks,RNNs,Other","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Association Rules,CNNs,Cross-Validation,Markov Logic Networks,Neural Networks,RNNs,Simulation",,Sometimes,,Often,,Sometimes,,,,,,,,,,,Often,,,Most of the time,,,,,Often,,Sometimes,,,,,,,10,40,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,51-75% of projects,Do not know,Standalone Team,,,,I don't typically share data,,Git,,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Japan,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,33,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Computer Scientist,Programmer,Other",University courses,0,25,25,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,1GB,"Decision Trees,Random Forests","MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,50,10,20,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,Often,Most of the time,,,100% of projects,More internal than external,Other,Various expensive financial/score data providers,"Coverage, outliers",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,,150000,CHF,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Computer Vision,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Engineer,Self-taught,30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - 
CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,University courses,30,35,0,35,0,0,Time Series,Hidden Markov Models HMMs,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,28,Employed full-time,,,Yes,,Statistician,Poorly,Employed by college or 
university,Python,Deep learning,Python,GitHub,"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,Somewhat useful,Not Useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",50,0,30,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,CRM/Marketing,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,,,,,,,"A/B Testing,Data Visualization",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,10,30,30,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,,Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Very useful,Not Useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,6 to 10 years,Other,Self-taught,40,10,18,20,2,10,"Reinforcement learning,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,More than 10 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data,Other",Sometimes,100GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,Sometimes,Most of the time,Most of the time,,,,,,,,,Sometimes,,Sometimes,,Often,Most of the time,,,,,,Often,Most of the time,Sometimes,Often,,,,25,25,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,100% of projects,More internal than external,Other,"Neurovault, neurosynth, human connectome project",Preprocessing it,Other,I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"26,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Researcher,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by government",R,Neural Nets,Python,Google Search,"Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,Self-taught,80,20,0,0,0,0,Computer Vision,"Bayesian Techniques,Markov Logic Networks",High school,Telecommunications,"1,000 to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Image data,Don't know,1GB,"Bayesian Techniques,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Neural Networks,Segmentation",,,Often,,,,,,,,,,,Sometimes,,,Sometimes,Most of the time,,Sometimes,,,,,,Sometimes,,,,,,,,70,0,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Privacy issues",Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,Entirely internal,IT 
Department,Backhoe data dome,Nothing,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,720000,DZD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,France,46,Employed full-time,,,Yes,,Other,,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Never,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender 
Systems,Text Analytics",,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,,,,40,40,0,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data,Other",,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,Sometimes,Sometimes,10-25% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Sometimes,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Programmer,University courses,20,20,20,20,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,Not very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Stan,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,Sometimes,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,Sometimes,,Most of the time,Often,Often,,,Most of the time,,Most of the time,,Most of the time,,Often,,Sometimes,Sometimes,,Often,,,,,,,Most of the time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of 
data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",,Sometimes,,,,,,Sometimes,,,Sometimes,,,Often,Often,,,,,,Often,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Kaggle competitions,0,0,0,0,100,0,"Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,39,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,32,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Other,Google Search,"Blogs,Friends network,Kaggle,Online courses,Personal Projects",,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,,3-5 years,,,,,,,,,,,,,,Coursera,,,,,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,60,30,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,38,"Not employed, but looking for work",,,,,,,,KNIME (free version),Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,O'Reilly Data Newsletter",5-10 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",40+,PhD,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,90,10,0,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Mexico,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Conferences,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Textbook",,,,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist",University courses,20,5,50,20,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire",,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,Often,Sometimes,Sometimes,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Sometimes,Often,,Sometimes,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Often,,Rarely,Rarely,Sometimes,Sometimes,,Often,Often,Rarely,Most of the time,Most of the time,Sometimes,Sometimes,Sometimes,,,,20,20,5,25,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the 
organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Sometimes,,Rarely,,,,Often,,Sometimes,,,,Often,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Somewhat useful,,,,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Master's degree,No,Master's degree,Computer Science,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Male,Egypt,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study 
without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Textbook",Very useful,,Very useful,,Very useful,,,,,,,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,35,0,25,40,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,,,Often,Often,,,,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian 
Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Time Series Analysis",Sometimes,Sometimes,Often,Most of the time,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,Often,,Most of the time,,Often,,Sometimes,,Most of the time,Most of the time,,Often,Sometimes,Often,,,Often,,Most of the time,,,,40,30,5,20,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,Often,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Email,,"Bitbucket,Git,Mercurial",Sometimes,50000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,20,20,20,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,CRM/Marketing,10 to 19 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Perl,Python,R,TensorFlow,Other,Other",,Often,,,,,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,Sometimes,Often,,Often,,,,,,,,,,,,,Sometimes,,,,Most of the time,Most of the time,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,Other","Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,15,15,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Financial,"10,000 or more employees",Decreased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SAS Enterprise Miner,Unix shell / awk",,,,,,,,,Often,,,,,,,,Most of the time,,,,,Most of the time,Often,Often,Most of the time,,,Most of the time,,,Most of the time,Rarely,Often,,,,,,Often,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Often,,Often,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,,,Often,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up 
to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Most of the time,,,,Most of the time,,,,,,,,,Often,Most of the time,Often,,,100% of projects,Entirely internal,Business Department,Social Media. ,Important missing values,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,"600,000",ZAR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,37,"Not employed, but looking for work",,,,,,,,TensorFlow,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Data Machina Newsletter,Data Stories Podcast,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,More than 10 years,Data Scientist,University courses,0,40,0,60,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important 
+Male,Australia,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Neural Nets,C/C++/C#,Google Search,"Arxiv,Conferences,Online courses",Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Fine arts or performing arts,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,,,High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,1TB,,"Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,40,20,10,10,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,GitHub,"College/University,Personal Projects",,,Very useful,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Other,I haven't started working yet",University courses,10,0,10,60,0,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,,"C/C++,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,Rarely,,,,,,,,,,,Often,,Sometimes,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,Less than 10% of projects,Do not know,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Bitbucket,Most of the time,6000,EUR,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,6 to 10 years,"Business Analyst,Data Miner,Other",Work,40,30,0,0,30,0,"Natural Language Processing,Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,CRM/Marketing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Java,Python,R,SAS JMP",,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,32,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,100,0,0,0,0,Recommendation Engines,,A master's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Markov Logic Networks","NoSQL,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,40,40,0,20,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",,,,,3,,,,,,,,,,,,,,,,,, +Male,Sweden,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,50,10,0,10,30,0,,,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,,,Not important,Not important,Very Important,,Not important +Male,Other,24,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Personal Projects",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,,,,"Data Elixir Newsletter,FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,30,10,15,40,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,MATLAB/Octave,Microsoft Excel Data Mining,R,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics",,Most of the time,Most of the time,,,Most of the time,Often,,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,Often,,,,,55,10,20,5,10,0,Enough to tune the parameters properly,"Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,10-25% of projects,Entirely internal,IT Department,alarms dataset,dirty dataset,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Very useful,,Very useful,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Researcher,Other",University courses,50,15,30,0,5,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Image data,Rarely,100GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Other,Other",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Often,Most of the time,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Segmentation",,,,Often,,Often,Most of the time,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,100% of projects,More internal than external,Other,none,size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",Never,1380000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,,,,"Blogs,Conferences,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Flume,Hadoop/Hive/Pig,Python,R,Spark / MLlib",,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,SVMs",,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,40,20,10,20,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,Do not know,IT 
Department,,,,Email,,,Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,I collect my own data (e.g. web-scraping),"Arxiv,College/University,YouTube Videos,Other",Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Master's degree,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,30,40,0,30,0,0,Computer Vision,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Self-taught,50,20,10,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,Most of the time,,,Often,,,Sometimes,Rarely,,Often,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic 
Regression,Natural Language Processing,Neural Networks,Time Series Analysis,Other",,,,,,Sometimes,,Often,Sometimes,,,Sometimes,,,Often,Often,,,,Rarely,,,,,,,,,,Often,,,Often,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,Google Cloud Compute,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,70,10,10,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,<1MB,Regression/Logistic Regression,"C/C++,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,50,10,10,20,10,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely internal,IT Department,not using any public data sets,data size,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,45000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,Java,Monte Carlo Methods,Java,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Management information systems,I don't write code to analyze data,Other,Self-taught,90,5,0,5,0,0,Reinforcement learning,Bayesian Techniques,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Work,50,0,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for 
work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",University courses,5,5,50,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,I don't 
plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Personal Projects,Textbook",Somewhat useful,,,,,,,,,,,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,,,,,,,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,PhD,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Factor Analysis,R,"Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,,< 1 year,Unnecessary,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,DataCamp,GPU accelerated Workstation,0 - 1 hour,Master's degree,No,Master's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,25,0,75,0,0,"Supervised Machine Learning (Tabular 
Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,India,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Perfectly,Employed by professional services/consulting firm,Microsoft Excel Data Mining,Link Analysis,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Very useful,,,,,5-10 years,,,,Necessary,Necessary,Nice to have,Necessary,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Statistician",University courses,40,0,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Statistician",University courses,5,55,20,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,20,40,0,0,10,"Natural Language Processing,Recommendation Engines,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Most of the time,Often,,,,,,"Bayesian Techniques,Collaborative Filtering,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,Often,,Often,,,,,,,,,Often,,Often,,Often,Often,Rarely,,,Rarely,Sometimes,,Sometimes,,,Often,Sometimes,,,,40,10,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Often,,,,Often,,,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,IT 
Department,movie recommendation dataset scrapy,Data cleaning and feature selection,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,18000,RUB,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Russia,50,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Genetic & Evolutionary Algorithms,R,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,30,50,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,33000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Nigeria,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,IBM Watson / Waton Analytics,MATLAB/Octave,Microsoft Azure Machine Learning,Python,RapidMiner (commercial version),RapidMiner (free version),SAP BusinessObjects Predictive Analytics,SAS JMP,SQL,Tableau,TensorFlow",Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,Often,Sometimes,,,,,,,,,Most of the time,,,Sometimes,Often,,Sometimes,,,Sometimes,,Often,,,Sometimes,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Naive Bayes,Neural Networks,RNNs,SVMs,Text 
Analytics",,,Often,,,Often,Most of the time,,,,Sometimes,,,Often,,,,Often,,Often,,,,,Sometimes,,,Sometimes,Sometimes,,,,,30,20,20,10,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources",,,,Often,Often,,,,,Often,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,Company internal community,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,30,10,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,Fewer than 10 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data",Most of the time,10GB,"CNNs,Neural Networks","C/C++,MATLAB/Octave,Python",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,Sometimes,Often,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,10,30,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,0,RUB,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,"Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,40,20,0,0,0,,,,Government,,,,,Not at all important,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud 
service",,,,,"C/C++,Java,Microsoft Azure Machine Learning,SQL",,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Natural Language Processing,Neural Networks,Simulation",,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,0,20,20,60,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Data Analyst,Kaggle competitions,0,20,0,0,80,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,100TB,Other,"Impala,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Sometimes,Rarely,,Rarely,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,Often,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,,,,,50,10,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Often,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,80000,CNY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,44,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,Government website,"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Adversarial Learning,Recommendation Engines,Survival Analysis","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Japan,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Computer Vision,Recommendation Engines",,A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,R,SQL",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Neural Networks,Random Forests,Recommender Systems",Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,Most of the time,,,,,,,,,,20,20,20,40,0,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,Sometimes,,,Often,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,40000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Nigeria,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler",University courses,20,5,20,40,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Other,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,"Decision Trees,Random Forests","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,kNN and Other Clustering",,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,36000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,South Korea,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,10,10,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data,Relational data",Sometimes,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow,Other",,Rarely,,,Rarely,,,Rarely,Rarely,,,,Rarely,,,,Most of the time,,,,Sometimes,,Sometimes,Sometimes,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,Often,,Often,Often,Often,,,,,,,,,,,,Often,Often,,Often,,,,,Often,Often,Often,,,,20,40,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack 
of data science talent in the organization",Often,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Central Insights Team,Kaggle,Noisy data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,15000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,0,0,40,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Microsoft SQL Server Data Mining,Python,R,SAS Base,SQL,Tableau",,Often,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,Often,,Often,,,,,Often,,,,Often,,,Often,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Segmentation,Time Series 
Analysis",,,,,,,Often,Often,,,,Often,,,Often,Often,,,,,,,,,,Often,,,,Often,,,,30,20,10,30,10,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Sometimes,Sometimes,,,,,,,Often,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,1500000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Self-employed",I don't plan on learning a new tool/technology,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,Very useful,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,35,5,20,20,20,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,Often,Often,,Often,,,,Often,,,Often,,Sometimes,,,,,Most of the time,,,,20,30,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Sometimes,150000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,23,I prefer not to say,Yes,,,,,,,MATLAB/Octave,Proprietary Algorithms,R,GitHub,College/University,,,Not Useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,,Self-taught,20,0,0,80,0,0,Machine Translation,Gradient Boosting,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler",University courses,20,20,40,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,35,Employed part-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Data Analyst,Researcher",University courses,10,70,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",Never,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,Tableau",,,,,Most of the time,,,,Most of the time,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,35,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,FastML Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,I haven't started working yet,Self-taught,30,30,20,10,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Retail,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,Other,"Mathematica,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,95,0,0,0,5,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,,Most of the time,,Most of the time,,,,,,,,Often,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,10,20,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,Decision Trees,"C/C++,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests",Sometimes,,,,,,Often,Most of the time,,,,,,Sometimes,,,,,,Rarely,,,Most of the time,,,,,,,,,,,20,25,10,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 
years,Programmer,Self-taught,50,20,20,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,31,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,Data Analyst,Other,10,10,0,0,0,80,,,A doctoral degree,Telecommunications,"5,000 to 9,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Not very important,Other,,Relational data,,,,"QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,Python,Decision Trees,Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble 
Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Government,20 to 99 employees,Increased significantly,1-2 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","C/C++,Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,R,RapidMiner (commercial version),SQL",,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,Most of the time,,Often,Sometimes,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,,,,,,Often,,,Most of the time,,Most of the time,,,,,Often,,,,,,10,60,20,10,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Need to coordinate with IT",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. 
Redis/Riak)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,60000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Natural Language Processing,Speech Recognition",Neural Networks - RNNs,A doctoral degree,Financial,100 to 499 employees,Increased significantly,,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Other",Don't know,1GB,"Neural Networks,RNNs","Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks",Sometimes,,,,,Often,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,5,85,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Often,,,,,,,Most of the time,,,,,Sometimes,,Often,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,"Statistician,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,100MB,"Random Forests,SVMs","IBM Watson / Waton Analytics,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,Most of the time,Often,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,70,20,0,0,10,0,Enough to run the code / standard library,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,Sometimes,,Sometimes,,,,26-50% of projects,Do not know,Standalone Team,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Very useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Not Useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,Other,Work,30,50,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data",Sometimes,10TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,SQL,Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Most of the time,Most of the time,Sometimes,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Simulation,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,Rarely,,Often,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,30000,GBP,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed part-time,,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Statistician",Self-taught,50,10,40,0,0,0,Adversarial Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,Government website,"Stack Overflow Q&A,Textbook,Trade book",,,,,,,,,,,,,,Somewhat useful,Very useful,Very useful,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer",Self-taught,70,0,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,,,,,,,,Sometimes,,Often,,,Most of the time,Often,,,,,Sometimes,,,Often,Most of the time,Sometimes,,,,60,10,0,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Sometimes,,,Often,,Often,,,,Often,Most of the time,,,Sometimes,,,Often,,10-25% of projects,Entirely 
internal,Business Department,Ofac,Text mining,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Sometimes,20880,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Friends network,Kaggle,Newsletters,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Researcher,Other",University courses,10,20,0,0,0,70,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Often,,,,,,,,Most of the time,,,,,,Often,Often,,,Often,,,,Most of the time,,,,0,15,30,30,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,,,Most of the time,,,Often,,,Often,Sometimes,,,,Often,,51-75% of projects,More external than internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,45,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,Sometimes,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,kNN and Other Clustering",,,Often,Often,,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer 
not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,5,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Bayesian Methods,Python,Google Search,"Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,,,,,Very useful,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Male,Poland,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects",,Very useful,Very useful,,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,,Nice to have,Nice to have,,Nice to have,,,,Nice to have,,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,50,20,0,30,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,,Somewhat important,Somewhat important,,,,,,,Very Important,Somewhat important,,,,Somewhat important +Male,India,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Not Useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",35,50,0,0,15,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",No education,Other,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,,,Sometimes,Sometimes,,,,Rarely,,,,Often,,,,10,30,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,Standalone Team,National surveys; census data; data collected by state departments,Scanned documents that cannot be OCR,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,3000000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,Employed part-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,100 to 499 employees,Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner",Work,30,20,50,0,0,0,"Reinforcement learning,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","IBM Cognos,Java,KNIME (free version),Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,Often,,,,,Sometimes,,,,Most of the time,,,,Most of the time,,Most of the time,,Often,,Sometimes,,Sometimes,,Most of the time,,,,Often,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Random Forests,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Most of the time,Often,,,,,,Most of the time,,,,Often,,,Most of the time,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,Most of the time,,,,,Often,,,76-99% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Spain,52,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician",Work,10,0,20,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,24,"Not employed, but looking for work",,,,,,,,Weka,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,DataCamp,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Italy,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Researcher,Other",University courses,0,0,10,10,0,80,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Relational data",Most of the time,,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Most of the time,Often,Most of the time,Often,,,Most of the time,,,,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Workstation + Cloud service,11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Iran,42,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",20,20,0,60,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection)",Neural Networks - RNNs,Primary/elementary school,Manufacturing,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,"Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 
years,Business Analyst,Self-taught,30,30,30,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,,Python,"GitHub,Google Search","Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Other",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer,Other",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,India,42,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton 
Analytics,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Kaggle Competitions,Yes,Master's degree,Biology,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Other,28,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Other,Other,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Other",University courses,50,11,35,4,0,0,,,High school,Other,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Always,,,"C/C++,NoSQL,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,10,0,65,0,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Sometimes,,51-75% of projects,,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"4,000,000",XAF,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Programmer,Software Developer/Software Engineer",Work,20,0,80,0,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Neural Networks - GANs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Master's degree,Engineering (non-computer focused),More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,20,0,0,60,,,A doctoral degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Web services,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Github Portfolio,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Hidden Markov Models HMMs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Other,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by government,NoSQL,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Friends network,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,Very useful,Very useful,,,,,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM Cognos,IBM SPSS Statistics,Microsoft Excel Data Mining,Minitab,Python,R,SAS Base,Tableau",,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,,,,,,,Often,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,Often,,,Most of the time,,,,40,30,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Most of the time,,Most of the time,,,,Most of the time,Often,Often,,,,,Often,,,,Most of the time,,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Germany,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,20,20,20,20,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,3-5 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,Often,Often,,Often,Often,Often,Often,,,Often,Often,Often,,Often,Often,Often,,Often,Often,Often,Often,Often,,Often,Often,Often,,Often,,,,30,20,10,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,40,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,DataRobot,,R,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,Fewer than 10 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SQL,Tableau,TIBCO Spotfire",,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,,Often,,,,,,,Often,,,Often,,Rarely,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Logistic Regression,Prescriptive Modeling",Often,Often,Most of the time,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the 
time,,,,Often,Often,,,Often,,,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",,,Bitbucket,Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Not Useful,Somewhat useful,,,Very useful,,Very useful,Not Useful,Very useful,Very useful,,,Somewhat useful,,5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,PhD,No,Master's degree,Mathematics or statistics,6 to 10 years,I haven't started working yet,Self-taught,75,5,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important +Male,United States,34,"Not 
employed, but looking for work",,,,,,,,Java,Anomaly Detection,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher,Statistician",University courses,25,25,0,10,0,40,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,University courses,20,50,0,10,20,0,"Computer Vision,Reinforcement learning","Gradient Boosting,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Image data,,,CNNs,"C/C++,Java,Jupyter notebooks,Python",,,,Most of the time,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,10,20,30,10,30,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,,3000000,CNY,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Ireland,30,"Not employed, but looking for work",,,,,,,,SAS Base,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Tutoring/mentoring",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Data Stories Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,20,20,0,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Ukraine,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,20,10,NA,10,20,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Other,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Never,10MB,Regression/Logistic Regression,"IBM SPSS Statistics,Microsoft Excel Data Mining,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Segmentation",,,,,,,Often,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,0,60,0,0,40,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,,,,,Often,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,,,Most of the time,,,,Sometimes,Often,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Most of the time,,Sometimes,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,Most of the time,,,,40,30,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,Often,,,Often,,Often,,,,,Sometimes,,,,,Sometimes,,51-75% of projects,Approximately half internal and half external,Business Department,Government,Structure and less availability of data.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,2000000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,France,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Company internal community,Kaggle,Newsletters,Online courses",,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,Other,Other,10,10,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,QlikView,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Often,,,,,,,Often,Sometimes,Often,,Sometimes,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,,Often,,,,,,Often,,Most of the time,,,,Sometimes,Most of the time,,Often,,,,Often,Often,,Most of the time,,,,50,10,10,5,5,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,Most of the time,,Most of the time,,,Often,Sometimes,Often,,,,Most of the time,,,,Often,,,Most 
of the time,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,Weather Data,Secure their quality & accessibility,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Iran,22,Employed part-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher",University courses,20,0,0,80,0,0,Computer Vision,Neural Networks - GANs,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects",,,Not Useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,65,5,10,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data",Rarely,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Ensemble Methods,Random Forests,SVMs",,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,60,30,5,5,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools",Most of the time,Often,,,Often,,,,,,,,Most of the time,,,,,,,,,,Less than 10% of projects,Entirely external,Central Insights Team,Sometime from Kaggle,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Engineer,University courses,30,0,30,30,10,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,NA,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,Very useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,"Business Analyst,Computer Scientist,Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Computer Vision,Natural Language Processing","Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow",Rarely,Rarely,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,SVMs,Text Analytics",,,Sometimes,Sometimes,,Often,Often,Often,Often,,,Often,,,,Often,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,Often,Rarely,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,33,Employed full-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,,Udacity,GPU accelerated Workstation,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,France,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,24,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,"Operations Research Practitioner,Researcher",University courses,25,25,0,30,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"5,000 to 9,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Prescriptive Modeling,Simulation,Time Series Analysis",,,Often,,,,,,,,,,,,,Often,,Sometimes,,,,Often,,,,,Often,,,Often,,,,60,10,10,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Often,,,Sometimes,,,,,Often,,,Often,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Other,Vendor prices and models as contingency,Formatting,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Rarely,40000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,,,,Very useful,Very useful,Very useful,,,,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Other,33,33,0,0,10,24,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,,,,,,,,,,,,,, +Male,Netherlands,26,Employed part-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Textbook",,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",25,25,20,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data",Sometimes,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Python,R,SQL,Tableau",,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Random Forests",Often,,Sometimes,,,,Most of the time,,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,,,,,,40,25,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Ireland,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",15,30,25,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation,Other","Text data,Relational data",Never,1GB,,"Google Cloud Compute,Java,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Often,,Often,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,Not Useful,Somewhat useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,Other,Self-taught,80,10,0,8,2,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,Rarely,,,,,,Sometimes,,,Sometimes,Rarely,Sometimes,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis,Other,Other",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,30,30,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments 
such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Sometimes,Sometimes,,,Most of the time,Most of the time,,,,Most of the time,,,,,Sometimes,Sometimes,,,76-99% of projects,Entirely internal,Business Department,Financial databases; economic daradases,Data stored in multiple locations and if different formats.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Subversion",Rarely,500000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Philippines,42,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Google Cloud Compute,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,"Online courses,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,,,Very useful,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Programmer,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Employed by college or university,R,Regression,Other,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,,Nice to have,Nice to have,,,Nice to have,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Master's degree,A social science,Less than a year,Researcher,Kaggle competitions,50,30,0,20,0,0,Time Series,Logistic Regression,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Blogs,College/University,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,30,30,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,10MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,R,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Sometimes,,,,Most of the time,,Sometimes,,,,,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Often,,,,40,30,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,"Datasets downloaded from public source such as NIST,MIT database,UCI library etc",No functional elements available. No metadata information . All we would have to do simulate the data by just plainly reading abt the same through publicly available resources,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,1500000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Other",University courses,30,20,10,30,10,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Evolutionary Approaches,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Rarely,100MB,"Decision Trees,Evolutionary Approaches,SVMs","Hadoop/Hive/Pig,Java,NoSQL,Python,R",,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Naive Bayes,Natural Language Processing,Recommender Systems,Text Analytics",,,,,Sometimes,,,,,,,,,,,,,Sometimes,Most of the time,,,,,Most of the time,,,,,Often,,,,,30,25,20,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues",,,,,Often,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,10-25% of projects,Approximately half 
internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,400000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Sometimes,10TB,"CNNs,Neural Networks","Google Cloud Compute,Jupyter notebooks,TensorFlow",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs",Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,30,20,30,10,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,,500000,JPY,Has stayed about the same (has not increased or decreased more than 5%),,,,,,,,,,,,,,,,,,, +Male,Indonesia,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,Matlab,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,Often,,,,30,20,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to 
data",Rarely,,Rarely,,Sometimes,,,Rarely,Rarely,Rarely,,Rarely,Sometimes,,Sometimes,Rarely,,,Rarely,,Rarely,,26-50% of projects,Approximately half internal and half external,IT Department,Other companies,Gathering data,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"107,000,000",IDR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,33,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,Python,Google Search,"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,60,0,20,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important 
+Male,Australia,30,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"FlowingData Blog,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Fine arts or performing arts,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Ukraine,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,0,60,0,0,0,"Natural Language Processing,Speech Recognition",Neural Networks - RNNs,High school,Technology,10 to 19 employees,Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,MATLAB/Octave,,,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist",Self-taught,70,10,0,20,0,0,Unsupervised Learning,Neural Networks - CNNs,No education,Academic,10 to 19 employees,Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,1GB,Neural Networks,"C/C++,Mathematica,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,Sometimes,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,Graph (e.g. GraphBase/Neo4j),Company Developed Platform,,Subversion,Rarely,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Germany,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,50,0,0,0,50,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs",High school,Insurance,"10,000 or more employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Relational data,,,,"NoSQL,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Often,,,,,,Often,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,0,0,0,90,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues",Often,,,,,,,Sometimes,Most of the time,,,,,,,,Sometimes,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,Subversion,Never,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Analyst,Self-taught,10,20,30,5,35,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,100 to 499 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,Often,,,,Sometimes,,Sometimes,Sometimes,Often,,,,,Often,,Sometimes,Sometimes,,Sometimes,,,,,,,,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,No public data,Data cleaning takes a lot of time. Lots of dirty data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,575000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United Kingdom,32,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,YouTube Videos,Other",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Programmer,Software Developer/Software Engineer",Work,10,20,60,0,10,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Often,Often,,,,,,,,Sometimes,Often,Often,Most of the time,,Most of the time,,Often,,,,Often,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,Often,,,Sometimes,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Simulation",Often,,Often,Often,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,Most of the time,Often,,,Often,,Often,Often,Sometimes,,,,,,,30,10,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",,,,,Most of the time,,,,Most of the time,,Often,,,,,,Often,,,,,,51-75% of projects,Approximately half internal and half external,Other,"Data from 
FDA.org , hl7.org",,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,2600000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,34,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Self-employed,R,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Trade book,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,Very useful,Very useful,Very useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,28,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",40,10,5,5,20,20,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,50,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Programmer",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,France,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,Work,20,0,70,10,0,0,,,High school,Academic,10 to 19 employees,,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,,,,Not Useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,Self-taught,90,10,0,0,0,0,"Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning",,A bachelor's degree,Technology,Fewer than 10 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,<1MB,,"Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,"Data Visualization,Recommender Systems",,,,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,10,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,Git,Rarely,23000000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Belgium,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Text Mining,Python,"Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Textbook,YouTube Videos",,Very useful,Very useful,,Very useful,,,,,,,,,,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",University courses,10,0,20,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Rarely,Often,,,,,,,Often,,,,Rarely,Sometimes,,Sometimes,,,,,Sometimes,,,,,,50,5,30,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up 
to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Most of the time,,,,,,Sometimes,,,Sometimes,,,,Sometimes,,Often,,,76-99% of projects,More internal than external,IT Department,UCI Repository;OpenML.org,Conclusions are usually not strong enough.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Never,"50,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,5,5,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Random Forests,Regression/Logistic Regression","R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Often,,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,,Sometimes,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,Often,,Sometimes,,,Often,Most of the time,,,Sometimes,,,,75,5,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,,,Very useful,Not Useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Operations Research Practitioner,Programmer","Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,CRM/Marketing,"5,000 to 9,999 employees",Increased significantly,Don't know,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",,1GB,"Decision Trees,Random Forests","Amazon Web services,IBM SPSS Statistics,NoSQL,Python,Other",,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests",,,,,,Often,Rarely,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,8,60,30,1,1,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,,,,,,Sometimes,,,,Rarely,,,,Less than 10% of projects,Entirely internal,Other,,,"Document-oriented 
(e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Subversion,Never,45000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Finland,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,60,10,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,2,45,1,2,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,Government,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 
years,,Self-taught,70,30,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,,,Julia,Deep learning,Other,Google Search,"Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Programmer",Self-taught,80,0,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Internet-based,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Neural Networks,RNNs","Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Often,,,,,Sometimes,,,,,,"Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,Often,,,,,,,,,10,50,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,Most of the time,Often,,,,,Most of the time,,,,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,corpora for different languages,"datasets are too general, while usecases would benefit from more domain-specific datasets","Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Most of the time,600000,DKK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,30,20,20,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,"Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing",,,,,,,,Sometimes,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,20,20,20,30,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,Sometimes,,,,Often,,,Sometimes,Sometimes,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,Git,Sometimes,1500000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Researcher,Poorly,"Employed by college or university,Self-employed",TensorFlow,Deep learning,R,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,"Jack's Import AI Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,6 to 10 years,"Researcher,Statistician",University courses,30,10,10,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Prescriptive Modeling",,,,,,Most of the time,Most of the time,,Often,,,,,,,Often,,,,,,Often,,,,,,,,,,,,10,40,10,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Often,,,,,,,,Sometimes,,Often,,,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",,80000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,< 1 year,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,"Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,France,51,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Google Search,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",15,75,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Netherlands,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),No Free Hunch Blog",< 1 year,Nice to have,Necessary,,,Necessary,Nice to have,Nice to have,,,,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,20,30,0,50,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Other,24,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,Very 
useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,30,10,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,SAS Base,Support Vector Machines (SVM),R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,Very useful,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,Less than a year,"Data Analyst,Statistician",Self-taught,50,10,20,10,0,10,"Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Time Series Analysis",Often,,Often,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,Often,,,,,Most of the time,,,Often,,,,Most of the time,,,,45,20,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Other,Rarely,480000,INR,Other,5,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Researcher,Statistician",University courses,50,30,10,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Pharmaceutical,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,"Decision Trees,Regression/Logistic Regression","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,42,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,75,0,0,5,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Markov Logic Networks,Neural Networks","Amazon Machine Learning,Python",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Natural Language Processing,Neural Networks",,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,,,,,,60,30,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,,,,,,,Often,,,,,,,,Most of the time,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,10000000,NGN,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Text Mining,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Programmer,Software Developer/Software Engineer",University courses,30,30,20,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Government,100 to 499 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Always,,,"Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,Rarely,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Text Analytics,Time Series Analysis",Often,,Often,,,,Most of the time,Often,,,,,,,,Often,,Often,,,,,,,,,,,Often,Often,,,,5,60,20,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Other,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,,Employed by college or university,SQL,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,6 to 10 years,"Operations Research Practitioner,Researcher,Other",Self-taught,15,45,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Mix of fields,,,,,Somewhat important,Other,Traditional Workstation,"Image data,Text data,Relational data",Never,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Sometimes,,,Rarely,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,Rarely,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,,Often,,Often,,,,,50,10,0,15,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,Often,,,,Most of the time,,,Most of the time,,Most of the time,Often,,,,,,,100% of projects,More external than internal,Standalone Team,non-proprietary; citizen science programs,hardware capacity,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,15000,USD,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Italy,39,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,GitHub,"Blogs,Newsletters,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,22,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by company that makes advanced analytic software,RapidMiner (free version),Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,20,30,10,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic 
Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,South Korea,38,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Always,10TB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Python,R,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,Most of the time,,Most of the time,,Most 
of the time,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,30,40,20,10,0,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Flume,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,,,,,,,Somewhat useful,Very useful,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",10,30,60,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,GANs,Markov Logic Networks,Neural Networks,RNNs,SVMs","Amazon Web services,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,QlikView,R,Spark / MLlib,SQL",,Often,,,Most of the time,,Most of the time,Often,Often,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,Sometimes,Often,Most of the time,Often,,,,Most of the time,,Most of the time,,,Most of the time,,,,,10,50,10,30,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,,130000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,NA,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,,Self-employed,,Deep learning,Python,,"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with 
semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Other,Kaggle competitions,45,0,10,0,45,0,"Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Other",,,"CNNs,HMMs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,Mathematica,Python,R,TensorFlow",,Rarely,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Data Visualization,HMMs,Neural Networks,RNNs,Time Series Analysis",,,,Most of the time,,,Most of the time,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Employed by government",TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Non-Kaggle online communities,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Decreased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,"Decision Trees,Markov Logic Networks,Random Forests","Amazon Machine Learning,Amazon Web services,C/C++,IBM Cognos,IBM SPSS Modeler,Impala,Jupyter notebooks,Mathematica,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Perl,Python,R,Spark / MLlib,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau,TensorFlow,Unix shell / awk",Sometimes,Rarely,,Sometimes,,,,,,Rarely,Sometimes,,,Rarely,,,Often,,,Rarely,,Often,Often,Rarely,Often,,,,,Often,Often,,Often,,,,,,,,Sometimes,Often,,Sometimes,Often,Sometimes,,Sometimes,,,,"Naive Bayes,Random Forests",,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert 
input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,Sometimes,Sometimes,,Often,Sometimes,,,,,Sometimes,Often,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,2200000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belgium,36,Employed part-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,,University courses,10,10,0,80,0,0,,"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,90,0,0,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed 
full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,97,0,0,3,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,Other,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Adversarial Learning,"Logistic Regression,Neural Networks - CNNs",,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,"N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10TB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,NoSQL,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,Most of the 
time,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,RapidMiner (free version),Cluster Analysis,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",35,40,20,5,0,0,Time Series,Other (please specify; separate by semi-colon),High school,Telecommunications,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,Sometimes,,,,Often,,,,20,10,10,40,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Often,,,,,Often,,,,,,,,,,,Often,Most of the time,,51-75% of projects,More internal than external,Business Department,state open data; google analytics; CRM;,To give them right visual form.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"25,000",EUR,Other,7,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Text Mining,C/C++/C#,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Data Machina Newsletter,FlowingData Blog",< 1 year,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Miner,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,0,50,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Russia,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,Very useful,,,,Very useful,,,Very useful,"Data Machina Newsletter,Jack's Import AI Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100MB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,Sometimes,,Sometimes,Often,,,,,,,,,Often,,,,Often,Sometimes,,,,Sometimes,,,,Sometimes,Sometimes,,,,15,40,20,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,Often,,,,,,Sometimes,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Greece,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Other",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,30,10,30,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,Other,University courses,10,0,45,45,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Random Forests","C/C++,R",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Naive Bayes,Random Forests",,,Often,,,,,Often,Often,Often,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,25,25,25,0,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,More internal than external,Other,,,,I don't typically share data,,Git,Rarely,"264,000",ZAR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Australia,NA,I prefer not to say,Yes,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,NA,Employed full-time,,,Yes,,Other,Fine,Self-employed,IBM Watson / Waton Analytics,,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities",,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,,Self-taught,100,0,0,0,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,20 to 99 employees,Stayed the same,1-2 years,Some other way,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,1GB,"CNNs,GANs,HMMs,Markov Logic Networks,Neural Networks,RNNs","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,Natural Language Processing,Neural Networks,RNNs,SVMs",,,,,,Sometimes,Often,,,,,,Often,,,,,,Most of the time,Most of the time,,,,,Often,,,Sometimes,,,,,,0,30,0,30,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of significant domain expert input",,,,,Often,Often,,,,,Most of the time,,,,,,,,,,,,26-50% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Git,,,,,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,53,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,R,Decision Trees,R,GitHub,College/University,,,Very useful,,,,,,,,,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Researcher",Self-taught,70,10,10,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",High school,Manufacturing,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Often,,Often,Often,Often,,Most of the time,Often,,,Often,,Often,,Most of the time,,,,Most of the time,Often,,Most of the time,Most of the time,,,,,Most of the time,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,,,,,,,,Often,,10-25% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,60000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Stack Overflow Q&A",,,Somewhat useful,Not Useful,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,60,0,0,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,,,Often,,,,,,,Often,,Sometimes,Sometimes,,Often,,Sometimes,,,,20,20,10,10,40,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Do not know,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,30,5,20,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,<1MB,"Bayesian Techniques,Neural Networks","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Neural Networks,Simulation",,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,Often,,,,,,,20,40,5,30,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,Often,,,,,,Often,,Sometimes,Sometimes,,,,,Often,,,,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git,Mercurial,Subversion",,40000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,"Text data,Relational data",Most of the time,1GB,"Regression/Logistic Regression,Other","SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,,C/C++/C#,"GitHub,Google Search","Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Electrical Engineering,,"Engineer,Programmer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition","Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,Other,Python,Google Search,"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,,,,,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Researcher,Other",Self-taught,30,30,30,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Segmentation,SVMs",,,,Often,,Most of the time,Most of the time,,,,,,,,,Sometimes,,,,Often,,,,,,Often,,Sometimes,,,,,,20,15,5,15,15,30,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Often,,,Often,,,,,,,,,Often,,Sometimes,Often,,,76-99% of projects,More internal than external,Standalone Team,Any relevant db;,privacy; dirty; limited;,Other,Commercial Data Platform,,Other,Rarely,"440,000",ILS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Recommendation Engines","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,,,Other,"C/C++,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,20,30,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Sometimes,Most of the time,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,"DBA/Database Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,Work,30,20,40,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by government",Hadoop/Hive/Pig,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Natural Language Processing,Bayesian Techniques,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Greece,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Deep learning,Python,Other,"College/University,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal 
Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler",Self-taught,30,0,20,10,40,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAS Base,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,Often,,Often,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,Rarely,,,,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,Sometimes,Most of the time,,Often,Most of the time,,,Most of the time,,Sometimes,,Often,,Sometimes,Sometimes,Often,Sometimes,,Most of the 
time,Often,,Rarely,,Rarely,Sometimes,Sometimes,,,,10,50,40,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,We use many kaggle datasets to test the data. ,"To make certain the software we build can handle all types of data at once(like text, numbers etc)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",github,Git,Always,140000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Ireland,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,29,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Deep learning,Python,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,,,,Necessary,,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,Less than a year,"Engineer,Programmer",Self-taught,50,25,0,25,0,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,,,,,,,,,Somewhat important,,,,,, +Male,Other,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Fine arts or performing 
arts,,"Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,YouTube Videos,Other",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Unnecessary,Nice to have,,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,10,50,20,10,10,0,"Natural Language Processing,Recommendation Engines,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; 
separate by semi-colon),High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,Java,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,,"Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Natural Language Processing,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,India,44,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,1 to 2 years,"Business Analyst,Data 
Analyst,Operations Research Practitioner,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Neural Nets,Python,"GitHub,Government website","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Statistician",University courses,30,10,5,50,5,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Video data,Text data",Don't know,10GB,"HMMs,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,30,25,20,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,"400,000",THB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Uplift Modeling,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Statistician",Work,75,0,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Financial,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SAS Base,Statistica (Quest/Dell-formerly Statsoft),Tableau,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,,Rarely,,,Sometimes,Most of the time,,Rarely,Rarely,Most of the time,,,,,,,Sometimes,Often,Often,Rarely,,,,Often,Sometimes,,,Most of the time,,Most of the time,,Rarely,,,Sometimes,,,,,,Rarely,Rarely,Most of the time,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Most of the time,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,Often,Most of the time,Most of the time,,Most of the time,Most of the time,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Often,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,,20,35,25,20,0,0,Enough to 
explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,Sometimes,,,,Often,,,Often,Most of the time,,,,,,Sometimes,,,,,Most of the time,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"420,000",INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Portugal,27,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,Google Search,"Arxiv,College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,Not Useful,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs",Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Denmark,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Personal Projects,Stack Overflow Q&A",Very useful,,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,6 to 10 years,Programmer,Self-taught,30,0,30,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service",Other,Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,RNNs","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,RNNs",,,,Often,,Most of the time,Most of the time,,Often,,,Sometimes,Most of the time,,,,,,,Most of the time,,,Most of the 
time,,Often,,,,,,,,,50,20,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Sometimes,,,100% of projects,Approximately half internal and half external,Standalone Team,Public databases of biological sequences,Vectorizing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,336000,DKK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"GitHub,Google Search",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"DataTau News Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,40,10,10,0,10,30,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Internet-based,20 to 99 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,QlikView,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,Often,Often,,,Sometimes,,,Most of the time,,,,"Bayesian 
Techniques,Data Visualization,Decision Trees,Naive Bayes,Recommender Systems,Text Analytics",,,Sometimes,,,,Often,Often,,,,,,,,,,Sometimes,,,,,,Often,,,,,Most of the time,,,,,30,30,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Sometimes,,,,Often,,,,Often,Sometimes,Often,,,,,Most of the time,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,,,,,,7000000,JPY,,4,,,,,,,,,,,,,,,,,, +Male,Japan,48,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer,Other",Self-taught,40,10,10,40,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,100 to 499 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs","IBM Watson / Waton Analytics,Julia,MATLAB/Octave,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,Often,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian 
Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs",Sometimes,Sometimes,Sometimes,Sometimes,,Most of the time,Often,Often,Often,,,,Often,Often,,Often,Often,Often,Often,Often,Often,,Often,Often,Often,Often,,Often,,,,,,20,20,30,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of significant domain expert input,Scaling data science solution up to full database",Sometimes,Often,,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Predictive Modeler,Programmer,Researcher",Kaggle competitions,40,0,5,5,50,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Textbook",Very useful,Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,10,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,Increased significantly,3-5 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10MB,"Ensemble Methods,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,RNNs,SVMs",,,,,,Most of the time,,,Most of the time,,,,,,,Often,,,Most of the time,,Often,,,,Most of the time,,,Often,,,,,,60,20,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Often,,Less than 10% of projects,Approximately half internal and half external,IT Department,Linguistic Data Consortium datasets;English Gigaword,hard to formalize the task for it to be solved with ML,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,"47,000",CHF,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Malaysia,26,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,SQL,Time Series Analysis,Python,Google Search,Other,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,Less than a year,I haven't started working yet,Self-taught,25,50,25,0,0,0,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Jack's Import AI Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,20,20,0,40,0,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"5,000 to 9,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,Java,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,Rarely,,,,Rarely,,Sometimes,,,Rarely,Rarely,,,,,,,,Rarely,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,Rarely,,,Sometimes,Rarely,,Often,,Rarely,Often,Often,,,,60,10,10,5,15,0,Enough to 
explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,,,Sometimes,,,Sometimes,Most of the time,Often,,Often,Sometimes,,Sometimes,,,26-50% of projects,More external than internal,Standalone Team,none,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,"130,000",RUB,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,27,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Operations Research Practitioner,Self-taught,90,5,5,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Government,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,48,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,Self-taught,40,10,40,10,0,0,Time Series,"Hidden Markov Models HMMs,Logistic Regression",A 
professional degree,Academic,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1MB,"Markov Logic Networks,Regression/Logistic Regression","IBM SPSS Statistics,R,Tableau",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,NA,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,SQL,Neural Nets,R,,"Company internal community,Friends network,Online courses,Personal Projects,YouTube Videos",,,,Very useful,,Very useful,,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,,Self-taught,90,5,5,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",,Financial,"10,000 or more employees",,,Some other way,Important,Other,,Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,Often,,Most of the time,,Sometimes,Sometimes,,,Often,Often,,,,,,,Most of the time,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed 
full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"DataTau News Aggregator,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,Other,University courses,0,20,0,80,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Machine Learning,Monte Carlo Methods,Python,University/Non-profit research group websites,"Blogs,College/University,Company internal community,Textbook",,Very useful,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Researcher,University courses,30,0,10,40,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,10 to 19 employees,Stayed the same,More than 10 years,Some other way,Important,Build and/or run a machine learning service 
that operationally improves your product or workflows,Basic laptop (Macbook),Other,Most of the time,1GB,"Bayesian Techniques,Neural Networks,RNNs,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,RNNs,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,IT Department,,EEG is hard to distinguisg/classify,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git,Mercurial",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Greece,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,IBM SPSS Statistics,Anomaly Detection,SAS,I collect my own data (e.g. web-scraping),"Company internal community,Kaggle,Online courses,Textbook,YouTube Videos",,,,Very useful,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,50,10,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Bayesian Techniques,High school,CRM/Marketing,Fewer than 10 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Bayesian Techniques,Regression/Logistic Regression",SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Segmentation,Time Series Analysis",,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Most of the 
time,,,,Sometimes,,,,15,20,25,15,25,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"30,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Singapore,37,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Udacity,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,49,Employed part-time,,,Yes,,Researcher,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Master's degree,Mathematics or statistics,Less than a year,Data Analyst,Self-taught,50,25,25,0,0,0,,,A bachelor's degree,Government,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Never,,,Tableau,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,40,10,10,20,20,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,Data Scientist,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,52,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,I don't write code to analyze data,Researcher,University courses,80,0,0,20,0,0,Time Series,Other (please specify; separate by semi-colon),High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",35,30,20,0,5,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Philippines,23,Employed part-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Very useful,,,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,,University courses,70,10,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Decreased slightly,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Never,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Most of the time,,,,,,,,,,Sometimes,,Sometimes,Often,Sometimes,Rarely,,Sometimes,,,,,Sometimes,Often,,,,,60,20,0,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,Sometimes,,26-50% of projects,More external than internal,Other,MyPersonality.org's Personality Dataset,Finding the data that's actually needed,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"60,000",PHP,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,Russia,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Link Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Not Useful,,,,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,10,50,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,20 to 99 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,Rarely,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,Most of the time,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs",,,,,,,Most of the time,Often,,,,Often,,Sometimes,,Sometimes,,Rarely,Rarely,,,,Often,,,,,Sometimes,,,,,,60,15,5,10,10,0,Enough to run the code / standard 
library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,100% of projects,Entirely internal,Standalone Team,-,"dirty data, rules for merging different data sources are often incomplete","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Sometimes,2000000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Deep learning,Python,Other,"Arxiv,Blogs,College/University,Conferences",Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",15,10,20,40,15,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Decision Trees,Ensemble 
Methods,Gradient Boosted Machines","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression",Most of the time,,,,,Most of the time,Often,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,Other,Company Developed Platform,,"Git,Subversion",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,20,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,15,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,MATLAB/Octave,Text Mining,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Kaggle,Online courses",,Very useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,"FlowingData Blog,Linear Digressions Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Programmer",Self-taught,50,10,30,0,10,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,Fewer than 10 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,100GB,"Bayesian Techniques,CNNs,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,Often,,,,,,,,,,"CNNs,Naive Bayes,Natural Language Processing,Neural Networks",,,,Sometimes,,,,,,,,,,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,Often,,,,Often,,,,,26-50% of projects,More internal than external,IT Department,,normalize,Key-value store (e.g. 
Redis/Riak),I don't typically share data,,Git,Sometimes,"96,000",CNY,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,People 's Republic of China,47,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube 
Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",< 1 year,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,Other","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Yes,Doctoral degree,Mathematics or statistics,Less than a year,"Researcher,Other",Self-taught,50,45,0,0,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Very Important,Not important,Somewhat important +Male,Other,19,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,54,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Proprietary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Official documentation,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Researcher",Self-taught,80,0,20,0,0,0,Time Series,"Neural Networks - RNNs,Other (please specify; separate by semi-colon)",I prefer not to answer,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,10MB,"Evolutionary Approaches,Neural Networks,RNNs","C/C++,MATLAB/Octave,Perl,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Neural Networks,RNNs,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,Most of the time,,,,30,20,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,,Sometimes,,,,Often,,10-25% of projects,More internal than external,IT Department,,,Graph (e.g. 
GraphBase/Neo4j),"Email,Share Drive/SharePoint",,Git,Sometimes,13000000,JPY,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +A different identity,Indonesia,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,R,University/Non-profit research group websites,"Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Telecommunications,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Other,Relational data,Never,1GB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Orange,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Sometimes,,Rarely,,,,,,,,,,,,Rarely,Sometimes,Sometimes,,Often,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,Most of the time,,,,,,Often,,Often,,,,,Sometimes,,,,,,,,,,,,,30,30,0,30,10,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,None,Entirely internal,IT Department,None,"Lack of knowledge of basic 
things on how to prepare, communicate and visualize data to domain expert so he/she can get better insight out of the data, and we can realize hidden gem from the data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,10300000,IDR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Operations Research Practitioner,Researcher,Software Developer/Software Engineer",Self-taught,25,0,25,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Video data,Other",Sometimes,1GB,"Random Forests,Regression/Logistic Regression","C/C++,Minitab,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Random Forests,Simulation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Female,Germany,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,15,5,30,50,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Video data,Rarely,1GB,"Bayesian Techniques,CNNs,SVMs","C/C++,Python,Other,Other,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,Most of the time,Most of the time,"Bayesian Techniques,CNNs,Neural Networks",,,Often,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't 
perform advanced analytics,Julia,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,<1MB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Simulation",,,Sometimes,,,Sometimes,Most of the time,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,7,35,23,13,22,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,Most of the time,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,23000,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Conferences,Kaggle,YouTube Videos",Very useful,Very useful,,,Very useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,85,5,10,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics",,,,,,Sometimes,,,Often,,,,,,,Often,,,Often,Sometimes,,,Sometimes,Sometimes,,,,,Often,,,,,35,45,3,2,5,10,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science 
team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Rarely,Often,,,,Sometimes,Sometimes,,,,,,Most of the time,,,,Often,Often,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Git",Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,Less than a year,"Data Miner,Other",Self-taught,40,30,20,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis,Other",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,,,Sometimes,Sometimes,,Sometimes,Rarely,,,,,,Often,,,Often,40,30,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult 
access to data",,Sometimes,Sometimes,,Often,Sometimes,,,Often,,Most of the time,,,Sometimes,,,Sometimes,,,Most of the time,Often,,10-25% of projects,More internal than external,IT Department,,feature engineering ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,our own servers,Other,Rarely,4000000,AMD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by non-profit or NGO,Employed by government",Python,Text Mining,R,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,6 to 10 years,"Data Scientist,Programmer,Statistician",Self-taught,40,0,30,30,0,0,,,,Academic,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",,,"CNNs,HMMs,Markov Logic Networks,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,Often,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,,,,,,Often,,,Most of the time,Most of the time,,,Often,Most of the time,Most of 
the time,Often,,,,Often,Often,Sometimes,Sometimes,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Privacy issues",Often,,,Sometimes,,Often,,,,,,,,,,,Often,,,,,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,,3,,,,,,,,,,,,,,,,,, +Male,Netherlands,48,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,I never declared a major,More than 10 years,"Data Miner,Other","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,500 to 999 employees,Decreased significantly,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100GB,"Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter 
notebooks,Python,QlikView,R,SQL,Tableau,TensorFlow,Other,Other",,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Rarely,Often,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,Most of the time,Sometimes,,"CNNs,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Text Analytics",,,,Rarely,,,,Most of the time,,,,,,,,Most of the time,,,Sometimes,Rarely,,,Most of the time,Sometimes,,,,,Often,,,,,80,15,0,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data,Other",Sometimes,Often,,Most of the time,,,,Most of the time,Sometimes,,,,Often,,Often,,,,,,Most of the time,Most of the time,Less than 10% of projects,More internal than external,IT Department,"CBS (dutch central bureau fo statistics); Acxiom; mosaic finergy; ",According to our security dept: source data may not exist outside our legacy mainframe systems and - for most part - may not be processed/ingested by open source tooling. This viewpoint cripples most AI/Analytics projects.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Statistica (Quest/Dell-formerly Statsoft),Bayesian Methods,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Software Developer/Software Engineer",Self-taught,70,0,0,0,24,6,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Internet-based,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Neural Networks,PCA and Dimensionality Reduction",,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,30,30,20,10,10,0,,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,None,,Standalone Team,,,,,,,,60000,,,3,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Watson / Waton Analytics,Impala,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,QlikView,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,Sometimes,Often,,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,Often,,,,,,Often,,,Often,,,Often,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Sometimes,,,,,Often,Sometimes,,,,,,Sometimes,,Often,,Sometimes,,,Often,,Sometimes,,,Often,,,,Sometimes,,,,50,20,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert 
input",,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,France,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Social Network Analysis,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Personal Projects,Tutoring/mentoring",,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Machine Learning Engineer,Statistician",Work,10,10,50,10,20,0,Adversarial Learning,"Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",High school,CRM/Marketing,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,Random Forests",Rarely,,,,Rarely,,,Often,,,,,,,Often,Often,,,,Sometimes,,,Sometimes,,,,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Privacy 
issues",Often,Sometimes,,,Often,,,,,,,,,,,,Most of the time,,,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Git,Rarely,"65,000",,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,5,5,40,50,0,0,"Computer Vision,Machine Translation",,,Internet-based,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,15,25,0,10,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Russia,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,Python,"Google Search,Government website","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,30,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and 
understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Relational data,Other",Sometimes,100MB,Gradient Boosted Machines,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs,Time Series Analysis",Often,Sometimes,Sometimes,,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,,,Often,,,,,Sometimes,,Most of the time,,,,60,10,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,Often,,Most of the time,Often,,76-99% of projects,Entirely internal,IT Department,,It's dirty,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,1500000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,Work,30,20,40,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,Less than one year,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,,Rarely,,,Rarely,Most of the time,,,,,,Rarely,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Often,,,,Most of the time,Most of the time,Most of the time,Often,,,,,,,,,,,Often,Sometimes,,Often,,,Sometimes,,,,Sometimes,,,,5,10,0,5,30,50,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,,Often,,100% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Sharing data on common internal VMs,Bitbucket,Never,48000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer",University courses,10,40,20,20,10,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Decision Trees,Logistic Regression,Naive Bayes",Often,Often,,,Rarely,,,Sometimes,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,30,30,5,5,30,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,20,0,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Pharmaceutical,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,Sometimes,,Often,Most of the time,Often,,,,Often,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,20,20,25,15,20,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,76-99% of projects,More internal than external,IT Department,genomic variation data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Data Analyst,Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Conferences,Official documentation,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,FlowingData Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Computer Vision,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Spark / MLlib,Time Series Analysis,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Work,30,0,60,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,R,TensorFlow",,,,Often,,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,Often,,,Often,Often,Often,Often,,,Most of the time,,Sometimes,,,,Often,,Often,Sometimes,,Often,,,Often,Most of the time,,,,,,,65,3,2,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,,100% of projects,More internal than external,Other,MC Generated samples; ,data collection,Other,Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,15000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,16,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,50,10,0,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Mix of fields,,,,,"N/A, I did not receive any formal education",Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Video data,Text data",,1GB,"CNNs,HMMs,Random Forests,Regression/Logistic Regression","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Decision Trees,Natural Language Processing,Neural Networks,Random Forests",,,,Sometimes,,,,Sometimes,,,,,,,,,,,Most of the time,Often,,,Rarely,,,,,,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data 
science to others,Lack of data science talent in the organization,Other",,,,,,Often,,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher",Kaggle competitions,30,0,0,0,70,0,Computer Vision,"Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",I prefer not to answer,Technology,100 to 499 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Other,GPU accelerated Workstation,Image data,Rarely,1TB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,Most of the time,,Most of the time,,Sometimes,Most of the time,,,Most of the time,,Sometimes,,Often,,,,Most of the time,Sometimes,,Rarely,,,Often,,,,,,,,30,50,5,5,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,39,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Newsletters,Stack Overflow 
Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,"Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Statistician",University courses,20,10,30,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,MATLAB/Octave,Python,R,SQL,Tableau,TIBCO Spotfire",,Sometimes,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Often,,Often,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,Sometimes,Most of the time,Most of the time,Often,,,,Often,,Sometimes,,Often,,,Rarely,,Sometimes,,Often,Sometimes,,Often,,,,Sometimes,,,,30,15,15,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,Sometimes,,,,76-99% of projects,More internal than external,Business Department,,Finding the right person to ask to get the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,,Never,42000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Norway,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,15,15,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,I don't know,Increased slightly,,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,10GB,"GANs,HMMs,Neural Networks,Random Forests","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,GANs,HMMs,Neural Networks,Random Forests",,,,,,Often,Often,,Often,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Often,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Engineer,Self-taught,80,20,0,0,0,0,Computer Vision,Neural Networks - CNNs,,Government,500 to 999 employees,Decreased significantly,Don't know,Some other way,Somewhat important,Other,GPU accelerated Workstation,Video data,,,"CNNs,Neural Networks","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,KNIME (free version),Orange,Python,R,TensorFlow",,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Neural Networks",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,10,0,10,20,60,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,,Other,Serveur de donnÌÎå©es,"Git,Subversion",,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Support Vector Machines (SVM),Python,GitHub,"College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,NA,85,15,0,"Outlier 
detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Malaysia,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,"DataTau News Aggregator,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Data Analyst,Self-taught,40,10,30,20,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,100GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,,Sometimes,,,Sometimes,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team 
using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Often,Often,,,Often,,,,,,Most of the time,,Sometimes,Sometimes,Sometimes,,,,10-25% of projects,More internal than external,Central Insights Team,,Correctly understanding the meaning behind the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,72000,NZD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",70,0,25,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,CNNs,"Amazon Web services,Flume,Java,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Most of the time,,,,,Often,,,,,,,,Most of the 
time,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,Recommender Systems",Rarely,,,Often,,Sometimes,Often,,,,,,,Most of the time,,,,,,Most of the time,,,,Often,,,,,,,,,,30,50,5,5,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by non-profit or NGO,TensorFlow,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,YouTube Videos,Other",,Very useful,,,,,,,,,,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,20,10,30,10,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",A master's degree,Non-profit,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Never,100MB,Random Forests,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,60,10,5,20,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Often,,Often,Often,Often,,,Often,,,Often,,,,Often,,,Less than 10% of projects,More internal than external,IT Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Email,I don't typically share data",,Subversion,Rarely,2700000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Other,31,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,Google Search,"Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,DataCamp,edX",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,India,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),,Data Scientist,Self-taught,50,10,10,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,,,A master's degree,Mix of fields,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,10GB,,Spark / MLlib,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,"Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Sometimes,,,,,,Often,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Not Useful,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,"Coursera,edX,Udacity",GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,45,0,5,10,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important +Female,Japan,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A health science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Uplift Modeling,Python,Google Search,Blogs,,Somewhat 
useful,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",University courses,40,0,0,60,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,1GB,Other,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,Often,,,,,,Often,,,,,,,,,Often,,,,,Often,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,kNN and Other Clustering,Logistic Regression,Recommender Systems,Time Series Analysis",,Often,Often,,Often,,,,,,,,,Often,,Often,,,,,,,,Often,,,,,,Often,,,,90,10,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Privacy issues",Often,Often,,,,,,,,,,,,,,,Often,,,,,,100% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Git,,70000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,70,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,33,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,6 to 10 years,"Researcher,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,DataRobot,Neural Nets,Java,Google Search,"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,Nice to have,Nice to have,,,,edX,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,I don't write code to analyze data,I haven't started working yet,University courses,20,0,10,70,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Other,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,College/University,Conferences,Friends network,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Statistician",Work,30,10,40,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,500 to 999 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,Rarely,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",Often,,,,,Often,Most of the time,Rarely,,,,Rarely,,,,Often,,,,,Rarely,,Rarely,,,,Rarely,,,Sometimes,,,,20,20,15,25,20,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Need to coordinate with IT",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,51-75% of projects,More internal than external,Standalone Team,"Alexa, and domain/IP whois data",It's not so well structured,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,45000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,55,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Programmer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,90,0,0,10,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Gradient Boosting",A master's degree,Pharmaceutical,Fewer than 10 employees,Decreased slightly,Less than one year,I visited the company's Web site and found a job listing there,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",,1GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,32,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"College/University,Company internal community,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,Very useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,Less than a year,"Business Analyst,Engineer,Researcher",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A doctoral degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",,1GB,Neural Networks,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,10,40,5,15,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others",Sometimes,,,,,Often,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,80000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,,,,,Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Machine Learning Engineer,University courses,38,0,0,60,2,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,10 to 19 employees,Stayed the same,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Always,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs",,,,Rarely,,Often,Often,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,Rarely,,Rarely,,,Sometimes,,,,,,40,20,20,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Often,Often,,,,,,Often,,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),I don't typically share data,,Git,Never,70000,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,40,40,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Neural Networks,Random Forests","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,Python,SQL,Tableau,TensorFlow",Often,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,MATLAB/Octave,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Physics,More than 10 years,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,Time Series,"Bayesian Techniques,Markov Logic Networks",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Australia,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,47,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",University courses,70,0,30,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",A bachelor's degree,Insurance,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100MB,,"Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,20,10,10,0,,"Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT",Often,,,,,,,,,,,,,,Sometimes,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Other,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,,"Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Microsoft Azure Machine Learning,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,,"Data Visualization,Neural Networks,Recommender Systems,Text Analytics",,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,Sometimes,,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Sometimes,,,,,,,,,Rarely,,,,26-50% of projects,Entirely internal,Central Insights Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Subversion,Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,5,0,5,0,,,Primary/elementary school,Internet-based,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,,"Bayesian Techniques,Decision Trees,Random Forests","Java,NoSQL,Python,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Natural Language Processing",Sometimes,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,70,10,5,5,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Organization is small and cannot afford a data 
science team",Often,,,,,Often,,,,,,,,,,Often,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,30000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +A different identity,India,37,Employed full-time,,,No,Yes,Other,Poorly,Employed by college or university,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,,"Coursera,DataCamp",Other,2 - 10 hours,Other,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,30,30,NA,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,28,Employed part-time,,,No,Yes,Researcher,Perfectly,Employed by college or university,MATLAB/Octave,,Matlab,GitHub,"College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,,Very useful,,Somewhat 
useful,,,,Somewhat useful,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",1-2 years,,,,,,,,,,,,,,,Workstation + Cloud service,2 - 10 hours,PhD,No,Doctoral degree,Electrical Engineering,1 to 2 years,Operations Research Practitioner,Work,60,20,10,10,0,0,Computer Vision,Support Vector Machines (SVMs),A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,France,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,France,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,20,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Rarely,100MB,"Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,,,Sometimes,Often,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Sometimes,,Rarely,,,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Often,,,,Often,,,,,,,Often,Sometimes,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,70000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,1 to 2 years,"Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer",Work,30,10,30,10,20,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Internet-based,500 to 999 employees,Decreased significantly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks","Amazon Web services,Impala,Jupyter notebooks,Python,SQL,Stan,Tableau",,Most of the time,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,Often,,Sometimes,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics",Sometimes,Sometimes,Often,,Often,Most of the time,Most of the time,,,,,Often,,,Sometimes,,,,Often,,Often,,,Often,,Sometimes,,,Often,,,,,35,15,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,Often,Most of the time,Most of the time,,Often,Sometimes,,,,,Often,,,,,Sometimes,Often,Often,,Less than 10% of projects,Entirely internal,Central Insights Team,nothing,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,7000000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Computer Vision,Recommendation Engines","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by government",Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,,Very useful,,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Engineer,Researcher",Work,50,20,20,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",High school,Other,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Other",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,Often,,Often,,Often,,Often,Often,,,,,Often,,Often,,Often,,,,70,10,5,5,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Sometimes,,,,,,,,,,,,Often,,,Sometimes,Sometimes,,51-75% of projects,Approximately half internal and half external,Other,"Kaggle, open data ",Quick access,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,10,5,25,60,0,0,"Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,20 to 99 employees,Increased significantly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Bayesian Methods,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,,,,Necessary,Necessary,Necessary,Necessary,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Other",2 - 10 hours,Experience from 
work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Data Analyst,University courses,0,0,100,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Bayesian Techniques,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,,,,,,,,,,,,,Very Important,,, +Female,Greece,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,20,10,10,60,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,Technology,I don't know,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,,"Bayesian Techniques,Decision Trees","Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Naive Bayes,Recommender Systems,Segmentation",,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,Often,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,60,30,5,5,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Kaggle",Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,Self-taught,50,0,20,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Web services,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,FlowingData Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,"Natural Language Processing,Speech Recognition",Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,Recommendation Engines,Logistic Regression,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important +Male,Russia,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Data Scientist,Predictive Modeler,Statistician",Work,40,25,5,5,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,Often,Rarely,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,,Rarely,,,Sometimes,Often,Rarely,,,,Rarely,,,,Often,,,Sometimes,Rarely,Often,,Often,,,,,Rarely,Often,Sometimes,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 
years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Data Analyst,Programmer,Researcher,Statistician",Work,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +,,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Manufacturing,100 to 499 employees,Increased slightly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. 
boosting, bagging)",Python,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,,,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Other",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,80,0,0,20,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Most of the time,1800000,INR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",40,20,20,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,I don't know,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,Tableau",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,Most of the time,,Sometimes,Often,,,,,,Most of the time,Often,Often,,,,,,,,,,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities",,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Researcher",University courses,60,0,0,40,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Time Series","Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,100 to 499 employees,Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100GB,"Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,SAS Base,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,,,,"Cross-Validation,Text Analytics,Time Series Analysis",,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,50,10,0,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,Often,Most of the time,,,26-50% of projects,More external than internal,Other,,,Other,"Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Always,135000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,France,25,Employed 
full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,DataCamp,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Engineer,University courses,30,10,0,60,0,0,,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",I don't know/not sure,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,Other,27,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Company internal community,Personal Projects",,,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Predictive Modeler,Statistician",Self-taught,40,0,0,40,0,20,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,India,37,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Programmer,Researcher,Other",,50,30,0,20,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Academic,100 to 499 employees,,3-5 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,Other","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,Often,,,,,,Often,,,,,,Often,Often,,,,,Often,,,Often,Often,,Often,,,,,,,,Often,Most of the time,,,,Often,,Most of the time,,,,"Association Rules,Cross-Validation,kNN and Other Clustering,Neural Networks",,Often,,,,Often,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,30,40,0,5,15,10,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,Often,Often,,,,,,Most of the time,,,Most of the time,,,10-25% of projects,More external than internal,,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,70000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Portugal,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Julia,Other,Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,25,0,0,75,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",,Non-profit,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Relational data",Most of the time,10GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests","C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation",,,,Often,,Most of the time,Most of the 
time,,Often,Often,,Often,,Often,,,,,,Often,Often,Sometimes,Often,,,Rarely,Rarely,,,,,,,15,50,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,Often,,,Most of the time,Often,,100% of projects,Entirely external,Other,,Lack of domain expertise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,25000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,30,15,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,38,Employed full-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by non-profit or NGO",TensorFlow,Neural Nets,R,University/Non-profit research group websites,"Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,Researcher,University courses,35,10,5,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased slightly,More than 10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Relational data,Never,<1MB,Evolutionary Approaches,"Java,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Prescriptive 
Modeling,Simulation,Time Series Analysis",,,,,,Sometimes,Often,,,Often,,,,,,,,,,,,Often,,,,,Often,,,Sometimes,,,,20,50,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,Often,Sometimes,,,,,,Often,Sometimes,,76-99% of projects,More external than internal,Other,,Experimental data by external biologists sometimes spread in many excel files in slightly different formats among the files.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,JPY,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Russia,42,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Oracle Data Mining/ Oracle R Enterprise,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,,Online Courses and Certifications,No,Professional degree,,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,India,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,,,Very useful,,,,,Very useful,,,,,Siraj Raval YouTube Channel,< 1 year,,,,,,Necessary,,,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,Somewhat important,,,,,,,, +Male,Greece,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Bayesian Methods,C/C++/C#,University/Non-profit research group websites,"Arxiv,Conferences,Personal Projects,Textbook",Very useful,,,,Very useful,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Computer Scientist,University courses,40,0,0,60,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,,Very important,Research that advances the state of the art of machine learning,,"Image data,Text data,Relational data",Sometimes,,"Bayesian Techniques,HMMs,Markov Logic Networks,Neural Networks,SVMs","C/C++,Java,MATLAB/Octave,TensorFlow",,,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series 
Analysis",,,Often,Sometimes,,Often,Sometimes,Sometimes,,Sometimes,,,Sometimes,Most of the time,,Sometimes,,Sometimes,,Often,Often,,Sometimes,,,Often,Sometimes,Often,,Sometimes,,,,10,80,5,5,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Lack of significant domain expert input",,,,Often,,,,,,,Often,,,,,,,,,,,,10-25% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,22000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,57,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,Python,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,30,10,0,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,Julia,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Often,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,RNNs,SVMs",,Often,,Often,,Often,Often,Often,,,Often,,,Often,,Often,,Often,,,,,Often,,Often,,,Often,,,,,,50,20,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small 
and cannot afford a data science team,Unavailability of/difficult access to data",Often,Most of the time,,,Most of the time,,,,,Most of the time,,Often,,,,Often,,,,,Often,,10-25% of projects,More internal than external,Standalone Team,Do not know,Clean the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Git,Rarely,"100,000",EUR,Has decreased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,18,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,25,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Other",University courses,5,0,0,90,5,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,High school,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,GitHub,"Arxiv,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,25,25,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,45,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by government,TensorFlow,Deep learning,R,Google Search,"Arxiv,Blogs,Conferences,Kaggle,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,,,,,,,,Somewhat useful,Data Machina Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Work,70,0,15,0,0,15,Outlier detection (e.g. Fraud detection),Bayesian Techniques,A doctoral degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Business Analyst,Software Developer/Software Engineer,Other",University courses,30,0,20,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Most of the time,100GB,Evolutionary Approaches,"Java,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Prescriptive 
Modeling",,,,,,,Often,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,,,10,70,5,10,5,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Python,Rule Induction,Haskell,Google Search,Blogs,,Very useful,,,,,,,,,,,,,,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,,,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,90,10,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,Sometimes,1GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests","Java,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Other",Sometimes,Often,,Most of the time,,Often,Most of the time,Often,Often,,Sometimes,Most of the time,,Often,,Most of the time,,,Often,,Most of the time,,Often,Sometimes,,Often,,Often,,,Often,,,30,30,0,5,35,0,,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Sometimes,,,,,,,,,,Often,,,,26-50% of 
projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Rarely,"70,000",,I was not employed 3 years ago,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,R,University/Non-profit research group websites,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,,,,Often,Often,,,,,,,,,,,,,,Often,Often,,Sometimes,,,Often,,,,,30,30,20,0,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",,Often,,Most of the time,Sometimes,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,Very limited features,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Most of the time,2400000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,TensorFlow,,Python,Google Search,Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Telecommunications,,,,,Somewhat important,,,,,,,"C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,None,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,30,30,0,40,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Python",,,,,,,,Sometimes,,,,,,,,,Often,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Rarely,,,Often,Often,Sometimes,,Sometimes,,Often,Often,,,Often,Often,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert 
input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Often,,Sometimes,Often,,,,,,,Sometimes,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Kaggle datasets;found through google,cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Bitbucket,Sometimes,60000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Israel,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,Very useful,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,15,15,40,15,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,6-10 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,,,,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,Often,Often,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,Often,Often,Often,Often,,,Often,,Sometimes,,Often,,,Most of the time,Sometimes,,,Often,,,,,,Most of the time,,,,,40,15,30,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,Often,Often,,,Often,,,,,,,,,,,,Often,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Sometimes,,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Portugal,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Other",University courses,80,5,10,5,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Python,QlikView,R,Tableau",,Most of the time,,,,,,,,,Rarely,Rarely,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,3 to 5 years,"Data Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Portugal,40,Employed part-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Random Forests,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Operations Research Practitioner,Researcher",University courses,0,0,0,100,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL",Often,Often,,,,,,,,,,,,,,,,,,,Often,,Often,,Often,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,Often,,,,,Often,,Often,,,,Often,,Often,Often,Often,Often,Often,,,,,,Often,Often,,,Often,,,,35,35,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to 
data",,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,10-25% of projects,Approximately half internal and half external,Other,arcgis,reliability ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,15000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Ukraine,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",Self-taught,30,30,10,20,10,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning",Support Vector Machines (SVMs),A master's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Java,TensorFlow",Sometimes,Often,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,"Natural Language Processing,Neural Networks,PCA and Dimensionality 
Reduction,SVMs",,,,,,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,,Sometimes,,,,,,20,50,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,Often,Often,,Most of the time,Often,,,,,,,,Most of the time,Often,,10-25% of projects,Entirely internal,Standalone Team,kaggle sample datasets,preprocessing and combining,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,320000,LKR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,New Zealand,48,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,Other,Self-taught,88,1,10,0,1,0,Time Series,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",High school,Financial,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Somewhat important,Other,Traditional Workstation,Relational data,Always,1GB,"Gradient Boosted Machines,Regression/Logistic Regression","C/C++,R,SQL",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,Often,,Sometimes,Sometimes,Often,,,,,Rarely,,,,,Often,,,,Often,,,,60,15,5,5,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Rarely,,,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,"census,redbook,rms models",Our traditional BI team have messed up at a very fundamental level and we need to start again. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,300000,NZD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Conferences,Kaggle,Personal Projects",,,,,Very useful,,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Statistician",University courses,10,10,0,70,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Very Important,Very Important +Male,India,25,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Cluster Analysis,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Data Analyst,Data Miner,Engineer",Self-taught,0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,Data Visualization,Random Forests,Segmentation",,,Often,,,,Most of the 
time,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,40,20,0,30,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Not Useful,,,,,,,,Not Useful,,,Not Useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Talking Machines Podcast",< 1 year,Unnecessary,,,,,,,,,,,,,"Coursera,DataCamp,edX,Udacity",Workstation + Cloud service,0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,Not important,,,,,,,,,,,, +Male,Spain,28,Employed 
full-time,,,Yes,,Researcher,Fine,Employed by college or university,Jupyter notebooks,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Newsletters",,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,40,0,40,20,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,Academic,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,SQL,Other",,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Often,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Text Analytics",Often,,,,,Rarely,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Often,,,,,,Sometimes,,,,,30,20,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Most of the time,15000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Biology,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Researcher",Work,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Julia,Deep learning,R,,"College/University,Conferences,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Data Analyst,Researcher",University courses,10,0,40,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,Some other way,Very important,Other,Laptop or Workstation and local IT supported servers,,,,"Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Most of the time,Rarely,,,,,,Often,,Most of the time,,,,,Most of the time,,Sometimes,,,,,,,,,,,20,30,0,40,10,0,Enough to explain the algorithm to someone 
non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Most of the time,,,,Sometimes,,,,,,,Most of the time,,,100% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Most of the time,34000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Pakistan,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Python,Link Analysis,SAS,GitHub,"Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,,,Somewhat useful,,Very useful,,,,,,,Not Useful,"FlowingData Blog,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,Survival Analysis,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",,Technology,,,,,Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,100GB,Random Forests,"NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Gradient Boosted Machines,Random Forests",Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,30,20,20,10,20,0,Enough to tune the parameters properly,"Dirty data,I prefer not to say,Privacy issues",,,,,Often,,Sometimes,,,,,,,,,,Most of the time,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Key-value store (e.g. 
Redis/Riak),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,1200000,PKR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,51,Employed full-time,,,Yes,,Researcher,,Employed by college or university,,,,,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Self-taught,50,30,20,0,0,0,Other (please specify; separate by semi-colon),Neural Networks - CNNs,High school,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Sometimes,1GB,Neural Networks,"MATLAB/Octave,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,"Neural Networks,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,30,30,0,0,40,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Friends network,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's 
degree,Physics,3 to 5 years,"Predictive Modeler,Programmer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,60,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Government,10 to 19 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,,,Regression/Logistic Regression,"MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,25,35,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,Most of the time,,,,,Most of the time,,76-99% of projects,More external than internal,Standalone Team,Remotely sensed data; global ocean assimilated model outputs;,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,360000,PHP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,edX,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,Physics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,40,Employed 
full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Python,Deep learning,Python,"Google Search,Government website,University/Non-profit research group websites","Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Somewhat important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,,,"Random Forests,Regression/Logistic Regression","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,20,5,20,20,35,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,we use mostly public or inhouse databases,dirty data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Rarely,80000,CHF,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Online courses,Tutoring/mentoring",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,500 to 999 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Impala,Java,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,Often,Sometimes,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,Most of the time,Sometimes,Often,,,,,,,,,Most of the 
time,,,Often,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems,Text Analytics",Often,Sometimes,,,Sometimes,,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,Most of the time,,,,,20,30,30,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,,Often,,,,,,,,100% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,,8,,,,,,,,,,,,,,,,,, +Male,Turkey,51,Employed full-time,,,Yes,,Programmer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,R,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important +Male,India,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,NA,"Not employed, but looking for work",,,,,,,,Other,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Very 
useful,,Very useful,,Very useful,,,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,10,20,20,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Germany,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Unix shell / awk,Bayesian Methods,SQL,Google Search,"Arxiv,Blogs,Conferences,Friends network,Stack Overflow Q&A",Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,Very useful,,,,,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",15,25,50,10,0,0,Supervised Machine Learning (Tabular 
Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10GB,"Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,Random Forests,Simulation",Sometimes,,Sometimes,,,Often,,,,,,Often,,,,,,,,,,,Often,,,,Often,,,,,,,65,15,5,10,5,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Often,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,60000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Russia,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Statistician",University courses,30,30,30,10,0,0,"Adversarial Learning,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Workstation + Cloud service,Other",Relational data,Sometimes,1PB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Unix shell / awk",,,,,,,,,Most of the time,,,,,Often,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,,,,50,10,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues",,,,,Often,,,,,,,,,,,,Most of the time,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,,180000,RUB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by college or university,SQL,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Not Useful,,,Somewhat useful,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"FlowingData Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Logistic Regression",A professional degree,Academic,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Java,Mathematica,MATLAB/Octave,Perl,Python,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,,,,,Rarely,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,Sometimes,,,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,,Most of the time,,,Sometimes,,,,70,10,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Often,Often,,,Sometimes,,Often,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Russia,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +"Non-binary, genderqueer, or gender non-conforming",United Kingdom,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,Physics,6 to 10 years,I haven't started working yet,Self-taught,90,0,0,0,10,0,Time Series,"Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,United States,41,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Uplift Modeling,Python,"Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Friends network,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,Very useful,,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Not Useful,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Work,40,30,20,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Government,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,R,SAS Enterprise Miner,Tableau",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Logistic Regression,Prescriptive Modeling,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,Often,,,,,,Often,,,,,Sometimes,,Rarely,Sometimes,,,,50,10,5,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Rarely,Rarely,Most of the time,Often,,Most of the time,Often,,,,,Most of the time,,,,,,Often,Often,,Less than 10% of projects,Entirely internal,Other,"Client data, so sometimes includes public survey data. 
As the federal sector, our clients rarely feel comfortable or have clear authority to use commercial, crowd-sourced or other (mobile etc) data.","Understand how to wrangle data sets into the appropriate set up is a challenge I often face in my team - and there's less discussion of this quotidian aspect in the literature than the methods. Another is imputation, joining multiple data sources and hoping it maintains record integrity. Lastly, most organizations are evolving so there's a trade off between rich predictors and more substantial time frame.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,180000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,India,NA,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,10,10,10,60,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,1GB,"Bayesian Techniques,Neural Networks,Random Forests,SVMs","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Sometimes,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,Rarely,,,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,,40,5,2,23,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,Sometimes,,,Most of the time,Sometimes,,,,,Most of the time,,Often,,,Often,Sometimes,,51-75% of projects,More internal than external,Central Insights Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,55,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,Researcher,University courses,15,5,70,10,0,0,Time Series,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Academic,,,,,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,Bayesian Techniques,"Mathematica,MATLAB/Octave,Python,R,SAS Base,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Often,,Often,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional 
services/consulting firm,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses",,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer,Statistician",University courses,0,0,20,80,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Internet-based,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,100MB,Regression/Logistic Regression,"Amazon Web services,C/C++,IBM SPSS Statistics,Java,Jupyter notebooks,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Tableau",,Often,,Rarely,,,,,,,,Rarely,,,Rarely,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,Rarely,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Often,,,,,Sometimes,,,,,Sometimes,Most of the time,Rarely,,Sometimes,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability 
of/difficult access to data",,,,,,,,,Often,Often,,,Often,,Most of the time,Often,,,Most of the time,,Often,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,,45000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Tableau,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,1-2 years,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Self-taught,30,70,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,,,,Not important,Very Important,Very Important +Male,India,45,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Statistica (Quest/Dell-formerly Statsoft),Cluster Analysis,R,I collect my own data (e.g. 
web-scraping),"Newsletters,Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Professional degree,,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Support Vector Machines (SVM),Python,Google Search,"Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,Very useful,"Linear Digressions Podcast,Talking Machines Podcast",1-2 years,Nice to have,,,,Nice to have,Unnecessary,Nice to have,,,,,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,50,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,,,Somewhat important,,,,,,,,,,,, +Male,United States,59,Employed 
full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,,Python,University/Non-profit research group websites,"College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,Other,Self-taught,50,0,30,20,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data",,100GB,"Decision Trees,Neural Networks,Random Forests","C/C++,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R",,,,Often,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,,,Often,,,,,,Often,,,,,Sometimes,Often,Sometimes,,Often,,,,,,Sometimes,,,,,50,20,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization",,,,,,Often,,,Often,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,social media; public records,lack of API's,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Trade book,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Data Analyst,Self-taught,40,30,0,0,30,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Relational data",Most of the time,10GB,"Neural Networks,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,MATLAB/Octave,Minitab,Python,R,SQL",,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,Sometimes,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Most of the time,,,,Often,Most of the 
time,,Often,,,,Often,Often,,Often,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,90000,CAD,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,No,Yes,Other,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Google Cloud Compute,Decision Trees,Python,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,66,25,0,0,3,6,"Adversarial Learning,Computer Vision,Recommendation Engines",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by college or 
university,SQL,Cluster Analysis,Java,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Poorly,Self-employed,Microsoft Excel Data Mining,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Podcasts,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,20,10,50,15,5,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",CRM/Marketing,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,GANs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data 
Mining,NoSQL,Perl,Python,QlikView,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TIBCO Spotfire",Sometimes,Often,,Most of the time,Sometimes,,,,Often,,,Sometimes,Sometimes,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,Sometimes,,Often,,,Often,Most of the time,Often,Most of the time,,Rarely,,,,,,Rarely,Most of the time,,,Often,,Rarely,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,GANs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Most of the time,,Most of the time,,,Often,Most of the time,Most of the time,,Sometimes,Sometimes,,,Often,Often,Most of the time,Sometimes,Often,Most of the time,Most of the time,,Most of the time,,Most of the time,Often,,,,Most of the time,Sometimes,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or 
a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,Often,Often,Sometimes,Sometimes,,Often,Most of the time,Often,Often,Often,Often,Often,Rarely,Sometimes,Often,Often,Sometimes,Often,Sometimes,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,26,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,Very useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,0,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,Vietnam,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,Data Analyst,Other,20,10,10,40,10,10,"Machine Translation,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Academic,20 to 99 employees,Increased significantly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Other",,,,,,,Very useful,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,Sort of (Explain more),Bachelor's degree,Other,,"Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,36,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,85,0,0,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),,"Operations Research Practitioner,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural 
Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Spain,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Julia,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A",Very useful,,,,Somewhat useful,Very useful,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Programmer,Researcher",Self-taught,80,0,5,15,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,High school,Academic,20 to 99 employees,Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Basic laptop (Macbook),Image data,Most of the time,10MB,"Bayesian Techniques,CNNs,Neural Networks","MATLAB/Octave,Python,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,Most of the time,,,"Bayesian Techniques,CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,Sometimes,Often,,,,,,,,,,Rarely,,,,,,Often,Often,,,,,,Often,,,,,,,10,50,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Often,,,,,,,,,,Sometimes,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,Other,Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,26400,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Malaysia,23,Employed part-time,,,No,Yes,Other,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,Data Analyst,University courses,25,0,40,35,0,0,,Decision Trees - Random Forests,A professional degree,Other,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing 
there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,34,33,0,33,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Other",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,Often,,Sometimes,Often,,Often,,Often,,,,,,Often,,,,,50,10,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Most of the time,Most of the time,,,,,Sometimes,,,,,,,,,,,Often,Most of the time,,10-25% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,14000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Miner,Engineer",Self-taught,30,40,20,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Telecommunications,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Oracle Data Mining/ Oracle R Enterprise,Python",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Time Series Analysis",Sometimes,,,,,Often,Often,Often,Often,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,,,,Sometimes,,,,,Sometimes,,Often,,,,50,15,20,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Often,,,,Often,,Often,,Often,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher",University courses,20,20,30,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,100GB,"CNNs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,MATLAB/Octave,Python,R,SQL,TensorFlow",Often,Often,,Often,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Often,,Often,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,Often,Often,,Sometimes,,,Often,,Sometimes,,,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Sometimes,"38,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Sweden,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,80,10,0,10,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Logistic Regression",A master's degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression","NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",Often,,Often,,,,,Often,,,,,,,,Often,,,Often,,,,Often,,,,,,Often,,,,,30,20,15,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the 
organization,Limitations in the state of the art in machine learning",Sometimes,,,,,Often,,,Often,,,Often,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Always,10000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Republic of China,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer",University courses,15,10,30,30,15,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Always,100GB,"Ensemble Methods,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Minitab,Python,R,SQL",,,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,,Sometimes,Often,,Often,Sometimes,Sometimes,Often,,Sometimes,Sometimes,,,,,Often,Most of the time,,,,30,30,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,,Rarely,400000,CNY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Recommendation Engines,Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Basic laptop (Macbook),"Text data,Relational data",Don't know,10MB,Other,"Amazon Web services,Google Cloud Compute,Java,NoSQL,Python,SQL,Unix shell / awk",,Often,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Online courses,Personal Projects,Tutoring/mentoring",,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,,,,Very useful,,R Bloggers Blog Aggregator,1-2 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Engineer,Software Developer/Software Engineer",Other,30,30,0,0,0,40,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important +Male,Germany,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by non-profit or NGO,SAS Base,Text Mining,C/C++/C#,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher,Statistician",Self-taught,50,30,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Julia,Python,R,Other",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,,Sometimes,,Often,,,,,,,Most of the time,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Retail,20 to 99 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Don't know,10MB,"CNNs,Neural Networks,Regression/Logistic Regression","Java,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,CNNs,Logistic Regression,Neural Networks",Often,,,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,Employed 
part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10MB,,"Cloudera,Jupyter notebooks,NoSQL,Python,R,SAS Base,Tableau",,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,Often,,,,,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees",,Often,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,,,,,,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Researcher,Software Developer/Software 
Engineer,Other",Self-taught,100,0,0,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Academic,"10,000 or more employees",,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,1GB,"Decision Trees,HMMs,Random Forests,SVMs","Google Cloud Compute,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,Unix shell / awk",,,,,,,,Often,,,,,,,,Rarely,Sometimes,,Rarely,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,,,,Often,Most of the time,Sometimes,,Most of the time,,,Sometimes,Often,,Rarely,,,Often,,Often,,Sometimes,,,,Most of the time,Often,Often,,,,,30,50,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,Often,,,Sometimes,,,Sometimes,,Sometimes,,,,Most of the time,Most of the time,,Most of the time,,51-75% of projects,More internal than external,Other,Omics data; scientific papers; biochemical databases,Integration across sources with multiple identifiers,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Company Developed Platform,Email",,Git,Most of the time,43000,GBP,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,49,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,1 to 2 years,Other,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Other,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,<1MB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,QlikView,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,Sometimes,,,,Often,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,"Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Often,,,,20,20,10,30,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Often,Most of the time,,,Sometimes,,,Often,,,,,Sometimes,,76-99% of projects,More internal than external,Business Department,INE-,Useful insights. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,"25,000",,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Hong Kong,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Podcasts",Very useful,Very useful,,,,,,,,,,,Very useful,,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service",,PhD,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,Ukraine,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,53,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Spark / MLlib,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites,Other","College/University,Conferences,Friends network,Personal Projects",,,Very useful,,Very useful,Very useful,,,,,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Other",Self-taught,90,0,0,0,0,10,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data,Other",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Java,Julia,Jupyter notebooks,KNIME (free version),Python,R,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,Often,Most of the time,Sometimes,,Often,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Segmentation,Simulation,Time Series Analysis,Other",,,,Often,,Most of the time,Most of the time,Sometimes,Sometimes,Often,,Rarely,Rarely,Often,,Sometimes,,,,Most of the time,,,Sometimes,,Often,Sometimes,Often,,,Often,Often,,,70,10,5,5,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,Often,,,,Often,,,,Sometimes,,Often,,Most of 
the time,,76-99% of projects,Entirely internal,Other,"Physionet, NIH TCGA",Availability and metadata/annotation,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Redmine,Git,Sometimes,75000,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Singapore,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,TensorFlow,TIBCO Spotfire",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,Sometimes,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Often,Most of the time,Often,Often,,,Often,,Sometimes,Sometimes,Often,,,,,,,Often,,,Often,Often,Sometimes,Sometimes,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Often,,,Sometimes,Often,Sometimes,,,,Sometimes,,,,,Often,,Sometimes,Often,,,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,50000,SGD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Australia,53,"Not employed, but looking for work",,,,,,,,R,Decision Trees,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,,,,,,,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,More than 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,30,50,0,0,20,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,France,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer",University courses,0,60,10,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,SVMs,Text Analytics",,,,,,Rarely,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,30,10,20,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,Often,,,,,Sometimes,,Most of the time,,,Most of the time,Often,,100% of projects,More internal than external,Other,financial/news data providers,Answer to real needs of my clients,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,95000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,10,30,50,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Other,Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,IBM Watson / Waton Analytics,Java,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,TensorFlow",Rarely,Often,,,,,,,,,,,Rarely,,Often,,Most of the time,,Sometimes,,,Rarely,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data 
Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,Sometimes,Often,Most of the time,Most of the time,Sometimes,Often,,,,Sometimes,Often,,Often,,Often,Often,Sometimes,Often,,Often,Sometimes,,Sometimes,,Often,Sometimes,Sometimes,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,,Often,,,,,,Often,Often,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Sometimes,0,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Indonesia,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Other,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,"Data Stories Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",3-5 years,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Statistician,I haven't started working yet",University courses,40,10,0,50,0,0,Time Series,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,50,25,0,20,5,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Canada,51,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,40,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Government website,University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,70,30,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,1 to 2 years,I haven't started working 
yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,83,Retired,,,No,Yes,Computer Scientist,Perfectly,Employed by college or university,Mathematica,Cluster Analysis,Java,University/Non-profit research group websites,"Blogs,Conferences,Friends network,Newsletters,Personal Projects,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Government website,"Company internal community,Conferences,Official documentation,Personal Projects",,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,Statistician,University courses,20,0,10,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,Often,,,,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,Often,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Often,,Most of the time,,,,,Most of the time,,Often,,,Most of the time,Often,,Most of the time,Often,,,,40,10,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability 
of/difficult access to data",Most of the time,,,,,,,,Sometimes,,,,,Most of the time,,,Sometimes,,,,Most of the time,,100% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,70000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,France,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Bayesian Methods,R,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,Very useful,Very useful,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,35,30,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very 
Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Republic of China,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Impala,Java,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Often,,,,,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,Evolutionary Approaches,HMMs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Often,Often,Often,,Often,,,Often,Often,,,Often,,,,,Often,Often,Often,Often,,Often,,,,,,,,,,,40,30,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Sometimes,,,,Sometimes,Often,,,,,,,,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Subversion,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,Trade book,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,Not Useful,,,Very useful,Very useful,,,Not Useful,Not Useful,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",Work,80,10,5,0,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Insurance,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Never,100MB,"CNNs,Regression/Logistic Regression","IBM Cognos,NoSQL,Python,R,SQL",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Logistic Regression",Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,3,1,0,1,5,90,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,Often,,Less than 10% of projects,Entirely internal,Other,"Google Trends, Twitter",Dirty datasets,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Subversion,Sometimes,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Business Analyst,Self-taught,40,50,5,5,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,Arxiv,Very useful,,,,,,,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Scientist",Self-taught,30,30,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,SVMs,"Amazon Web services,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,0,0,0,100,0,0,Enough to explain the algorithm to someone 
non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,100% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United Kingdom,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Java,Social Network Analysis,Matlab,GitHub,"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Not Useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Not Useful,,Somewhat useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,15,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,I don't know,Increased slightly,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Other,Never,,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Java,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,HMMs,Logistic Regression,Natural Language Processing,Recommender Systems,SVMs",,,,,,,Sometimes,,,,,,Rarely,,,Rarely,,,Sometimes,,,,,Rarely,,,,Rarely,,,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,,,,,Sometimes,,,Often,,,,,,,,,,,100% of projects,Entirely internal,Other,,,,Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,"58,000",EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,40,5,0,25,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Sometimes,,Sometimes,Often,Most of the time,Often,Sometimes,Often,Sometimes,Most of the time,Most of the time,,,Most of the time,Sometimes,,,,30,10,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,26-50% of 
projects,Entirely internal,Standalone Team,"tobacco-800, RVL-CDIP",Availability of enough data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Never,1000000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Python,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,75,0,0,25,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Never,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,,Often,Sometimes,,,Often,,Sometimes,,Often,,,,Sometimes,,,Often,,,,,,,Most of the time,,,,45,40,5,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to 
data",,,,,Sometimes,,,,,Often,,,Often,,,,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,"Financial Data, Price Series, Sentiment, Fund Flows",Data not broad enough,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,5000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Italy,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,I haven't started working yet,University courses,40,20,20,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,10GB,"CNNs,Neural Networks,RNNs","Amazon Machine Learning,MATLAB/Octave,Python,SQL",Rarely,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"CNNs,Cross-Validation,Neural Networks,RNNs,Time Series Analysis",,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data 
Science results not used by business decision makers,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Most of the time,,,,,,,,Sometimes,,,Often,,,Sometimes,,,,100% of projects,Entirely external,Standalone Team,,Unstructured data,Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,Git,Sometimes,"16,200",EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,KNIME (free version),Social Network Analysis,R,University/Non-profit research group websites,"Blogs,College/University,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Business Analyst,Researcher",Self-taught,50,20,10,10,10,0,Recommendation Engines,Support Vector Machines (SVMs),A bachelor's degree,Academic,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,Neural Networks,"Microsoft R Server (Formerly Revolution Analytics),Python",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,Natural Language Processing,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,,Most of the time,Often,,,,30,20,5,20,25,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,,,,Often,Sometimes,Often,,,,,,,,,Often,,,26-50% of projects,More internal than external,IT Department,,,Graph (e.g. GraphBase/Neo4j),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,10000,INR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,48,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Monte Carlo Methods,C/C++/C#,GitHub,"Blogs,Friends network,Personal Projects",,Very useful,,,,Very useful,,,,,,Somewhat useful,,,,,,,"Data Elixir Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,"Business Analyst,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,60,10,30,0,0,0,Unsupervised Learning,Evolutionary Approaches,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or 
university,Hadoop/Hive/Pig,Deep learning,R,University/Non-profit research group websites,"Arxiv,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,50,10,0,10,10,20,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Other (please specify; separate by semi-colon)",A master's degree,Academic,20 to 99 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data",,1GB,Other,"Java,Python,R,Spark / MLlib",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Often,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Recommender Systems",,,,,Often,Most of the time,Often,,,,,,,Most of the time,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,50,35,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,Often,,,,,Often,Often,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Git,Subversion",Rarely,369600,INR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,26,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,R,Other,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,,"FastML Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician,Other",University courses,40,30,5,20,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Java,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Perl,Python,R,SQL,Tableau",,,,Rarely,,,,,,,Often,Often,,,Sometimes,,,,,,Sometimes,,Often,,,Rarely,,,,Rarely,Sometimes,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Often,,,,,Often,Most of the time,Often,,,,Sometimes,,Often,,Most of the time,,Often,,Sometimes,Often,,Often,,,Most of the time,Often,Sometimes,,Most of the time,,,,30,10,0,40,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Often,,Most of the time,Most of the time,,Often,Sometimes,,,,Often,Rarely,,,Sometimes,,,Often,Often,,100% of projects,More external than internal,Other,"INEGI, FACTSET, BLOOMBERG, RESEARCH",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Never,25000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,3-5 years,,,,,,,,,,,,,,,Other,0 - 1 hour,Github Portfolio,No,Doctoral degree,Other,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,,,,,,,Somewhat important,,, +Male,Israel,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company 
that makes advanced analytic software,DataRobot,Deep learning,Python,,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,Work,0,50,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Never,10MB,"CNNs,Neural Networks,RNNs,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Often,,Most of the time,,,,,,,,,,,,,,Often,Often,,,,Often,,,Sometimes,,,,,,80,10,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,Often,,,Often,,,,,,,Often,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,,"40,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,73,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Not Useful,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",15,20,50,10,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,,"Text data,Relational data",Don't know,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAS Base,SAS Enterprise Miner,SQL",Sometimes,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,Rarely,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Most of the time,Often,Often,,,Often,,Sometimes,Often,Most of the time,,,Sometimes,,Rarely,,Often,,,,,Often,Sometimes,,,,,40,10,15,10,25,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple 
ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,Sometimes,Most of the time,,,,Often,,,,,Often,,,,,Sometimes,Often,,,Less than 10% of projects,Entirely internal,Standalone Team,Facebook; Twitter; Vkontakte; Instagram;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,66000,PLN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,44,Employed full-time,,,No,Yes,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Regression,R,"GitHub,University/Non-profit research group websites",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important +Male,Netherlands,45,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Statistician,University courses,5,0,20,70,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,Programmer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Tableau,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Gaming Laptop (Laptop + CUDA capable GPU),,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - 
CNNs,A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Other,Never,100MB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,RNNs,Other","Amazon Web services,C/C++,NoSQL,Python,TensorFlow,Unix shell / awk",,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Sometimes,,Often,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,,,,25,25,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,200000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Poland,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,30,20,20,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,500 to 999 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,KNIME (free version),R,SQL",,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,Sometimes,,,Most of the time,Often,Sometimes,Often,,,Often,,Sometimes,Often,Often,,Sometimes,,,Often,,Often,,,Sometimes,,,,,,,,60,15,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,Often,Sometimes,,,,Often,Often,Sometimes,Sometimes,,,,Most of the 
time,,,Often,Most of the time,,,,100% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,,Never,72000,PLN,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",University courses,47,0,20,30,3,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,Very useful,"Data Stories Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,30,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,20 to 99 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,10MB,Other,"Java,Python,RapidMiner (free version),Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,RNNs",Sometimes,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,5,20,50,25,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,Never,25000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Workstation + Cloud service,2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,India,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Other,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,40,20,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Belgium,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belarus,22,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Researcher",Self-taught,95,0,0,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,32,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Time Series Analysis,Python,"GitHub,Google Search",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FlowingData Blog",3-5 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,Researcher,Work,50,0,30,20,0,0,Unsupervised Learning,Support Vector Machines (SVMs),,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,France,44,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,,,,Retail,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,KNIME (free version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,No Free Hunch Blog,1-2 years,Nice to have,Necessary,,,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Time Series,Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Not important,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Australia,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,10,25,15,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",I don't know/not sure,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Never,,,"Flume,Java,Python,TensorFlow",,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Neural Networks",Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,10,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,Most of the time,,,Sometimes,,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by college or university,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,,,,,Very useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,60,30,0,0,10,0,"Adversarial Learning,Computer Vision,Speech Recognition,Time Series","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation","Image data,Relational data",Sometimes,100GB,"CNNs,Evolutionary Approaches,GANs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,GANs,HMMs,PCA and Dimensionality Reduction,Time Series Analysis",,,,Most of the time,,,,,,,Often,,Rarely,,,,,,,,Sometimes,,,,,,,,,Often,,,,50,20,5,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,Often,,,,,Sometimes,,Often,,,Often,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,250000,MUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Denmark,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,Researcher,Self-taught,70,10,10,10,0,0,Reinforcement learning,"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Rarely,1TB,"Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Scala,"Google Search,University/Non-profit research group websites","Conferences,Kaggle,Personal Projects",,,,,Somewhat useful,,Very useful,,,,,Very useful,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,1 to 2 years,"Data Scientist,Operations Research Practitioner,Programmer",Work,20,10,30,20,10,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Manufacturing,Fewer than 10 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Most of the 
time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks","Cloudera,Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,Sometimes,,Sometimes,,Sometimes,,Rarely,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Evolutionary Approaches,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Often,,Often,,Often,,,,,,Most of the time,,Often,,Often,Often,Often,Often,,,Often,,,,Most of the time,,,,50,20,10,10,10,NA,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,Often,,,Often,,,,Often,Often,,,,Often,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Rarely,54,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,45,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,,,,,Coursera,"Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Other,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,40,5,53,0,2,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",High school,Other,I prefer not to answer,Increased slightly,More than 10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,Other,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Personal Projects",Very useful,,,Very useful,,,,,,,,Somewhat useful,,,,,,,,3-5 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,0,55,5,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,59,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Spark / MLlib,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,,1GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,Often,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,Often,,Often,,,Often,,Rarely,,,,,,40,30,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,76-99% of projects,Entirely internal,Other,TCIA,image preprocessing and standardization,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other","Email,Other",xnat,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Germany,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Master's degree,,1 to 2 years,Other,University courses,15,35,10,38,2,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,France,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software 
Engineer",Self-taught,80,10,0,10,0,0,"Natural Language Processing,Reinforcement learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Never,10GB,"CNNs,GANs","C/C++,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Data Visualization",Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,60,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Often,,,,Most of the time,,,,Sometimes,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",Very useful,Very useful,,Very useful,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,Very useful,,,,,Very useful,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Business Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Other","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Most of the time,,,,Most of the time,,,Sometimes,Often,,,Often,,,,,,,Most of the time,Often,Most of the time,,Sometimes,Most of the time,,,,,Most of the time,,,,,50,10,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Need to coordinate 
with IT,Privacy issues,Scaling data science solution up to full database",,Sometimes,,Often,Most of the time,,,,,,,,,,Most of the time,,Most of the time,Often,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Bitbucket,,113000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,NoSQL,"Ensemble Methods (e.g. boosting, bagging)",Python,Google Search,"Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,10,60,0,10,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Other",,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Gradient Boosted Machines,HMMs,Logistic 
Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,Often,,Often,,,Sometimes,,Sometimes,,,,,,,,,,,30,40,0,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,Sometimes,,,,,,Sometimes,,Often,,Often,,51-75% of projects,More internal than external,Standalone Team,,,,I don't typically share data,,Git,Sometimes,20000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,70,10,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,"Not employed, but looking for work",,,,,,,,Julia,Random Forests,Julia,Google Search,"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast",3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Doctoral degree,Physics,1 to 2 years,Other,"Online courses (coursera, 
udemy, edx, etc.)",40,50,0,0,10,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,India,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Programmer",Self-taught,40,50,10,0,0,0,Unsupervised Learning,Bayesian Techniques,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Google Cloud Compute,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,,University courses,20,0,0,80,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Other (please specify; separate by semi-colon)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle",,,,Very useful,,,Very useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,10,40,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Often,,,Often,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,Most of the 
time,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Sometimes,Often,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Segmentation",Sometimes,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,Most of the time,,,Often,,,Often,Often,,,,Often,Most of the time,,Most of the time,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,1200000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Online courses",Very useful,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Insurance,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10PB,"Decision Trees,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Random Forests,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,50,20,20,10,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,48,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Very useful,,3-5 years,,,,,,,,,,,,,,,Basic laptop (Macbook),,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,,,,,, +A different identity,United Kingdom,32,Employed 
full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Engineer,Software Developer/Software Engineer",Self-taught,30,30,30,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Most of the time,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Simulation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Sometimes,,Often,,,,,,,,,,,Often,,,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Sometimes,62000,GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",India,25,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",R,Deep learning,Python,Google Search,"Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,University courses,20,0,20,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Random Forests,Regression/Logistic Regression","NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"Collaborative Filtering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,,,Often,,,,,,,,,,,Most of the time,,,Sometimes,,Rarely,,Rarely,Often,,,,,,,,,,50,20,20,0,10,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,Most of the time,,Rarely,Rarely,,,,,,Rarely,,,,,,,None,,Standalone 
Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,Microsoft Excel Data Mining,Social Network Analysis,SQL,Google Search,Friends network,,,,,,Very useful,,,,,,,,,,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,Researcher,University courses,0,0,0,100,0,0,Computer Vision,Neural Networks - CNNs,High school,Non-profit,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,,Workstation + Cloud service,Image data,Never,>1EB,Neural Networks,Google Cloud Compute,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,IT Department,Documentary Global Human Rigths,Bit and Byte Pixels for Data Analistic Visualizer,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Subversion,,"50,000",EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,Self-taught,15,20,30,10,10,15,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting",I prefer not to answer,Retail,"10,000 or more employees",Increased slightly,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Romania,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Cluster Analysis,Python,GitHub,"Blogs,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Other,21,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,C/C++/C#,Google Search,College/University,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,,PhD,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,33,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Software Developer/Software Engineer",University courses,0,0,60,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL,Python,R,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Often,,Most of the time,,,,,,,Most of the time,,,Sometimes,,Most of the time,,Often,,,,20,70,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,Most of the time,,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,180000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Iran,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,0,0,10,90,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,20,0,40,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Other,"Basic laptop (Macbook),Workstation + Cloud service",Other,Most of the time,,Other,"C/C++,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Simulation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,10,10,50,20,10,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,80000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,,,"Arxiv,Blogs,Friends network",Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,,,,3-5 years,,,,,,,,,,,,,,,,,Github Portfolio,No,Bachelor's degree,Computer Science,,,Self-taught,NA,NA,NA,NA,NA,NA,Time Series,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Female,France,20,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,C/C++/C#,University/Non-profit research group websites,"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,Traditional Workstation,0 - 1 hour,Other,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,0,0,0,100,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not 
important +Male,Ukraine,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,NoSQL,Survival Analysis,R,GitHub,"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,Very useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,80,0,20,0,0,0,Time Series,Logistic Regression,,Manufacturing,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1MB,Decision Trees,"C/C++,Java,Python,SQL",,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,45,5,25,10,15,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,,,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,Yes,,Engineer,Poorly,Self-employed,Python,Neural Nets,R,,"Arxiv,Blogs",Very useful,Very useful,,,,,,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Engineer,Software Developer/Software Engineer",Self-taught,90,0,10,0,0,0,"Computer Vision,Reinforcement learning,Unsupervised Learning",Neural Networks - CNNs,High school,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data",Always,1TB,"CNNs,Neural Networks,Other","Amazon Web services,C/C++,Python,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,,,,,,,,,Often,,,,,,Often,Often,,,,,,,,,,,,,40,40,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Always,60000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Portugal,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,Rarely,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Rarely,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,Often,,,Often,,,,Often,,,,60,5,5,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Most of the time,Often,,Sometimes,Sometimes,,Often,,,,,,,,,,,,Most of the time,Often,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Other,Most of the time,50000,EUR,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,27,Employed part-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",29,70,0,0,1,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,,,,Very Important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,45,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by government,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Other",University courses,0,0,40,60,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Bayesian Techniques,"Some college/university study, no bachelor's degree",Government,I don't know,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Sometimes,10GB,"Bayesian Techniques,Other","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Unix shell / awk",,,,,,,,,Often,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Text Analytics",,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,70,0,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Sometimes,Often,Often,,Most of the time,Sometimes,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,Often,,,,Most of the time,,100% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Other,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Not Useful,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,"Data Elixir Newsletter,FastML Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,40,5,45,5,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,,,Most of the time,Most of the time,,Most of the time,,,,Sometimes,Sometimes,,Often,,,,Sometimes,Sometimes,,Most of the time,,Sometimes,,,,,Often,,,,30,10,5,35,20,NA,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Most of the time,,,Rarely,,,Often,,,Most of the time,,,Most of the time,,,Most of the time,,,100% of projects,More internal than external,Standalone Team,governmental data; weather data; images; ,finding interesting insights and directions to write a scientific article,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)","Share Drive/SharePoint,Other",external drive,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Sometimes,"24,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Spark / MLlib,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Other,University courses,20,15,35,20,10,0,"Natural Language Processing,Reinforcement learning,Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,A general-purpose job board,Important,Other,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100TB,"HMMs,Other","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,Most of the time,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Often,,,,,20,30,30,20,0,0,"Enough to code it again from scratch, albeit it may run 
slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Often,,100% of projects,More internal than external,Other,all popular social media data,noise,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,60000,SGD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,56,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important +Male,Russia,22,"Not employed, but looking for work",,,,,,,,SQL,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Other,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Other","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,40,40,10,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important +Male,Singapore,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,SVMs","Python,R,SAS JMP,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,Often,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Neural Networks",,,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,60,15,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,Most of the time,,,,,,,,,,Sometimes,,,,Sometimes,,,,26-50% of projects,More internal than external,Standalone Team,None,Cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,,50000,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler",Self-taught,100,0,0,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,GitHub,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,50,10,0,40,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves 
your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Rarely,Rarely,,,Rarely,Rarely,,,,,,,Rarely,Rarely,,,Rarely,,,Rarely,Rarely,,,Rarely,Rarely,,Rarely,Rarely,,,,30,20,10,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Sometimes,,,,Sometimes,,Often,Sometimes,Sometimes,,Often,,,,10-25% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,550000,INR,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,,Other,0,0,0,0,0,100,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,I prefer not to answer,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Text Mining,Python,University/Non-profit research group websites,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,KNIME (free version),Python,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Most of the time,,,Often,Most of the time,,Most of the time,,,,"A/B Testing,Cross-Validation,Data 
Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Often,,,,,,Often,Often,Most of the time,,,,,Most of the time,Often,,,,,,,Most of the time,Most of the time,,,,60,5,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Often,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,,100% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Subversion,Rarely,130000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,55,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important +Female,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Cognos,Time Series Analysis,SQL,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,Very useful,,,,,Very useful,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Turkey,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Military/Security,"5,000 to 9,999 employees",Decreased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Other,Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Often,,,,15,35,40,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Most of the time,,Most of the time,Often,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Regression,Python,GitHub,"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,,,,,Necessary,,Necessary,,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,,No,Master's degree,Mathematics or statistics,,"Data Analyst,Programmer,Researcher",Other,NA,NA,NA,NA,NA,NA,Reinforcement learning,Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or 
on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,Somewhat useful,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,"Data Scientist,Engineer,Operations Research Practitioner,Researcher,Software Developer/Software Engineer,Other",Self-taught,45,0,45,0,10,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,,,,Sometimes,Sometimes,,,,,,,,,,,Most of the time,Often,Often,,,,,,,Sometimes,Often,,,,,70,10,10,5,5,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data",,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Other,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,GitHub,"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,35,45,15,0,5,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral 
degree,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Other",Text data,Always,1TB,,"Amazon Machine Learning,Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",Often,Most of the time,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"Decision Trees,Ensemble Methods,Natural Language Processing,Random Forests,Text Analytics",,,,,,,,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,,10,20,20,10,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Sometimes,,,,Sometimes,,Often,,Sometimes,,Often,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,120000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Julia,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,"Operations Research Practitioner,Programmer,Researcher",University courses,20,10,50,0,10,10,,,A master's degree,Academic,100 to 499 employees,,,Some other way,Very important,,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,100MB,,"Amazon Web services,C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Rarely,,,,,,Sometimes,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,30,20,30,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,Often,Sometimes,,,,,Often,,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",,60000,EUR,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,Portugal,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Not Useful,,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,Researcher,Self-taught,60,0,0,0,10,30,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Academic,I don't know,Increased slightly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Other",Other,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,,,,Often,Most of the time,Sometimes,Most of the time,,,Sometimes,,,,Most of the time,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,15,50,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Scaling data science solution up to full database",Sometimes,,,Sometimes,,,,,,,,,,,,,,Often,,,,,100% of projects,Approximately half internal and half external,Other,,generation and storage,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,32400,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Google Search,"Blogs,College/University,Conferences,Kaggle,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,6 to 10 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,High school,CRM/Marketing,20 to 99 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,,"C/C++,Google Cloud Compute,NoSQL,Python,R,SQL",,,,Rarely,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,SVMs",,,,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,30,0,30,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Sometimes,,,,Often,,,,,,,,,,,,Rarely,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,2000000,TWD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write code to analyze data,I haven't started working yet,Self-taught,50,10,0,30,0,10,Time Series,Neural Networks - RNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Neural Nets,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Company internal community,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,"Data Stories Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",A master's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Sometimes,100GB,"Bayesian Techniques,Neural Networks,Random Forests","DataRobot,Hadoop/Hive/Pig,Python",,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction",,,,Often,Sometimes,,,,,,,,,,,Often,,,Often,,Often,,,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,23000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Researcher,Self-taught,50,10,20,NA,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Never,1GB,"Regression/Logistic Regression,SVMs","Google Cloud Compute,IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,Rarely,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,SVMs",,,Rarely,,,Often,Most of the time,,,,,,,,,Often,,Sometimes,,,Often,,,,,,,Often,,,,,,70,10,0,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues",Often,Often,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Rarely,5000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,Statistica (Quest/Dell-formerly Statsoft),Deep learning,Python,Google Search,"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,15,15,40,30,0,0,"Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Don't know,10MB,Other,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Time Series Analysis",,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,0,10,0,30,60,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,None,None,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,,EGP,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Newsletters,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Very useful,,Very useful,Somewhat useful,,,,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Java,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics",,Often,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,Often,,,,,,,,,,Most of the time,,,,,60,15,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,Often,,,Often,,,,,Often,Often,,Often,Most of 
the time,,Often,Sometimes,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Self-taught,80,10,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Business Analyst,Data Analyst,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed part-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform 
advanced analytics,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Online courses,Podcasts,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,,Not Useful,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Kaggle competitions,45,0,10,0,45,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular 
Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,3 to 5 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,Random Forests,"Mathematica,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Rarely,,,,,,,Sometimes,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,32,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to 
answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,56,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,Less than a year,Engineer,Self-taught,100,0,0,0,0,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,India,27,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,R,Government website,Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Professional degree,,Less than a year,I haven't started working yet,Self-taught,5,20,40,25,5,5,Time Series,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very 
Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Ireland,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,Python,Government website,"College/University,Textbook",,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,Engineer,Self-taught,80,5,10,5,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,10 to 19 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,NoSQL,Perl,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Unix shell / awk",Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Often,Often,,Often,,,,,Often,Often,,,Often,,,Often,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Often,,,,,Often,Often,Often,,,,,,,Often,Often,,,,,,Often,Often,Often,,Often,,,,Often,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the 
organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Sometimes,100000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Neural Nets,Python,Google Search,"Arxiv,Blogs,Friends network,Non-Kaggle online communities,Online courses,Podcasts",Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,"FastML Blog,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,15,10,45,30,0,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Minitab,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Rarely,Most of the time,,Sometimes,Most of the time,,Sometimes,,Often,,,,,Often,Rarely,Rarely,Sometimes,,,,Rarely,Rarely,,,,Rarely,Rarely,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,Rarely,Sometimes,,Most of the time,,,,"Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Sometimes,Most of the time,Often,Often,Often,,,Sometimes,,Sometimes,,Often,,,Often,Sometimes,,,Often,Often,,,,Most of the time,Most of the time,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,Git,Most of the time,11700000,JPY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language 
Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,Brazil,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler",Self-taught,50,15,15,15,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs",Primary/elementary school,Technology,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,Sometimes,,Often,,Often,Sometimes,Often,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Often,,,,,,,,Often,Often,,,Sometimes,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",,Often,,,,,Often,Often,,,,,,Sometimes,,Sometimes,,,Often,,,,Sometimes,,,,,,Sometimes,,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Often,,,Often,,,,,,,Often,,Often,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,45,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Online courses,Personal Projects,Trade book",Very useful,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,,,Somewhat useful,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,Engineer,University courses,0,0,0,50,50,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Decreased significantly,6-10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation",Image data,Sometimes,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,Often,Most of the time,,Most of the time,Most of the time,Often,Often,,,Often,Sometimes,Sometimes,,Sometimes,,Rarely,,Sometimes,Sometimes,,Often,,Often,Often,,Often,,Often,,,,30,30,30,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used 
by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,,Often,,,,Often,,,Often,,,,,Often,Most of the time,,,,Often,,10-25% of projects,More internal than external,Standalone Team,"xenocanto (bird recordings), DORSA (insect recordings), Kaggle datasets, open competitions",dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,45000,EUR,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Data Scientist,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Not important,,,,,,,,,,,,,,, +Female,India,32,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM SPSS Statistics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,The Analytics Dispatch Newsletter,Other (Separate different answers with semicolon)",3-5 years,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,,"Business Analyst,Data Miner,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,Very useful,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,Other,University courses,30,20,20,20,10,0,Natural Language Processing,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,100MB,"Bayesian Techniques,Neural Networks","Cloudera,Java,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau",,,,,Rarely,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Rarely,,,,,,,,,Often,,,Rarely,,,,,,,"Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,40,10,10,20,20,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,,10-25% of projects,Do not know,IT Department,no,I am a fresher,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,20000,INR,Other,2,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Conferences,Online courses",,,Very useful,,Very useful,,,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,Computer Scientist,University courses,40,30,10,10,10,0,Computer Vision,Support Vector Machines (SVMs),Primary/elementary school,Academic,500 to 999 employees,Decreased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Image data,Never,1GB,SVMs,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,"Neural Networks,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,Often,,,,,20,30,20,20,10,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,,Often,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Other,Email,,,Never,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects",,,Very useful,,,,Somewhat useful,,,,,Very useful,,,,,,,KDnuggets Blog,< 1 year,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,A social science,,"Business Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,Markov Logic Networks,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United Kingdom,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Not Useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,10 to 19 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Java,Jupyter notebooks,KNIME (free version),NoSQL,Python,Spark / MLlib,SQL",,Most of the time,,Rarely,Sometimes,,,,,,,,,,Often,,Sometimes,,Often,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Sometimes,,Sometimes,,,,Sometimes,,,,30,20,20,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot 
afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,Sometimes,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,60000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer,Software Developer/Software Engineer",Self-taught,50,20,20,0,0,10,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,6 to 10 years,Researcher,University courses,0,0,0,100,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Online courses,YouTube Videos",Very useful,,,,,,,,,,Very useful,,,,,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,No Free Hunch Blog",1-2 years,,,,,Necessary,,,,Necessary,,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Very Important +Male,Germany,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,SQL,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Kaggle competitions,0,0,50,0,50,0,,,,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Often,Often,Often,Often,,,Often,,,,Sometimes,,Rarely,,Often,Sometimes,,Often,,,Sometimes,,Rarely,,,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Bitbucket,,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,25,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Decreased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,Most of the time,,,,,,"kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,Often,,,,Often,,,Often,,,,,,,Most of the time,,,,60,20,0,15,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,,"Computer Scientist,Data Scientist,Researcher",University 
courses,20,20,20,20,0,20,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,48,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,R,,"Friends network,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,6 to 10 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,High school,Technology,500 to 999 employees,Increased significantly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Never,100MB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,NoSQL,Python,R,SQL",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Random Forests,Text Analytics",Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,10,10,10,20,50,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,Standalone Team,,,Key-value store (e.g. 
Redis/Riak),Company Developed Platform,,"Mercurial,Subversion,Other",Rarely,150000,NZD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Random Forests,Python,"GitHub,University/Non-profit research group websites",Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,95,0,0,4,1,Survival Analysis,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Other",Text data,Sometimes,100TB,"Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Oracle Data Mining/ Oracle R Enterprise,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Random Forests",Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,30,50,0,0,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Privacy issues,Other",,,,,Often,,,,Sometimes,,,,,,,,Often,,,,,Most of the time,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,"Git,Subversion",Sometimes,8000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Other","Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,I don't write code to analyze data,I haven't started working yet,University courses,10,10,0,80,0,0,Computer Vision,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important +Male,South Africa,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,25,30,20,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Microsoft SQL Server Data Mining,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Google Search,"Company internal community,Conferences,Podcasts,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,47,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,MATLAB/Octave,Monte Carlo Methods,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Statistician",Self-taught,30,40,20,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",Primary/elementary school,Insurance,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,MATLAB/Octave,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Often,Often,Often,Often,,,Often,,Often,,Often,,Often,,,Often,,,,,,Often,,Sometimes,Sometimes,,,,20,40,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations of 
tools",Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,,Sometimes,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United Kingdom,25,Employed full-time,,,Yes,,Other,Fine,"Employed by non-profit or NGO,Employed by government",Julia,Deep learning,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Researcher",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,100MB,"Bayesian Techniques,Random Forests,SVMs","Java,Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"Naive Bayes,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Sometimes,,,,,,,,,,Often,,,,,Most of the time,Often,,76-99% of projects,Entirely external,Other,,,Graph (e.g. GraphBase/Neo4j),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,45000,GBP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Hungary,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Time Series Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Not Useful,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important +Male,United States,NA,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting 
firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,70,26,2,0,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A master's degree,Government,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Image data,Text data,Relational data",Rarely,<1MB,Bayesian Techniques,"Cloudera,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,Often,Most of the time,,,"Bayesian Techniques,Natural Language Processing,Text Analytics",,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,60,2,15,3,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Other",Most of the time,,,,Most of the time,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,Most of the time,Less than 10% of 
projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Other",Ceph S3 object store,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,165000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Ireland,54,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,0,30,70,0,0,0,"Natural Language Processing,Unsupervised Learning",Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Not important,Not important,Very Important +Female,Israel,62,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,10,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,Other,24,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 
2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,23,"Not employed, but looking for work",,,,,,,,DataRobot,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",Very useful,,Very useful,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Other","Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",40+,Master's degree,Sort of (Explain more),Master's degree,Other,Less than a year,I haven't started working yet,University courses,10,40,0,50,0,0,"Computer Vision,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Nigeria,58,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Rarely,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,Often,,,,20,30,20,10,10,10,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,Sometimes,,,,,,Often,Often,,,,,,,,,,26-50% of projects,More internal than external,Other,None,Sparsity,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,18000000,NGN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Spain,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner",Self-taught,20,20,29,30,1,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,,"Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Often,,Often,Most of the time,,,"CNNs,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,69,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced 
analytics,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Manufacturing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,25,25,25,25,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,35000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,R,Random Forests,R,Google Search,"Blogs,Friends network,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,,,,,,Very useful,,,,,Very useful,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Other,42,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Other,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",< 1 year,,Necessary,Nice to have,,Necessary,,Necessary,Nice to have,Nice to have,,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I prefer not to answer,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,50,0,0,0,50,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)",Neural Networks - RNNs,A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,<1MB,"Neural Networks,RNNs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,RNNs",,,,,,,Sometimes,,,,,,,,,,,,Rarely,Most of the time,,,,,Most of the time,,,,,,,,,40,20,30,10,0,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,4500000,JPY,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,73,Retired,,,Yes,,Statistician,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"University/Non-profit research group websites,Other","College/University,Textbook",,,Somewhat useful,,,,,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,Statistician,Self-taught,50,0,0,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I don't know/not sure,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Julia,Time Series Analysis,Python,GitHub,"Conferences,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Very useful,,,,Very 
useful,Very useful,Somewhat useful,Very useful,,Very useful,,,,,"Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,Financial,,,,,Very important,Other,Traditional Workstation,Relational data,Never,100MB,"Bayesian Techniques,Random Forests,SVMs","NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,,,Sometimes,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,90,0,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data",,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Subversion,Other",Rarely,,GBP,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Rule Induction,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,10,10,10,65,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,20,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Bayesian Methods,Python,,"Arxiv,Blogs,College/University,Non-Kaggle online communities",Very useful,Very useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,,University courses,10,0,20,70,0,0,,,A doctoral degree,Internet-based,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image 
data,Rarely,100GB,"GANs,Neural Networks","Amazon Web services,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,GANs,Neural Networks",,,,Most of the time,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,80,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,None,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,Git,Sometimes,780000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,19,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,,,,,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Most of the time,1TB,"Bayesian Techniques,CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs,Other","Amazon Web services,C/C++,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,1 to 2 years,Researcher,University courses,40,30,30,0,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Basic laptop (Macbook),Image data,,10MB,Random Forests,"Java,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Often,50,0,0,30,0,20,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,Do not know,Other,"We work with published and therefore, public database. GenAtlas, INteractome, etc.",understand how to use it correctly. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,27057,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,20,20,20,40,0,NA,"Adversarial Learning,Computer Vision","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,10 to 19 employees,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10GB,"CNNs,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Machine Learning Engineer,Researcher,Statistician",Work,70,10,15,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,22,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Amazon Machine Learning,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Statistician",Work,40,40,20,0,0,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation",Most of the time,Often,,,Rarely,,Often,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,Sometimes,,,,,,,,30,50,10,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Often,,,,,,,,,,,Most of the time,,,,,Often,Often,,10-25% of projects,,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"1,550,000",INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Text Mining,Python,Google Search,"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,PhD,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,40,0,25,0,0,Unsupervised Learning,"Hidden Markov Models HMMs,Neural Networks - CNNs",A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Vietnam,26,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,R,Monte Carlo Methods,Python,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Work,0,0,100,0,0,0,Time 
Series,Bayesian Techniques,High school,Insurance,100 to 499 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Video data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Random Forests","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Prescriptive Modeling,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,80,5,5,5,5,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,Entirely external,Standalone Team,Not,handling data,Other,I don't typically share data,,Other,Never,"10,000",VND,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Other,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Machine Learning Engineer,Statistician",University courses,30,40,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Don't know,10GB,"Evolutionary Approaches,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,70,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,KNIME (free version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Machine Learning Engineer,Other","Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,,<1MB,"Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Neural Networks,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,Sometimes,,,,0,80,0,20,0,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Most of the time,,,,,,,,,,,,,,,,,26-50% of projects,Entirely external,Business Department,Kaggle,getting datasets from clients,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,40000,GBP,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Text data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Perl,Python,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Recommender Systems,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Often,,,,Sometimes,,,,Often,,,Sometimes,,,Sometimes,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining 
responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,Sometimes,,,Most of the time,,,,,Sometimes,,,,Most of the time,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,48,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",2,3,49,45,1,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Increased significantly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,54,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Government website,University/Non-profit research group websites","College/University,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Other,University courses,100,0,0,0,0,0,"Natural Language Processing,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - 
Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Other,Never,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,,,Most of the time,,Most of the time,Most of the time,Sometimes,,Often,Sometimes,,,,,Sometimes,Often,Often,Most of the time,,Most of the time,Sometimes,,,,Most of the time,,Often,,,,40,20,0,20,20,0,Enough to refine and innovate on the algorithm,"Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,Sometimes,,Often,,,,,Often,,,,,,Sometimes,,100% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,NoSQL,Neural Nets,Python,Other,"Blogs,Online courses,Personal Projects,Textbook",,Very useful,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,,,A bachelor's degree,Internet-based,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",,10MB,,"Jupyter notebooks,NoSQL,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,40,0,30,0,30,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,137000,,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Genetic & Evolutionary Algorithms,C/C++/C#,Other,"Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,,,Very useful,Data Stories Podcast,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Fine arts or performing arts,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,South Korea,27,Employed full-time,,,Yes,,Machine Learning Engineer,,,Jupyter notebooks,Monte Carlo Methods,Python,Google Search,"Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,10,10,20,0,0,"Machine 
Translation,Natural Language Processing,Speech Recognition","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,HMMs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,Sometimes,,Often,,,,,,,Sometimes,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,Sometimes,Most of the time,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,,Often,,,,Sometimes,,,,Sometimes,,,,Often,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,50000,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Software Developer/Software Engineer,Other,30,20,10,0,0,40,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,100MB,"CNNs,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Sometimes,Often,,,Often,,,,Often,,,,,,,,Most of the time,,,,,Sometimes,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Often,Often,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,Often,Most of the time,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,Most of the time,,,,Most of the time,Often,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Often,,,,10-25% of projects,Do not 
know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Female,United States,36,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Very Important,,Very Important,,,Somewhat important, +Female,Portugal,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,Google Search,"Online courses,Podcasts",,,,,,,,,,,Very useful,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,Less than a year,Other,Self-taught,20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),C/C++/C#,GitHub,"Arxiv,Blogs,College/University,YouTube Videos",Very useful,Very useful,Very useful,,,,,,,,,,,,,,,Very useful,FastML Blog,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,,Github Portfolio,Sort of (Explain more),Doctoral degree,Computer Science,,"Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Australia,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,Minitab,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Programmer,Self-taught,85,10,0,0,5,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,,,,,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,<1MB,"Bayesian Techniques,Regression/Logistic Regression",SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Time Series Analysis",,,Sometimes,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Predictive Modeler",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Very useful,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,35,0,0,40,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job 
listing page,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,Netherlands,36,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,30,0,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,Other,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,Often,,,,,,Often,,,,,,Often,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,10,30,0,Enough to explain 
the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,,,,Often,Most of the time,,,,Most of the time,,Most of the time,Often,Sometimes,,,,Sometimes,,10-25% of projects,Entirely internal,IT Department,,Time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,70000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Israel,44,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",Work,20,20,30,0,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",No education,Internet-based,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A",,,,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",10,10,70,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Python,R,SAS Base,SAS Enterprise Miner,Spark / 
MLlib,SQL,TensorFlow",,,,,Sometimes,,,,Sometimes,Often,Often,Sometimes,Often,,,,Most of the time,,Sometimes,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,Sometimes,,Often,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,,Rarely,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Often,Sometimes,Often,Most of the time,Often,Sometimes,,Most of the time,Sometimes,,Most of the time,,Sometimes,Most of the time,Sometimes,,,,20,15,10,20,35,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,Sometimes,,,,,Often,,Sometimes,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Most of the time,60000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Egypt,23,Employed part-time,,,No,Yes,Computer Scientist,Poorly,Employed by college or university,C/C++,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,Talking Machines Podcast",1-2 years,Necessary,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer",Self-taught,65,5,25,0,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important +Male,Spain,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,Spark / MLlib,Anomaly Detection,Python,Google Search,"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Engineer,Programmer",Self-taught,40,60,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,Rarely,,,Often,,,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation",,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,None,Cleansing,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Never,"50,000",,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,20,20,10,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,,,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Tableau,TensorFlow",,,,Rarely,,,,,Often,,,,,,,,Most of the time,,,,Rarely,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems",Sometimes,,Rarely,Sometimes,Often,Most of the time,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,25,25,15,10,15,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,,,,51-75% of projects,More external 
than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Non-Kaggle online communities,YouTube Videos",,,,,,,,,Somewhat useful,,,,,,,,,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,75,5,0,0,0,"Machine Translation,Recommendation Engines,Survival Analysis",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,India,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs 
advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,Business Analyst,Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A doctoral degree,Insurance,10 to 19 employees,,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,,Random Forests,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,20,0,30,50,0,"Computer Vision,Machine Translation","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,36,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Julia,Monte Carlo Methods,Python,GitHub,"Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Other,Self-taught,50,30,20,0,0,0,Recommendation Engines,Bayesian Techniques,"Some college/university study, no bachelor's degree",Other,20 to 99 employees,Stayed 
the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Always,1GB,Bayesian Techniques,"Julia,Python",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering",,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,50,30,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of significant domain expert input,Limitations of tools",,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,10-25% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Bitbucket,Git",,30000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,Julia,Social Network Analysis,R,I collect my own data (e.g. web-scraping),"Arxiv,Conferences,Friends network,Textbook",Somewhat useful,,,,Very useful,Very useful,,,,,,,,,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Researcher,Statistician",University courses,90,10,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Government,"5,000 to 9,999 employees",Decreased significantly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM SPSS Statistics,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,I prefer not to say,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues",Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,10-25% of projects,Do not know,Other,,,,,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed part-time,,,Yes,,Predictive Modeler,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,5,15,30,0,0,Supervised 
Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by professional services/consulting firm,Employed by non-profit or NGO",Julia,Social Network Analysis,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Programmer",Work,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Non-profit,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Rarely,1TB,Decision Trees,"Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Association Rules,Cross-Validation,Decision Trees,Text Analytics",,Sometimes,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,,Most of the time,,,,,Often,,26-50% of projects,Entirely external,Standalone Team,"OpenStreetMap, Wikidata, Wikipedia, GitHub/GitLab data repositories; datasets about GitHub; Government ""Open Data""",data quality,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,public git; public dumps,Git,Sometimes,750000000,IDR,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Canada,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,More than 10 years,,Self-taught,60,20,0,20,0,0,,,High school,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100GB,"Evolutionary Approaches,Neural Networks","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Neural Networks",,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,40,0,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Sometimes,,,,,,,,,,Often,Most of the time,,100% of projects,Do not know,Business Department,Geogratis,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Cloud,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,France,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,0,50,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Textbook",Very useful,,,,,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Engineer,Researcher",University courses,30,10,20,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Impala,Java,Jupyter 
notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,Rarely,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Often,,Often,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Most of the time,Often,,Rarely,Often,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,,Often,,Most of the time,Most of the time,Sometimes,,,Often,Often,,,,Sometimes,Often,Often,,,,55,20,10,5,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,40000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Financial,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,<1MB,Decision Trees,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Often,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,,9000000,NGN,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,22,Employed part-time,,,Yes,,Data Scientist,,Employed by company that makes advanced analytic software,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer",University courses,20,0,20,20,20,20,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,QlikView,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,Often,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Sometimes,,,Most of the time,,Most of the time,,Often,,Often,Most of the time,Most of the time,Sometimes,,Sometimes,Sometimes,Often,Often,,Sometimes,Most of the time,Sometimes,,,,25,25,15,10,25,0,Enough to refine and innovate on the algorithm,Organization 
is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,Do not know,Central Insights Team,,,,,,,,350000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Czech Republic,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,,Other,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",70,10,0,20,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Don't know,A general-purpose job board,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Rarely,1GB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Neural Networks,Simulation",,,Rarely,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,20,60,0,10,10,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Female,Canada,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Textbook",,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,80,20,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,Java,KNIME (free version),Microsoft Excel Data Mining,Perl,Python,R,RapidMiner (commercial version),SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,Unix shell / awk",Sometimes,,,,Often,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,Often,,,,,,,Often,Most of the time,,Most of the time,Rarely,,,,,Rarely,,Sometimes,Sometimes,,,Rarely,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive 
Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,,Often,,Often,Rarely,Often,Often,,,Often,Most of the time,Most of the time,,Most of the time,,Sometimes,Often,Often,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,Sometimes,Sometimes,,,,,Often,,,Often,,Most of the time,Most of the time,Often,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,300000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,,Self-taught,70,10,10,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,44,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"FastML Blog,Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,"Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",,Mix of fields,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Basic laptop (Macbook),Relational data,Don't know,100MB,"Decision Trees,Other","Cloudera,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL",,,,,Rarely,,,,,,,,,,,,Most of the time,,,,Sometimes,Often,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis,Other",,,,,,Most of the time,Often,Often,,,,,,Sometimes,,Often,,,,,Most of the time,Often,Sometimes,,,,,,,Sometimes,,,Often,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Sometimes,,Most of the time,,,,Sometimes,,,,,,,,,,Often,Often,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Very useful,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,100 to 499 employees,Stayed the same,Less than one year,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,SVMs,"Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,"SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,50,10,10,30,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Denmark,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),I don't plan on learning a new ML/DS method,R,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,More than 10 years,Other,Self-taught,40,40,20,0,0,0,Time Series,"Ensemble Methods,Logistic Regression",High school,Other,20 to 99 employees,Decreased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SQL,Stan",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Logistic Regression,Naive Bayes,Simulation,Time Series Analysis,Other",,,Rarely,,,Rarely,,,Rarely,,,,,,,Sometimes,,Rarely,,,,,,,,,Often,,,Often,,,Often,30,20,30,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,,,,Often,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,NASDAQ data; Metological data; consumpion data ,"Inconsistency in data format across time, ie- prodction flow fails cause of unexpected/uannounced changes in structure","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Other,Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Researcher,Other",Self-taught,60,5,10,20,5,0,"Computer Vision,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Image data,Text data",Never,100GB,"CNNs,Neural Networks,RNNs,SVMs","Java,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"CNNs,Neural Networks,RNNs,SVMs",,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,Often,,,,,,20,30,5,5,40,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Most of the time,,,,,,,Often,,Often,,Sometimes,,,,Often,Sometimes,,Often,,,,26-50% of projects,More external than internal,IT Department,"TRECVID,IMAGECLEG,IMAGENET","TRECVID,IMAGECLEG,IMAGENET","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,2000,TND,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Very Important +Female,Poland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,20,30,15,35,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Pharmaceutical,500 to 999 employees,Stayed the same,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,49,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,More than 10 years,"Data Miner,Predictive Modeler,Researcher",Work,30,10,30,20,0,10,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Increased significantly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Other","Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,QlikView,R,SQL,Other",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Rarely,,,,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Often,,,,,,Most of the time,Sometimes,Often,,,Most of the time,Sometimes,,,Often,,,,Sometimes,,Often,Often,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,Often,Often,,,,,,,,,,,Often,Most of the time,,51-75% of projects,More internal than external,Other,Credit Bureau; Clinical Data,Data cleaning; data integration; business rules; scientific methods vs intuition ,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,4300000,DOP,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,41,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,NoSQL,Anomaly Detection,C/C++/C#,Google Search,"Personal Projects,Podcasts",,,,,,,,,,,,Very useful,Very useful,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Engineer,University courses,0,0,0,100,0,0,"Computer Vision,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Technology,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,"CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","NoSQL,Perl,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Neural Networks,Segmentation,SVMs,Text Analytics",,,Often,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,Sometimes,Sometimes,,,,,0,10,10,10,70,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a 
data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,I prefer not to say,Limitations in the state of the art in machine learning",Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Often,,,,,Sometimes,,,,,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)","Company Developed Platform,I don't typically share data",,"Git,Subversion",,100000,,Has decreased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,Less than a year,"Engineer,Software Developer/Software Engineer",University courses,75,5,0,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",High school,Technology,"10,000 or more employees",Increased significantly,Less than one year,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,NoSQL",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Predictive 
Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,25,30,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,Tableau",,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,Sometimes,,,,Often,,,,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Very useful,,,,,,Somewhat useful,,,,Somewhat 
useful,,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Java,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"Association Rules,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Text Analytics",,Sometimes,,,,Often,Often,,,,,Most of the time,,,,Often,,,,,,Often,,,,,,,Often,,,,,50,30,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Often,,Often,,,,,,,,,,,,,,Often,,Most of the time,,10-25% of projects,More internal than external,Other,None,"Insufficient dats Getting data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"130,000",,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,31,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,Self-taught,70,0,30,0,0,0,Supervised Machine Learning (Tabular Data),,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Jack's Import AI Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,35,15,50,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,Sometimes,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,,,,50,10,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,Rarely,,Sometimes,,,,,,,,,Most of the 
time,,51-75% of projects,Entirely internal,Standalone Team,"twitter,facebook,financial data",,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,42000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Hong Kong,33,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Other,Kaggle competitions,10,10,20,30,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Financial,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,Less than a year,"Computer Scientist,Programmer",Self-taught,20,20,30,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Very useful,,Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,Very useful,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist",University courses,10,20,10,60,0,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Most of the time,100MB,"Random Forests,Regression/Logistic Regression","C/C++,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Random Forests,Time Series Analysis",,Most of the time,,,,Sometimes,Most of the time,Often,,,,,,Most of the time,Most of the time,,,,,,,,Often,,,,,,,Often,,,,55,15,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",Often,,,,Most of the time,,,,,,,,,Sometimes,,,Often,,,,,,100% of projects,Entirely external,Other,,Undocumented data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,30000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,PhD,No,Doctoral degree,Computer Science,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,Neural Networks - CNNs,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important +Male,Other,38,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Online courses,YouTube Videos",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,Less than one year,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Often,,,Most of the time,Most of the time,Often,,,,,,,,,,Often,,,Often,,Often,,,,,,,,,,,50,20,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Often,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,More internal than external,Other,,"Understanding the meaning of the features, which are not always 'clean'",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Bitbucket,Rarely,65000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",5,80,0,10,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,Python,Google Search,"Blogs,Stack Overflow Q&A,Trade book",,Very useful,,,,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,6 to 10 years,"Data Scientist,Engineer,Programmer",Self-taught,10,10,55,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,DataRobot,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Tableau,Unix shell / awk",Rarely,Most of the time,,Rarely,Rarely,Rarely,,,Sometimes,,,,,Rarely,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the 
time,,Often,,,,,Sometimes,,,Sometimes,Most of the time,,,Often,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Sometimes,,,,Sometimes,Often,Most of the time,Most of the time,Often,,,Most of the time,,,,,,,Often,,Sometimes,,Often,,,,,,Often,,,,,30,30,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,Most of the time,,Often,Most of the time,,,Sometimes,Sometimes,Often,Most of the time,,Rarely,,,Sometimes,,,76-99% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Computer Vision","Neural Networks - CNNs,Neural Networks - GANs",I don't know/not sure,Technology,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Never,100MB,"CNNs,GANs,Neural Networks","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,GANs,Neural Networks,Segmentation",,,,Most of the time,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,80,10,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,,Most of the time,,10-25% of projects,More external than internal,Standalone Team,,,Key-value store (e.g. Redis/Riak),I don't typically share data,,"Bitbucket,Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,22,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Business Analyst,Self-taught,40,10,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"CNNs,Gradient Boosted Machines,RNNs","Amazon Machine Learning,Python",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Impala,Anomaly Detection,R,Google Search,"Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,,,Somewhat useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Engineer,Programmer",University courses,10,50,30,10,0,0,Recommendation Engines,Decision Trees - Gradient Boosted Machines,A professional degree,Internet-based,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Don't know,1GB,"Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,SVMs,Text Analytics",,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,Most 
of the time,,,,,,,,,,Most of the time,Most of the time,,,,,60,10,0,0,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,SQL,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Engineer,Other",University courses,30,10,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Personal Projects,Trade book,Tutoring/mentoring",Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,,,,Very useful,,,,Very useful,Very useful,,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer",Self-taught,80,10,10,0,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines 
(SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Russia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by college or university,DataRobot,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,Data Miner,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Telecommunications,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,Often,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Often,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Sometimes,,Most of the time,Most of the time,,Sometimes,,,,Often,,,,30,30,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Often,Often,Often,,,Sometimes,,,Sometimes,Most of the 
time,,,,Sometimes,,Sometimes,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,20000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,South Africa,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,Government website,"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Mix of fields,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted 
Machines","Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,Often,,Sometimes,Sometimes,Most of the time,,,,,Often,Often,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Naive Bayes,Natural Language Processing,Random Forests",,,,,,,,,,,,,,,,,,Sometimes,Often,,,,Most of the time,,,,,,,,,,,70,20,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,Often,,,Often,Often,Most of the time,Most of the time,,,,,,,,Often,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,"Bitbucket,Subversion",Most of the time,200000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Spain,41,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Miner",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,10,30,10,10,10,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,Sometimes,,,,,,,Often,,Most of the time,,,,Sometimes,Often,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,Sometimes,Often,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",Sometimes,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Sometimes,,Often,,,Most of the time,,Most of the time,,Often,,,,,Often,Often,,,,,50,10,10,10,10,10,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,Often,,Often,,,,,,,,,,Most of the time,,26-50% of projects,More external than internal,Standalone Team,google,find valuable data and label them,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak)",Share Drive/SharePoint,,Git,Sometimes,"100,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Self-taught,90,0,10,0,0,0,,,,Hospitality/Entertainment/Sports,"10,000 or more employees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Jupyter notebooks,Neural Nets,Python,"GitHub,Google Search,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,1 to 2 years,Researcher,Self-taught,80,5,10,0,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A professional degree,Non-profit,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Image data,Rarely,1GB,"Neural Networks,Random Forests","C/C++,Jupyter notebooks,Perl,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Often,,Often,,,,,,,,,Rarely,,,,,,,,,,"CNNs,Decision Trees,Neural Networks,Segmentation,Time Series Analysis",,,,Sometimes,,,,Often,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Sometimes,,,,65,10,5,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science 
team",Sometimes,,,,,,,,Often,,,,Often,,,Sometimes,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,480000,RUB,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,13,7,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,Turkey,24,Employed full-time,,,No,Yes,Software 
Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,38,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Mathematics or statistics,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Regression,R,I collect my own data (e.g. 
web-scraping),"Company internal community,Kaggle,Newsletters,YouTube Videos",,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,40,10,0,0,0,Recommendation Engines,Logistic Regression,"Some college/university study, no bachelor's degree",Telecommunications,I prefer not to answer,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,Regression/Logistic Regression,"Amazon Web services,QlikView,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Recommender Systems",,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,20,30,10,15,25,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",30,25,20,10,15,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Financial,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Sometimes,Sometimes,,Rarely,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests",,,,,,,Most of the time,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Often,Often,,,,,Sometimes,,,,,,,,,Most of the time,Sometimes,,10-25% of projects,More internal than external,IT Department,,"Dirty data, missing data",Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,Other,Rarely,160000,HRK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,,Employed by a company that performs advanced analytics,Amazon Machine Learning,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",Blogs,,Very useful,,,,,,,,,,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,Self-taught,60,0,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,Most of the time,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,Often,,,Sometimes,Sometimes,,,,40,25,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,Rarely,,,Sometimes,Often,,,Sometimes,,Often,Sometimes,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,,,,,,500000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,42,Employed part-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Python,Neural Nets,R,Google Search,"College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very 
useful,Very useful,Very useful,,Very useful,,,,Not Useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A health science,More than 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",25,65,5,5,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Mix of fields,Fewer than 10 employees,Increased significantly,Less than one year,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,,Sometimes,Most of the time,Sometimes,,,,,,Often,,,,Often,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,Rarely,,,Sometimes,Often,,,,40,15,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,Often,,,,Sometimes,,,,,Sometimes,Often,Most of the time,,,,,,76-99% of projects,More internal than external,Central Insights Team,Na,Finding and having access to data bases,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Most of the time,45000,EUR,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,27,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,"Software Developer/Software Engineer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Russia,36,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Ensemble Methods,,Internet-based,20 to 99 employees,,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data,Other",,,"Neural Networks,Random Forests,Other","Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Ensemble Methods,Natural Language Processing,Text Analytics",,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,60,15,15,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,Often,,,,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint,Other",,Git,,720000,RUB,Other,8,,,,,,,,,,,,,,,,,, +Male,Poland,17,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,30,20,20,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Most of the time,10TB,"CNNs,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,TensorFlow",,,,Often,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,Often,,Often,Often,,,,,,,Often,,Often,,,,,Often,,,,,Often,Often,,,Often,,,,40,30,15,10,5,0,"Enough to code it again 
from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Often,,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Sometimes,800000,NOK,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Vietnam,23,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,10 to 19 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,100GB,"CNNs,Decision Trees,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation,SVMs",,,,Often,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,Often,,,Sometimes,,,Often,,Often,,,,,,40,60,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to 
buy useful datasets from external sources",Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,Entirely external,Business Department,TIMIT; LIDC/IDRI,lack of knowledge about properties of data ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,"144,000,000",VND,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Podcasts,Textbook",Somewhat useful,,,,,,Very useful,,,,Very useful,,Somewhat useful,,Not Useful,,,,"FastML Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,10,15,0,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,,,,,,,,,Most of the time,Sometimes,,Often,,,,,,,,,,,15,60,15,5,5,0,Enough to tune the parameters properly,Data Science results not used by business decision makers,,Rarely,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights 
Team,census;weather;imagenet,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,140000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,10,60,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",Sometimes,,,,,Most of the time,Often,,Often,,,Often,,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,Most of the time,,,,,,,,20,25,15,15,25,0,"Enough to code it again from scratch, albeit it 
may run slowly","Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,Sometimes,,,,,,,,,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,62,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,R,,C/C++/C#,"GitHub,Google Search","Friends network,Textbook,YouTube Videos",,,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Never,<1MB,,"Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,Rarely,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,,Most of the time,Rarely,,,,,,,,Sometimes,,,,,,Sometimes,Rarely,,,,,,,,,,,70,0,10,20,0,0,Enough to run the code / standard library,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,Often,,Sometimes,,Rarely,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,300000,INR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Government website,"Arxiv,Blogs,College/University,Conferences,Personal Projects,Podcasts,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Electrical Engineering,I don't write code to analyze data,"Engineer,Other",University courses,75,20,0,5,0,0,,,"Some college/university study, no bachelor's degree",Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Germany,42,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Social Network Analysis,Java,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Not Useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Recommendation Engines,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Taiwan,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,30,50,10,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,31,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,,,,,"Arxiv,Kaggle,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Researcher",Work,10,20,70,0,0,0,"Time Series,Unsupervised Learning",Hidden Markov Models HMMs,,Academic,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,,,,"MATLAB/Octave,Perl,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,60,0,20,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,Often,Often,,,,Sometimes,,,,,,,,Most of the time,,,Often,Often,Often,Often,,,,Often,,,,Most of the time,,Often,,,,,Sometimes,,,Often,Often,,,Often,Most of the time,,Often,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs",,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,,,50,50,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Often,,,,,,,,Often,,,Often,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,15000,USD,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,0,10,0,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data",Most of the time,10MB,"CNNs,HMMs,Markov Logic Networks","C/C++,Hadoop/Hive/Pig,Python,R,TensorFlow",,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"Association Rules,CNNs,HMMs,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,SVMs,Text Analytics",,Often,,Often,,,,,,,,,Often,,,,,,,,,,Often,,,Often,,,Most of the time,,,,,50,40,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible 
expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Often,,,,Most of the time,,,,,Most of the time,,Most of the time,,,Often,,Often,,None,Entirely internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Subversion,Rarely,750000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,70,10,10,NA,NA,"Adversarial Learning,Computer Vision,Survival Analysis","Evolutionary Approaches,Gradient Boosting",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Bayesian Techniques,"Amazon Web services,IBM SPSS Statistics,R",,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,42,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Government website,"Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,Very useful,,Very useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,5,0,70,25,0,"Supervised Machine Learning 
(Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,,,,,,Often,,,,,,Often,,Often,,,,Sometimes,Often,,,,,,,,Often,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,Often,,,,,,,,,,,Most of the time,,Often,Often,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,80000,,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Work,70,0,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,100TB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,R,,,,"Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,,Self-taught,80,10,10,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Logistic Regression",High school,Financial,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Rarely,10MB,Bayesian Techniques,"Amazon Web services,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,Neural Networks,Text Analytics",,,Rarely,,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,55,20,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,Often,,,Sometimes,,,,,,,,,,,,Often,,Less than 10% of projects,More 
external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Subversion,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,52,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by government,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important +Male,Germany,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Kaggle competitions,40,30,15,0,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Most of the time,10GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,Often,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Often,Sometimes,Sometimes,,Often,Sometimes,,,,Sometimes,Often,,,,,50,10,30,5,5,0,Enough to refine and innovate on the algorithm,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Social media,data 
processing (size),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,"Bitbucket,Git",Always,90000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,52,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"Data Stories Podcast,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,University courses,30,20,0,50,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,10TB,"Bayesian Techniques,Evolutionary Approaches,GANs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,GANs,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,,Sometimes,Often,Often,,,Often,,,,Sometimes,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,30,40,0,20,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from 
external sources,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,,Often,,,,,,,Most of the time,,,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,do exploratory analysis and be confident with the results.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Switzerland,41,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Deep learning,R,,Online courses,,,,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,,11 - 39 hours,,No,Bachelor's degree,Computer 
Science,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,34,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,,,Nice to have,Nice to have,,,,"DataCamp,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,A health science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Portugal,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Business Analyst,Self-taught,10,5,0,0,65,20,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,,Insurance,20 to 99 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,10MB,Decision Trees,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,0,30,0,0,,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,Business Department,None,Lack dimension,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,"40,000",,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,Deep learning,Matlab,I collect my own data (e.g. 
web-scraping),"Friends network,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,< 1 year,,Nice to have,,,Necessary,Nice to have,Nice to have,,,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,,,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Italy,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Bayesian Methods,Python,I collect my own data (e.g. 
web-scraping),Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,I haven't started working yet,Self-taught,60,5,5,30,0,0,Other (please specify; separate by semi-colon),,I prefer not to answer,Academic,,,,,,,,,,,,"Python,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,90,10,0,0,0,0,,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Business Department,,,Other,Other,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,58,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,"Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - GANs",A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Evolutionary Approaches,kNN and Other 
Clustering,Logistic Regression,Neural Networks",,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,Often,,,,,,,,,,,,,,50,30,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,Most of the time,,,,,Often,,,,,Often,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Subversion",Sometimes,150000,,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,India,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Other,University courses,0,20,0,50,0,30,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",Never,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Non-Kaggle online communities,Online courses,Textbook",,Somewhat useful,,,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,"1,000 to 4,999 employees",,1-2 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,Sometimes,,Most of the time,,,,Rarely,,,Sometimes,,,,,,,Sometimes,,,,60,15,0,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",Often,,,,,,,,Rarely,,Often,,Often,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,720000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,50,10,0,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Not Useful,,,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),,,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Personal Projects",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,Very useful,,,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,SVMs","Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,Other",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Rarely,,,,Often,,,"Cross-Validation,Decision Trees,Naive Bayes,Text Analytics",,,,,,Often,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,80,10,5,5,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Sometimes,170000,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Kenya,34,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",60,15,0,15,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,Financial,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Never,100MB,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Recommender Systems,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,"Ensemble Methods,Neural Networks,Random Forests","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Neural Networks,Recommender Systems,SVMs",,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,,,Often,,,,,,50,30,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,470000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Other,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Software Developer/Software Engineer",Other,80,0,20,0,0,0,Time Series,Ensemble Methods,A professional degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Ensemble Methods,Neural Networks","Cloudera,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Most of the time,,Most of the time,,Most of the time,,,,,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,10,40,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,Often,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,25,"Not employed, but looking for work",,,,,,,,Java,Social 
Network Analysis,C/C++/C#,"Google Search,University/Non-profit research group websites","College/University,Personal Projects",,,Very useful,,,,,,,,,Somewhat useful,,,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,I haven't started working yet,University courses,30,0,0,70,0,0,"Natural Language Processing,Reinforcement learning,Time Series,Unsupervised Learning",,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,32,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 
hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,Social Network Analysis,Python,,"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Other",Self-taught,35,35,0,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Segmentation",Often,,,,,,Often,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,Often,,Often,,,,,,,,25,10,5,20,40,0,Enough to explain the algorithm 
to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,,,,,,,Often,,,,,,,100% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,,Rarely,50000,GBP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,60,5,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,Rarely,Rarely,,,,,Often,Rarely,Sometimes,,Most of the time,,,,70,20,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Often,Most of the time,Sometimes,,,,Most of the time,,,Often,,Rarely,Most of the time,,Often,,,Often,,100% of projects,Entirely external,IT Department,satellite images; public government data,retrieving useful information from the free satellite imagery,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,PLN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Russia,20,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,Very useful,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,10,10,60,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,"GitHub,Google Search",Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Machina Newsletter,FlowingData Blog,Talking Machines Podcast",1-2 years,Necessary,,,,Necessary,,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 
to 2 years,"Computer Scientist,Engineer,Programmer",University courses,20,10,20,30,0,20,Computer Vision,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,20,10,0,20,30,"Adversarial Learning,Machine Translation,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Most of the time,10GB,Regression/Logistic Regression,"Cloudera,Hadoop/Hive/Pig,Python,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Engineer,Software Developer/Software 
Engineer,Other",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,40,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Computer Science,Less than a year,"Business Analyst,Data Analyst",Self-taught,60,40,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,3-5 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Machine Translation,Recommendation Engines","Bayesian Techniques,Logistic Regression,Neural Networks - GANs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important 
+Male,France,38,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Regression,Python,Google Search,"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,Not Useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Other,Other,11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Recommendation Engines,Ensemble Methods,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Germany,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Miner,Researcher",Work,20,40,30,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,SQL,Tableau",Rarely,Often,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Most of the time,,,,Often,,,,,,Sometimes,,Sometimes,,Sometimes,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,Most of the time,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Singapore,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Oracle Data Mining/ Oracle R Enterprise,Genetic & Evolutionary Algorithms,R,GitHub,"College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Researcher,Statistician",University courses,10,20,0,30,0,40,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,Rarely,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Often,Often,Often,Often,Most of the time,Often,Sometimes,,,Often,,Often,,Often,,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,Often,,,,10,20,30,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,Often,Often,,,,Often,,,,,Often,,,,Often,Often,,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,Social 
media data,Limited,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,165000,SGD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,Java,Time Series Analysis,Python,Government website,"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,Unnecessary,Nice to have,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Statistician,Other","Online courses (coursera, udemy, edx, etc.)",25,25,20,25,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Male,Canada,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"Partially Derivative Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,10,0,0,0,10,80,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Sometimes,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,C/C++,Python,R,SQL,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,Often,,Often,,,,,Sometimes,,Most of the time,,,,,,Often,Often,,,,50,25,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",Sometimes,,,,Often,Sometimes,,,Often,,,,,,,,Sometimes,,,,,,76-99% of projects,More internal than external,Other,NCBI,Genomics data is error prone and messy,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",USBs and external hard drives,"Git,Other",Rarely,25000,CAD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Neural Nets,R,I collect my own data (e.g. web-scraping),"College/University,Online courses,Podcasts",,,Somewhat useful,,,,,,,,Very useful,,Somewhat useful,,,,,,"FlowingData Blog,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,"Business Analyst,Other",Self-taught,50,25,25,0,0,0,"Natural Language Processing,Survival Analysis",Other (please specify; separate by semi-colon),A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Sweden,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Researcher,Other,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Other,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,Very useful,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,20,30,0,10,0,Natural Language Processing,"Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,1GB,"Bayesian Techniques,CNNs,Neural Networks","Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,Sometimes,Rarely,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics,Time Series Analysis",,,,Often,,,Sometimes,,,,,,,Sometimes,,,,Often,Most of the time,Most of the time,,,,Sometimes,,,,,Most of the time,Sometimes,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,,,,,,Rarely,,,,,,,Rarely,,,Sometimes,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,300000,INR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Denmark,37,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,GitHub,"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",1-2 years,Nice to have,,Necessary,,Necessary,,Necessary,,Necessary,Necessary,,,,,Workstation + Cloud service,,Online Courses and Certifications,No,Master's degree,Physics,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,,,,,,,,,,,,,,, +Male,Russia,38,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Not Useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,DBA/Database Engineer,Self-taught,60,0,0,0,40,0,"Computer Vision,Time Series","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Decision Trees,Python,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,,,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Github Portfolio,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Other,50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,,Not important,,Not important,,,,,,,,,, +Male,Kenya,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,Less than a 
year,Researcher,Self-taught,20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses,Personal Projects,YouTube Videos,Other",,,Very useful,,,,,,,Somewhat useful,Very useful,Very useful,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Java,KNIME (free version),Python,SQL,Other,Other,Other",,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Often,Most of the time,Often,"Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,,,,,Sometimes,Often,,Often,,Often,Often,Most of the time,Often,,Most of the time,,,,,Most of the time,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Often,,,,,,,,Rarely,,,,,,Often,,,,10-25% of projects,More internal than external,IT Department,Students Performance of Recife; Musical Data; Mining Estimation Data Iron; ,Find Relevant Data for an good proposal and integrate it with different sources.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git,Other",Most of the time,22000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Matlab,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,Self-taught,60,0,10,30,NA,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,CNNs,"Java,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,Most of the time,Often,,Most of the time,,Sometimes,Often,,,,,Sometimes,,,,Sometimes,Most of the time,Most of the time,Sometimes,,,,Sometimes,,,Most of the time,Most of the time,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Most of the time,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Most of the time,"30,000",EUR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",20,35,40,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Insurance,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,Regression/Logistic Regression,"R,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Sometimes,,,Often,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,Often,,Often,,,,,,Often,,Often,,,,,,Often,,,,Often,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,,University courses,40,10,20,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,100 to 499 employees,Increased significantly,Don't know,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product 
or workflows,Laptop or Workstation and local IT supported servers,Image data,Sometimes,10GB,"CNNs,Neural Networks,Random Forests","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Random Forests,Segmentation",,,,Most of the time,,Sometimes,Often,,,,,,,,,,,,,Most of the time,,,Sometimes,,,Often,,,,,,,,5,40,40,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Often,Often,,,,,,,,Sometimes,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Neural Nets,Python,"Google Search,Government website","Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,35,30,0,0,5,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1TB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,Often,,,Rarely,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,30,25,30,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",,,,Often,Most of the time,Rarely,,,,,,,Most of the time,Most of the time,,,,Most of the time,Most of the time,Often,Most of the time,Often,51-75% of projects,Entirely internal,Standalone Team,,"Client data shows up in random formats, because the sales team told them we can deal with anything 
Need to process large flat files in restricted client server environments","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,70000,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Egypt,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Business Analyst,Software Developer/Software Engineer",University courses,40,0,0,30,30,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,3 to 5 years,Data Analyst,Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,Image data,Most of the time,1GB,CNNs,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"CNNs,Data Visualization,Neural Networks,Segmentation,SVMs",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,Often,,Sometimes,,,,,,20,40,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,,,,Often,,Most of the time,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Friends network,Kaggle,Personal Projects",Somewhat useful,Very useful,,,,Very useful,Very useful,,,,,Very useful,,,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,0,0,0,20,30,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Greece,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Work,50,20,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,Yes,,Researcher,Poorly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Hadoop/Hive/Pig,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,YouTube Videos",,,Not Useful,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Scientist,Programmer,Researcher",Self-taught,40,30,20,10,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,Often,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,Rarely,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,,,,,,Most of the time,Most of the time,,Most of the time,,Often,,,,,,,,,20,40,10,30,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Often,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,Entirely 
internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Association Rules,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses",Somewhat useful,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",10,10,20,0,0,60,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,Rarely,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,,,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,76-99% of projects,More internal than external,IT Department,geodata,dealing with geodata,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,50000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",Self-taught,90,0,10,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,Python,Anomaly Detection,R,,"Arxiv,Blogs,Company internal community,Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,Work,20,0,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Microsoft Excel Data Mining,Minitab,Python,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Rarely,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",,Sometimes,Sometimes,Sometimes,,Often,Often,Most of the time,,,,Most of the time,,,,Often,,,,Most of the time,,Sometimes,Most of the time,,Sometimes,,,,,Sometimes,,,,75,20,0,0,5,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,Often,,Most of the time,,,Sometimes,Often,,,,,,,Often,,,,,,,None,More internal than external,Standalone Team,,"Data cleaning, sampling",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Other,NA,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,Employed part-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,Python,Survival Analysis,R,Google Search,"College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,Very useful,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,,Self-taught,60,0,0,20,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Turkey,26,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,Very useful,Not Useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer",University courses,10,0,30,60,0,0,"Adversarial Learning,Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Academic,I don't know,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"CNNs,Ensemble Methods,GANs,Neural Networks,Random Forests,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Ensemble Methods,GANs,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,Text Analytics",,,,Most of the time,,Sometimes,,,Sometimes,,Rarely,,,,,Often,,,Most of the time,Most of the time,,,Sometimes,,Most of the time,,,,Most of the time,,,,,60,20,10,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,,Sometimes,,Often,,,,,,,Often,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,"80,000",CHF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,Talking Machines Podcast",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Programmer",University courses,10,10,10,70,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Brazil,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,Programmer,University courses,60,20,0,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,"Researcher,Other",University courses,10,30,30,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Podcasts,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,DBA/Database Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,10,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,Important,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Often,,,Rarely,Rarely,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,Sometimes,,,Often,Most of the time,Often,Sometimes,,,Often,,Often,,Often,,,,Often,Often,,Often,,,,,Often,,,,,,50,20,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science 
team,Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,Often,,,,,,,,Often,Sometimes,Often,Sometimes,,,,51-75% of projects,Do not know,Standalone Team,"Google Analytics, Social Media",Sometime too proprietary access.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Git",Sometimes,100000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,NoSQL,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Machina Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,Machine Learning Engineer",University courses,0,0,40,60,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,1TB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","C/C++,Java,Jupyter notebooks,Python,R,Tableau,TensorFlow",,,,Often,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Often,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Sometimes,,Sometimes,Most of the time,,Most of the time,Most of the time,Sometimes,Often,,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,,Most of the time,,Most of the time,Sometimes,,,Sometimes,Often,Often,,Often,,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Sometimes,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,",",",",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,100000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Data Miner,Self-taught,50,20,5,5,20,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Neural Networks,SVMs","Cloudera,MATLAB/Octave,Python,R,TensorFlow",,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,SVMs,Text Analytics",,Sometimes,,Most of the time,,Most of the time,Often,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,Rarely,,,,,35,40,15,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,,Most of the time,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,New Zealand,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,DBA/Database Engineer,Self-taught,80,10,10,0,0,0,Recommendation Engines,Neural Networks - CNNs,A bachelor's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Decision Trees,"Microsoft Azure Machine Learning,R",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,Time Series Analysis",,Sometimes,,,,Often,Often,Often,,,,,,Often,,,,,,Often,,,Often,,,,,,,Often,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Scaling data science solution up to full database",,Often,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Never,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Most of the time,,,,Often,Most of the time,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,Most of the time,,Often,,,Often,Most of the time,Often,,,,50,30,0,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,Most of the time,,Often,Most of the time,,,,Often,,,Often,,,Often,,,,,Most of the time,Often,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,550000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Philippines,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,10,10,10,0,"Recommendation Engines,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important +Male,India,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Stack Overflow Q&A",,Very useful,Somewhat useful,,,Very useful,,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,0,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Very Important,Very Important +Male,Other,39,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,SVMs","R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,,"Association Rules,Data Visualization,Segmentation,Text Analytics,Time Series Analysis",,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Often,Often,,,,10,20,10,40,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Often,,,Most of the time,,Often,,,,,,,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Other,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,Kaggle,Personal Projects,Podcasts,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A humanities discipline,Less than a year,Business Analyst,Self-taught,90,5,0,0,5,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,Turkey,23,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Personal Projects",,Very useful,Very useful,,,,,,,Very useful,,Very useful,,,,,,,"Data Stories Podcast,Jack's Import AI Newsletter,Talking Machines Podcast",1-2 years,Necessary,Necessary,Necessary,,Nice to have,Unnecessary,Nice to have,Nice to have,,Nice to have,,,,,Traditional Workstation,2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,40,0,0,50,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,Somewhat important +Male,Other,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,College/University,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,,,1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Pakistan,24,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Other,Less than a year,,University courses,0,35,5,35,25,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Very useful,,,,,"Linear Digressions Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,5,5,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,100MB,Regression/Logistic Regression,"DataRobot,Jupyter notebooks,Orange,Python,R,RapidMiner (free version)",,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,Often,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Sometimes,,,,,Often,Often,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,Sometimes,,,Often,,,,,70,5,5,15,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,"Bitbucket,Git",Sometimes,18000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,29,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,,3-5 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Researcher,University courses,40,15,15,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,Python,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,15,10,15,0,Time Series,Ensemble Methods,,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,30000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Australia,45,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,NoSQL,Survival Analysis,Python,University/Non-profit research group websites,Conferences,,,,,Very useful,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,10,10,20,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs","Java,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Sometimes,,,,,,,,,Often,,Often,Most of the time,,Sometimes,,Often,,,,,Most of the time,Most of the time,,,,,20,30,0,0,50,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,Sometimes,,,,,Often,Sometimes,,,,,,,,Often,,,10-25% of projects,More internal than external,Other,Academic research data sets,Complexity of natural language data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,,Very useful,,Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Researcher,Other",Other,20,0,30,0,0,50,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Other,Most of the time,100GB,"Decision Trees,Ensemble Methods,Random Forests,RNNs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,TensorFlow,Other,Other,Other",,Often,,Most of the time,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,Often,Often,Often,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,Time Series Analysis",Often,,,,,Often,Most of the time,Often,Often,Often,,,,,,,,,,Often,Sometimes,Often,Often,,Often,Sometimes,Often,,,Most of the time,,,,25,15,15,30,15,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,100% of projects,Approximately half internal and half external,Standalone Team,,It is streaming data that has complicated message structures at each moment in time,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",Company Developed Platform,,Git,Sometimes,"270,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Switzerland,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",Self-taught,50,35,10,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",,Technology,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,100MB,"Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Random Forests",,,Sometimes,,,Often,Most of the time,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,70,10,NA,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,,,,Often,,,,,,,,,,Often,,Often,,26-50% of projects,Entirely internal,Standalone Team,,Lack of structure and low data volume,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,120000,CHF,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Colombia,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Relational data,Rarely,100GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks","Amazon Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SQL,Tableau",Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,10,10,10,70,0,0,Enough to tune the parameters 
properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,None,None,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Mercurial",Rarely,120000,PKR,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Data Scientist,University courses,30,10,20,20,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported 
servers",Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,Stan",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,,Rarely,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Sometimes,Often,,,Often,,,,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,,,Often,,,,15,25,30,25,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,,,,,,,,Most of the time,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Other",Sometimes,70000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,20,10,20,20,20,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,"Not employed, but looking for work",,,,,,,,R,Text Mining,SQL,I collect my own data (e.g. web-scraping),"Personal Projects,Textbook,Other",,,,,,,,,,,,Very useful,,,Somewhat useful,,,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",15+ years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service,Other",2 - 10 hours,PhD,Yes,Bachelor's degree,,More than 10 years,"Data Analyst,Operations Research Practitioner,Predictive Modeler,Other",Self-taught,50,5,40,5,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important +Male,Other,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by college or university,MATLAB/Octave,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences",,,Very useful,,Very useful,,,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer,Statistician",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,10MB,"Decision Trees,GANs,Neural Networks,RNNs,SVMs","Java,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,50,0,50,0,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,2000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Very useful,,,Very useful,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Hungary,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic 
software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,85,0,10,5,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Sometimes,1GB,"Neural Networks,RNNs","Java,NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",Rarely,,,,,Often,Sometimes,,,,,,,Sometimes,,Often,,Sometimes,Often,Often,Sometimes,,,,Often,,,,Often,,,,,30,20,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external 
sources,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,Most of the time,,Often,Often,Sometimes,,Sometimes,,,,,,,,Often,Sometimes,,10-25% of projects,More external than internal,IT Department,"yelp, twitter sentiment, imdb","To convert raw date into normalized and machine understandable format, try to find the best features witch describes the whole data.","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",skype,"Git,Subversion",Never,3600000,HUF,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Java,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",< 1 year,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Computer Vision,Reinforcement learning,Speech Recognition","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Japan,50,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,,,Not Useful,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Researcher",Self-taught,30,15,20,0,10,25,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",Primary/elementary school,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL",,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Rarely,Most of the time,Often,Often,,,,,,,,,,,Often,Often,,Often,,,Most of the time,,Often,Sometimes,Sometimes,,,,85,5,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,,,Often,,,,Often,,10-25% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,85000,,Has decreased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,"Researcher,Software Developer/Software Engineer",Self-taught,45,45,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A professional degree,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,"Random Forests,Other","Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,Often,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Random Forests,Other",,,,,,,Often,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,Most of 
the time,,,0,80,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,Sometimes,,Sometimes,Often,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,Car industry,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Always,"48,000",EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,I never declared a major,More than 10 years,"Business Analyst,Data Analyst",Work,60,0,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Retail,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,1TB,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,50,Employed full-time,,,Yes,,Data Scientist,Fine,,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,GitHub,"Company internal community,Kaggle",,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,"Data Elixir Newsletter,Linear Digressions Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,Computer Scientist,Self-taught,90,5,5,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Often,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,Most of the time,,,,Often,,Often,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Naive Bayes,Natural Language Processing,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,Often,,,,,Sometimes,,,,Often,,,,,Sometimes,Sometimes,Most of the time,,,,,Often,,Sometimes,,,Often,Often,,,,50,30,10,2,8,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of significant domain expert input",,,,,Most of the time,Often,,,,,Often,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Git,Most of the time,150000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,50,10,10,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,Regression/Logistic Regression,"Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Often,Sometimes,,,Often,,,,,,,,,,"Decision Trees,Ensemble Methods,Logistic Regression,Time Series Analysis",,,,,,,,Sometimes,Sometimes,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Sometimes,105000,,,9,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,Bayesian Methods,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Kaggle,Online courses,Textbook",Somewhat useful,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,30,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","C/C++,Flume,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,Sometimes,,,Often,Often,Often,Often,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Often,,Often,,,,"CNNs,Collaborative Filtering,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Text Analytics",,,,Sometimes,Sometimes,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,Sometimes,,,Most of the time,,,,,,Often,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Often,,Often,,,,,,,,,Most of the time,,,Less than 10% of projects,Entirely external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,80000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,R,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,10,0,10,20,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,I prefer not to answer,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,1GB,Decision Trees,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,20,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,NA,NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,40,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Mix of fields,500 to 999 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,100TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation",Sometimes,Often,Often,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,,,,Most of the time,Most of the time,,Most of the 
time,,Sometimes,,,,,,,,40,20,20,5,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,Sometimes,,Often,,,,10-25% of projects,Entirely internal,Central Insights Team,,Accuracy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Company Developed Platform,,"Bitbucket,Git",Rarely,130000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Researcher,Statistician",Self-taught,30,5,40,20,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100MB,"Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Sometimes,Often,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,20,20,40,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,Sometimes,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,Sometimes,,10-25% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Git,Rarely,53000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Japan,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",Very useful,,,,,,,,Very useful,Very useful,Very useful,Very useful,,,,,,,,1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Biology,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,Hidden Markov Models HMMs,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Turkey,25,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Neural Nets,C/C++/C#,"GitHub,Google Search","Blogs,College/University,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,,,,Very useful,,,,Very useful,"FlowingData Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or 
Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,I haven't started working yet,Self-taught,65,0,0,35,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Germany,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,NoSQL,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,,Somewhat useful,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,,"Data Machina Newsletter,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Other,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data 
Scientist,Work,40,10,20,20,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,Very useful,,,,Very useful,"Data Stories Podcast,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Github Portfolio,Yes,Master's degree,Other,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,10,0,30,50,10,0,"Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important +Male,Poland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or 
university,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,Not Useful,Not Useful,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,Not Useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Data Analyst,Software Developer/Software Engineer,Statistician",University courses,25,25,25,25,0,0,"Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Jupyter notebooks,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,Rarely,,Rarely,Rarely,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Rarely,,,,Often,Most of the time,Sometimes,Rarely,,,Often,,Most of the time,,,,,,Sometimes,Rarely,,Sometimes,,,,,Most of the time,,Most of 
the time,,,,40,10,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Sometimes,,,,Most of the time,,Most of the time,,,Often,,,,,,Sometimes,Often,,51-75% of projects,Entirely internal,Other,none,cleaning up the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Other",Seafile,"Bitbucket,Git",Rarely,112195,PLN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,Python,Google Search,"Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,,,,,,,,,,,,,,Other,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,43,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Bayesian Methods,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Other,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,500 to 999 employees,Stayed the same,Don't know,An external recruiter or headhunter,Important,Other,Traditional Workstation,"Image data,Text data",Sometimes,10MB,Regression/Logistic Regression,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,20,25,10,30,15,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,100% of projects,Do not know,Other,None,"Industrial IoT not fully developed, lack of a standard service to store and query data (data come from machines)","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,46000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,0,0,0,60,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,Spain,42,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Other,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Official documentation,,,,,,,,,,Very useful,,,,,,,,,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Spark / MLlib,SQL,Other",,Often,,,Often,,,,Often,,,,,,Often,,Often,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,Often,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Sometimes,Often,,,Often,,Often,,Often,,,Often,,Sometimes,,Often,,,,,Rarely,Often,,,,,20,20,30,30,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,Sometimes,Often,Most of the time,,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,80000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Textbook,YouTube Videos,Other",,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,1-2 years,,,Nice to have,,Necessary,Necessary,Nice to have,,,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,,Yes,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by a company that performs advanced analytics,Employed by government",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Personal Projects,YouTube Videos",Very useful,,,,,,Very useful,,,,,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,40,10,30,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,500 to 999 employees,Increased slightly,1-2 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters",Other,Sometimes,10GB,"Neural Networks,Random Forests,SVMs","Java,MATLAB/Octave,Microsoft Azure Machine Learning,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Often,,,,Sometimes,Often,,Often,Often,Often,Sometimes,,,Sometimes,,Often,,Often,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Mercurial,Subversion",Sometimes,11000000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Statistica (Quest/Dell-formerly Statsoft),Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Government,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Video data,Text data,Relational data",Sometimes,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,NoSQL,Python,QlikView,R,SQL,Tableau",,,,Sometimes,Often,,Sometimes,,Often,,,,,Often,Often,,,,,,,,,,,,Often,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Segmentation,Text Analytics,Time Series Analysis",,Often,,,,,Most of the time,Often,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,Most of the time,,,Most of the time,Most of the time,,,,20,15,15,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,,,,,Sometimes,Most of the time,Often,,Most of the time,Often,Most of the time,Most of the time,,,Often,,,,10-25% of projects,More internal than external,IT Department,,variety of data types,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Rarely,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,Necessary,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,25,0,30,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Other,Support Vector Machines (SVM),R,Government website,"Arxiv,Blogs,Personal Projects,Podcasts,Textbook,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Other 
(Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Other",University courses,40,10,40,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,20 to 99 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,MATLAB/Octave,R,SAS Base,SQL",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Simulation,Time Series Analysis",,,Rarely,,,Often,Often,Rarely,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,,,,20,30,10,10,10,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,Sometimes,Often,Sometimes,,,,Often,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,10-25% of projects,More internal than external,Other,Experian;Equifax;Moody's,Getting it; getting someone to buy it;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,80000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Not Useful,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",10,50,20,10,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,Often,,,,Rarely,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Random Forests,Simulation,SVMs,Time Series Analysis",Rarely,,,,,,Often,,Most of the time,,,Most of the time,,,,,,,,,,,Often,,,,Often,Sometimes,,Sometimes,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Sometimes,Most of the time,Often,,,,,,,,,Often,,Sometimes,,Sometimes,Often,Often,,10-25% of projects,Entirely internal,IT Department,none,To get it in the first place and to get information about scales and meaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"46,000",EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,I don't write code to analyze data,Business Analyst,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Flume,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Online courses",,,,,Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,30,NA,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Nigeria,26,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Physics,I don't write code to analyze data,"Computer Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Unsupervised Learning,Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,11-15,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United Kingdom,20,"Not employed, but looking for work",,,,,,,,Tableau,Time Series Analysis,Python,"Google Search,I collect my own 
data (e.g. web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Online courses,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Other,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,55,0,5,5,0,,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important +Male,Australia,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,Python,,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Personal Projects",,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician,Other",University courses,40,0,40,20,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression,Other","C/C++,IBM SPSS Statistics,MATLAB/Octave,Python",,,,Rarely,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,Often,,,Sometimes,,,,5,35,0,30,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,Sometimes,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,100000,AUD,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Other,19,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,Other (please specify; separate by semi-colon),Bayesian Techniques,I prefer not to answer,Technology,500 to 999 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,10GB,Bayesian Techniques,"KNIME (commercial version),Microsoft Excel Data Mining,Python,Other",,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Sometimes,Most of the time,,Sometimes,,,,,,,,,Sometimes,,Sometimes,Most of the time,Most of the time,Sometimes,,Sometimes,Most of the time,Most of the time,,Most of the time,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Factor Analysis,R,University/Non-profit research group websites,"Blogs,Conferences,Online courses,Podcasts",,Somewhat useful,,,Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,,,"Data Machina Newsletter,FlowingData Blog,R Bloggers Blog Aggregator",3-5 
years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,Udacity",Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,A social science,I don't write code to analyze data,Business Analyst,University courses,50,30,0,20,0,0,Survival Analysis,,Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,26,"Not employed, but looking for work",,,,,,,,C/C++,Text Mining,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,"Data Machina Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",0 - 1 hour,Experience from work in a company related to ML,Yes,Professional degree,,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,Computer Vision,Neural Networks - CNNs,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important +Male,Hong Kong,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Friends network,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Operations Research Practitioner,Predictive Modeler,Software Developer/Software Engineer,Statistician",University courses,10,20,30,30,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,"10,000 or more employees",Stayed the same,Less than one year,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Often,,Often,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,Often,Sometimes,Sometimes,Often,Sometimes,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack 
of funds to buy useful datasets from external sources,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,Often,,,,,Often,,,,Rarely,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,Git,Sometimes,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,21,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Julia,Monte Carlo Methods,Python,Google Search,"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,20,5,35,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector 
Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Other",,Most of the time,,Sometimes,,,,,,,,,Sometimes,,Often,,Often,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,Rarely,Often,,,Often,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics",Most of the time,,Often,,,Most of the time,Most of the time,,,,,,Often,Sometimes,,Often,Often,Often,Most of the time,Most of the time,,,Often,,Most of the time,,,Most of the time,Most of the time,,,,,45,15,15,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,Sometimes,Often,Often,,,,Sometimes,Often,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Central Insights Team,mostly text datasets,Finding a set of data that is representative for the domain I'm modeling,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak)",Commercial Data Platform,,"Git,Other",Rarely,24000,BRL,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,Python,Social Network Analysis,R,Government website,"Blogs,Conferences,Friends network,Newsletters",,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,More than 10 years,"Data Analyst,Statistician",Work,15,15,70,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,<1MB,Regression/Logistic Regression,"R,SAS Base,SAS JMP,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,Often,,Often,,,Rarely,,Rarely,,,,,"Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,15,15,15,20,15,20,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,,"Company Developed Platform,Email",,Generic non-cloud file 
sharing software (Email/Shared Server/etc.),Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Colombia,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Engineer,Self-taught,20,60,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,Not Useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,Self-taught,60,40,0,0,0,0,Natural Language Processing,Neural Networks - CNNs,High school,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Ensemble Methods,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,Often,,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,,,Often,,,,Often,,,,,30,30,30,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The 
lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,Often,Often,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,Often,Most of the time,Most of the time,,Less than 10% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,100000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,40,30,5,20,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Very Important,Not important,Very Important,Very Important +Male,South Korea,24,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Textbook",Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Researcher,University courses,30,20,0,50,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,Less than a year,,Other,80,0,15,0,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,10 to 19 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,10MB,"Decision Trees,Random Forests,Other","Amazon Web services,IBM SPSS Statistics,Jupyter notebooks,Python,R,SQL,Other",,Sometimes,,,,,,,,,,Often,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Rarely,Rarely,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,Most of the time,Often,,Sometimes,,,,Sometimes,,,,70,10,5,10,5,NA,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",,,,,,Most of the time,,,Most of the time,Often,,,Often,,,,,,,,Often,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"20,000",EUR,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Norway,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,Very useful,Very useful,,,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Researcher",Self-taught,50,20,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A master's degree,Technology,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,Often,Often,Sometimes,Often,,,Sometimes,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,I prefer not to say,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,Rarely,Most 
of the time,Often,Rarely,,,,,,,Sometimes,,,Often,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,Public data via API,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Other,Python,,"Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",34,33,33,0,0,0,Reinforcement learning,"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,"5,000 to 9,999 employees",,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Video data,Other",,10GB,"CNNs,Neural Networks,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Often,,Sometimes,Most of the time,,,,,,,,,,,,,Often,Sometimes,,Rarely,,Often,Sometimes,,Rarely,,Often,,,,60,20,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Often,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,76-99% of projects,More internal than 
external,Other,Sound recordings from other neuroscience labs,Small datasets; often lacking non-trivial effects,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"65,000",CAD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,France,42,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by college or university,Jupyter notebooks,Bayesian Methods,Python,Google Search,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Computer Scientist,Data Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Reinforcement learning,Time Series",Ensemble Methods,A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,1MB,Neural Networks,"C/C++,Java,MATLAB/Octave,Python,R",,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,,,,,10,5,0,10,75,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely external,IT Department,Opendata sets only,Don't know,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,40000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Romania,38,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,60,15,5,10,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Image data,Most of the time,1TB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,Sometimes,,,Often,,Rarely,,Often,,,,70,10,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of 
significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,Often,,,Sometimes,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,Copernicus ESA;,Putting the data in a common format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,18000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,,,,Very useful,,Very useful,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Researcher,University courses,50,30,0,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,21,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",33,33,34,0,0,0,"Computer Vision,Speech Recognition","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Rarely,100GB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,Sometimes,Most of the time,,Often,Sometimes,,,,,,,Often,,Often,,,,Most of the time,Often,,,,Rarely,Sometimes,,Most of the 
time,,,,,,0,30,70,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning",Sometimes,,,,Often,,,,,Sometimes,Often,Most of the time,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Weka,Survival Analysis,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician",University courses,0,0,15,70,15,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Decreased significantly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,SVMs",,Often,,,,Most of the time,Often,Sometimes,,,,,,Sometimes,Sometimes,Often,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,40,15,20,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Often,,Often,,,,,,,Rarely,,,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,TCGA cancer genome atlas,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,31000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Statistician",University courses,10,0,30,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SQL,Tableau,TIBCO Spotfire",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,Often,,Often,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed 
full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Other",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Data Visualization,Logistic Regression,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,75,5,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team",Often,,,,Often,Often,,,,,,,Sometimes,,,Often,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,85000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,50,40,0,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,5,5,3,37,0,Enough to explain the algorithm to someone non-technical,"Company politics / 
Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Rarely,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Programmer",University courses,0,15,40,40,5,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,Kaggle competitions,50,0,15,0,35,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by 
professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Software Developer/Software Engineer,Other",University courses,10,0,40,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,I don't know,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Stan,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,Random Forests,Time Series Analysis",,,Sometimes,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,Often,,,,15,20,25,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,,Sometimes,Often,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,828000,SEK,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,France,28,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Python,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Personal Projects",,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Engineer,University courses,10,8,30,50,2,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,I don't know,Stayed the same,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Never,10MB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,Rarely,,,,Most of the time,Most of the time,Rarely,,,,,,Often,,Often,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,40,25,3,14,18,0,"Enough to code it again from scratch, albeit it may run slowly","The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Sometimes,"24,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,DataRobot,Link Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,No education,Other,10 to 19 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Always,10TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,Often,,,,,,,Sometimes,,,,,,,,Often,,Often,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Simulation,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,70,10,8,4,6,2,Enough to tune the parameters properly,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Rarely,,,,,,,,,51-75% of projects,More internal than external,IT Department,Ministry of Agriculture,Availability of good quality data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,98000,USD,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Personal Projects,YouTube Videos",,,,Somewhat useful,,,Very useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,6 to 10 years,DBA/Database Engineer,University courses,0,0,0,100,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Brazil,25,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,3 to 5 years,I haven't started working yet,University courses,10,30,0,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,51,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,35,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by non-profit or NGO,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,0,10,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,Non-profit,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Don't know,100MB,,"IBM SPSS Statistics,Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Segmentation",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,60,10,0,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Most of the time,,Often,,,,,Most of the time,Sometimes,,,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,Gov data,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,80000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Pakistan,39,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Other,Fine,Employed by government,I don't plan on learning a new tool/technology,Neural Nets,R,Government website,"Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Researcher,Other",Work,25,0,75,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Mathematica,MATLAB/Octave,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,Rarely,Most of the time,Rarely,Rarely,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Simulation,Time Series Analysis",,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Most of the 
time,,Sometimes,,,,Sometimes,,,,,Most of the time,,,Most of the time,,,,25,50,25,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",,,,,Often,Sometimes,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,Amazon Machine Learning,Random Forests,Python,I collect my own data (e.g. web-scraping),"Arxiv,Company internal community,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Other,Other",Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,60,0,35,0,0,5,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Technology,Fewer than 10 employees,Stayed the same,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Sometimes,1GB,"Evolutionary Approaches,Neural Networks,SVMs,Other","C/C++,Mathematica,Python,Other",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality 
Reduction,Segmentation,SVMs,Other",,,,Often,,Often,Often,,,,,,,,,,,,,Often,Sometimes,,,,,Often,,Often,,,Sometimes,,,20,45,15,15,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,Often,,,,,,,,Often,,10-25% of projects,Entirely internal,Other,MNIST; ImageNet;,Converting it to right format and cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Shared Server,Other,,78000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,Python,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Data Miner,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Most of the time,,,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,20,10,20,0,Enough to refine and innovate 
on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Often,Often,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Non-Kaggle online communities,Online courses,Personal Projects",,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Software Developer/Software Engineer",University courses,60,10,5,10,5,10,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"HMMs,Neural Networks","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,Often,,,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Natural Language 
Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,Often,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,Often,Most of the time,Most of the time,,Often,Often,,,,Most of the time,,,,,,60,10,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,common crawl,unstructured nature,Document-oriented (e.g. MongoDB/Elasticsearch),Share Drive/SharePoint,,Git,Rarely,1300000,INR,Other,3,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,Deep learning,Python,GitHub,"Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Not Useful,,Not Useful,Very useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Not Useful,Somewhat useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,20,0,0,80,0,0,Time Series,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Relational data,Other",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Flume,Google Cloud 
Compute,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,Unix shell / awk",Rarely,,,,,,Rarely,Rarely,Sometimes,,,,,,Most of the time,,,,,,Most of the time,,,Rarely,Rarely,,Most of the time,,,,Most of the time,,Rarely,,Rarely,,,,,,Often,Most of the time,,,Rarely,,,Sometimes,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,SVMs,Time Series Analysis",,,,,,,Often,Rarely,,,,,,Rarely,,,,Sometimes,,Often,,,,,,,,Often,,Most of the time,,,,10,30,30,30,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Often,,,Often,,,,,,,,,,Rarely,,,,100% of projects,Approximately half internal and half external,Central Insights Team,weather,big data processing,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,60000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Philippines,22,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,"Business Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Decreased significantly,More than 10 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R",Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Recommender Systems,Simulation,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,40,10,10,2,38,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Not Useful,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Somewhat useful,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,"Coursera,Udacity,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important +Male,Other,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data 
Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",50,10,40,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,,Sometimes,Often,Often,,,,,Often,,Most of the time,,Sometimes,,Sometimes,Often,,Often,,,Often,,Sometimes,,,,,,10,50,10,5,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,,Sometimes,Often,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Never,20000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Spain,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,73,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,R,Google Search,"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,40,30,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,I prefer not to answer,Increased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,R",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,50,30,10,0,10,0,Enough to explain the algorithm to someone 
non-technical,"Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,,,Often,,,,,,,,,Often,Most of the time,,100% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Online courses,Personal Projects",Somewhat useful,,Very useful,,Very useful,,,,,,Very useful,Very useful,,,,,,,,5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Other,No,Bachelor's degree,Computer Science,6 to 10 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Time Series Analysis,SQL,Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Computer Scientist,Data Analyst,Engineer,Software Developer/Software Engineer",Work,20,10,20,50,0,0,,,"Some college/university study, no bachelor's degree",Telecommunications,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,100GB,,QlikView,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,0,0,0,10,10,80,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues",Often,,,,Often,,,,Most of the time,,,,,,,,Often,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Rarely,60000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,33,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",65,30,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,,,,,,,,,,,,,,Often,,,Sometimes,,Often,,Most of the time,,,,,,,,,,,20,10,10,40,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science 
talent in the organization,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,Often,,,,,,,Often,,,,,,,51-75% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,Very useful,,"FlowingData Blog,Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,43,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Genetic & Evolutionary 
Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,,Somewhat useful,,Very useful,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",Work,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Other","Image data,Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM Watson / Waton Analytics,Python,R,TensorFlow",,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Association Rules,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Natural Language Processing,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,,Most of the time,Often,,,,,Most of the time,Often,,,,Most of the time,,,,Often,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,10,35,20,20,15,0,Enough to tune the parameters properly,"I prefer not to say,Lack of significant domain expert input,Limitations of tools",,,,,,,,,,,Often,,Often,,,,,,,,,,100% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Most of the time,100000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,South Korea,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,0,0,0,80,20,0,"Computer Vision,Machine Translation","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs",High school,Mix of fields,20 to 99 employees,,,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Most of the time,100GB,"CNNs,Ensemble Methods,GANs,Neural Networks","MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Ensemble Methods,Neural Networks,Segmentation",,,Often,Most of the time,,Most of the time,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Scaling data science solution up to full database",,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,48,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Textbook",,,,,Somewhat useful,,Very useful,,,,,,,,Somewhat useful,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Operations Research Practitioner",Kaggle competitions,30,10,10,30,20,0,Time Series,"Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"5,000 to 9,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,,Sometimes,,Often,,Often,Often,Most of the time,,,,,,,,,,Often,Sometimes,,Sometimes,,,,Often,Sometimes,,Often,,,,20,60,10,10,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Sometimes,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,130000,BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Other,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,0,20,50,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Google Search,Government website,University/Non-profit research group websites","College/University,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,,,Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,20,50,20,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular 
Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,20 to 99 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Don't know,10MB,"Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,Often,,,,,,,,,,,,,,,,,Often,Sometimes,,,,Sometimes,,Often,,,Most of the time,,,,30,10,0,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Rarely,Sometimes,Most of the time,,,,Often,,,,Sometimes,,,51-75% of projects,Entirely internal,Standalone Team,FRED website,Actually searching for the relevant data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Slack,"Generic cloud file sharing software (Dropbox/Box/etc.),Other",Never,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,Other,University courses,25,0,25,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Non-profit,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,"Data Elixir Newsletter,DataTau News Aggregator,FlowingData Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,,Often,,Often,,,,,,Sometimes,Sometimes,,,,20,20,10,25,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,Sometimes,,Often,,,,,,,,,,Often,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,175000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,Google Search,"Arxiv,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Textbook",Very useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,,,Very useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,,10TB,"Regression/Logistic Regression,SVMs","C/C++,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",,,,,,Often,Often,,,,,,,,,Often,,,,,Often,,,,,,Most of the time,Often,,Often,,,,10,70,0,20,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,"60,000",USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Other",University courses,30,40,0,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,85,0,15,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,,,,,,Most of the time,,,,Often,,,Sometimes,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,,,Has decreased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Friends network,Kaggle,Personal Projects,Tutoring/mentoring",,,,,,Very useful,Very useful,,,,,Somewhat useful,,,,,Very useful,,,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,75,0,10,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Conferences,Kaggle,Textbook",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Researcher,Statistician",Self-taught,40,0,25,25,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A master's degree,Academic,20 to 99 employees,Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Mathematica,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Often,,,Often,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,30,20,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Often,Often,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,37000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Poland,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,University courses,20,0,0,80,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,"GitHub,Government website","College/University,Non-Kaggle online communities,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Somewhat useful,Not Useful,The Data Skeptic 
Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,SQL,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,,Self-taught,70,0,0,10,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Somewhat important,Other,Basic laptop (Macbook),Text data,,,,"Amazon Web services,Python,R,Stan,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,Sometimes,,,,,Often,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods",,,Most of the time,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,0,50,0,40,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Sometimes,,,,,Often,,,Rarely,,,,,,Often,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"20,000",USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Kaggle",Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Machine Learning Engineer",University courses,10,30,0,50,0,10,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A bachelor's degree,Telecommunications,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Most of the time,100MB,"Neural Networks,Random Forests","MATLAB/Octave,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,Random Forests,SVMs",,,,Often,,Most of the time,Most of the time,,,,,,,Often,,,,,,Most of the time,,,Often,,,,,Sometimes,,,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Most of the time,,,,Sometimes,,,100% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,Very useful,,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,I never declared a major,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",16-20,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",80,10,5,5,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Relational data,Other",Sometimes,1GB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Random Forests",,,,,,,,Often,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,80,10,5,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues",Most of the time,,,,Often,,,Most of the time,Most of the time,,Often,,Most of the time,,,,Most of the time,,,,,,10-25% of projects,More internal than external,Business Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Company Developed Platform,Email",,"Git,Other",Sometimes,1000000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",55,40,0,0,5,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,Australia,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Julia,Monte Carlo Methods,Python,Government website,"Arxiv,College/University,Friends network,Online courses,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch 
Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Researcher,Statistician",Work,0,20,50,30,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Hidden Markov Models HMMs",A professional degree,Military/Security,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,Rarely,10MB,"Bayesian Techniques,HMMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,HMMs,Naive Bayes,Time Series Analysis",,,Most of the time,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,10,50,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,Sometimes,,,,,,,Often,,,,,,,,,,,,Often,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,100000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,GitHub,"College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,KNIME (free version),Python,QlikView,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,Often,,,,Often,,,,,Often,,,Most of the time,,Rarely,,,,,,,,,,,,Often,Often,Most of the time,,,,,,,,Most of the time,Most of the time,,,Rarely,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation",Often,,,,Often,Often,Often,Often,Sometimes,,,Sometimes,Sometimes,Sometimes,,Rarely,,,,,Rarely,,Often,Often,,Often,Sometimes,,,,,,,50,25,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Often,,,,Sometimes,,,,Sometimes,,Sometimes,,Rarely,,Sometimes,,Most of the time,,None,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,0,0,30,0,Recommendation Engines,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,50,Employed full-time,,,Yes,,Software 
Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,,Emergent/Future Newsletter (Algorithmia),,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,65,0,0,0,0,,,I prefer not to answer,Internet-based,Fewer than 10 employees,Decreased slightly,Less than one year,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Never,100MB,CNNs,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Did not instrument data useful for scientific analysis and decision-making,,,Most of the time,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Stan,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Online courses,Podcasts,Textbook,YouTube Videos",Very useful,,,,,,,,,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Engineer",University courses,30,30,10,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,Text data,Most of the time,10GB,Bayesian Techniques,"Spark / MLlib,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,80,6,10,4,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,76-99% of projects,More external than internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,85000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,South Korea,40,Employed part-time,,,Yes,,Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Miner,Researcher,Statistician",University courses,20,10,0,70,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,Google Search,"Arxiv,Blogs,Conferences,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Data Miner,Researcher",University courses,45,0,30,25,0,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Stayed the same,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data,Other",Most of the time,10GB,"CNNs,GANs,Neural Networks,RNNs,Other","Java,Jupyter 
notebooks,Python,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,Most of the time,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs",Most of the time,Often,,Rarely,Most of the time,,,,,,,,,Sometimes,,Often,,,,Most of the time,Sometimes,,,Most of the time,Most of the time,,,,,,,,,20,70,10,0,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database",,,,Often,Often,,,,,,,,,,,,,Often,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Git,Rarely,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Canada,46,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Decision Trees,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Professional degree,,,Engineer,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Brazil,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,"Researcher,Other",Self-taught,40,20,30,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Recommendation Engines,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 
to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Newsletters,YouTube Videos",Somewhat useful,Very useful,,,,Somewhat useful,Very useful,Very useful,,,,,,,,,,Very useful,"FastML Blog,No Free Hunch Blog,Partially Derivative Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,50,30,0,0,20,0,"Computer Vision,Natural Language Processing","Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by company that makes advanced analytic software,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Kaggle,Official documentation,Personal Projects,Tutoring/mentoring",,,,,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,,,,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Researcher",University courses,10,10,15,50,15,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods",A master's degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Relational data,Always,100MB,"Decision Trees,Ensemble Methods","C/C++,Microsoft Excel Data Mining,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests",,,Rarely,,,,,Often,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,,,,,10,50,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Scaling data science solution up to full database",Often,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,yes,dont' known,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,No,Yes,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,3 to 5 years,"Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst",University courses,20,10,20,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,Other,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","NoSQL,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,Often,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Rarely,,Often,,,Often,Most of the time,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,,Sometimes,,Rarely,Rarely,Most of the time,,,,60,10,10,10,10,NA,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Sometimes,Often,Sometimes,,,Most of the time,,,,,,,,Most of the time,Often,,Most of the time,Sometimes,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,100000,CHF,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by non-profit or NGO,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites,Other","Arxiv,Blogs,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,85,5,0,5,0,Unsupervised Learning,Decision Trees - Random Forests,A doctoral degree,Mix of fields,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Never,,,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Other",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,70,5,5,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,Often,Often,,,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,Other,"GBIF natural history collections data, museum APIs and IPTs","data cleanup, and setting/following data standards going forward","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"50,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed part-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"GitHub,Google Search","Blogs,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Not Useful,Not Useful,,,,,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",1,88,1,0,10,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,DataTau News Aggregator,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,5,25,45,20,5,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,1GB,Other,"Java,Jupyter notebooks,Python,R,Spark / MLlib",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics",,Often,,,,Often,Often,,,,,,,Sometimes,,Sometimes,,,Often,,,,,,,,,,Sometimes,,,,,10,25,0,25,40,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,,,,,,,Often,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Git",Rarely,"24,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Turkey,31,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,0,70,10,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,,1GB,"CNNs,GANs,Neural Networks","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,Rarely,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,GANs,Neural Networks",,,,Most of the time,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,40,20,5,5,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,SpaceNet,TIF images are massive.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other,USB Flash Memory,"Bitbucket,Git",Sometimes,24000,TRY,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Other,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,NoSQL,Python,R,RapidMiner (free version),TensorFlow,Unix shell / awk,Other",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the 
time,,Rarely,,Rarely,,,,,,,,,,,Often,,Most of the time,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Most of the time,Sometimes,,Often,,,Sometimes,Sometimes,,Often,Sometimes,,,,40,40,10,10,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,Sometimes,,,,,Sometimes,Often,Sometimes,Often,Often,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Standalone Team,uci repository; mulan repository; libsvm repository,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Other",Google Drive,Git,Most of the time,125000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,SQL,Google Search,"Blogs,Online courses,Personal Projects",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Insurance,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Often,,,Sometimes,,,,Often,,,,,,,,Often,,,,,,Often,,Often,,Sometimes,,,,Often,,Sometimes,,,,,,,,Often,Most of the time,,,Rarely,,,Often,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Text 
Analytics",Sometimes,Sometimes,,,,Often,Often,Sometimes,,,,,,,,Sometimes,,,Rarely,,,,Sometimes,,,,,,Rarely,,,,,60,10,20,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,Sometimes,,,Often,Sometimes,Sometimes,,,Sometimes,,,,,Often,Most of the time,Most of the time,,Less than 10% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,Always,125000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Portugal,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Poorly,Self-employed,Python,Anomaly Detection,Python,Government website,"Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,,No,Some college/university study without earning a bachelor's degree,I never declared a major,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,,,High 
school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Julia,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Text data",Most of the time,,Neural Networks,"Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Neural Networks,Text Analytics",,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,60,5,15,5,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data",,,,,Most of the time,,,,,,,,,,,,,,Often,Often,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Most of the time,80000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,Tableau,Survival Analysis,Matlab,Google Search,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,5,10,10,5,"Adversarial Learning,Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Stayed the same,Less than one year,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data,Text data,Relational data",Never,<1MB,"Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,R",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Markov Logic Networks,Neural Networks,SVMs",,,,,,,,,,,,,,Sometimes,,,Rarely,,,Sometimes,,,,,,,,Sometimes,,,,,,20,20,20,20,20,0,Enough to tune the parameters properly,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Key-value store (e.g. 
Redis/Riak),I don't typically share data,,Git,Sometimes,2000000,LKR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Work,25,0,75,0,0,0,Time Series,,A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,Python,I collect my own data (e.g. web-scraping),"Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Angoss,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner",,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,Often,,Most of the time,Often,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,Most of the time,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,Sometimes,,,,0,50,25,0,25,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,Often,,,,,,,,,,,Most of the time,,,Often,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +A different identity,Canada,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,R,,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Scientist,Statistician,Other",Self-taught,40,0,50,0,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","KNIME (free version),R",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,,Often,Often,,,,,Most of the time,,Sometimes,,,,Often,Often,,Most of the time,,,,,,,Most of the time,,,,35,35,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Maintaining 
responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,,Often,,,,,Sometimes,,,Often,,,Most of the time,,,10-25% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Weka,Uplift Modeling,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,Trade book",,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Machine Translation,Natural Language Processing,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Government,500 to 999 employees,Increased slightly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Regression/Logistic Regression","IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (free 
version),SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,Often,,,,,Sometimes,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,Often,,,Sometimes,Sometimes,,Often,,,,"Association Rules,CNNs,Data Visualization,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Often,,Sometimes,,,Most of the time,,Often,,,,,,,,,,Most of the time,Sometimes,Often,,Often,,,,,,Most of the time,Most of the time,,,,10,25,15,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,I prefer not to say,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Often,,,,Most of the time,,Often,,Sometimes,,,,Often,,,,Often,,,,Most of the time,,76-99% of projects,More internal than external,Other,"nltk, wordnet",quantity and quiality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"67,000",EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,"Blogs,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Miner",Self-taught,60,10,30,0,0,0,,,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,"KNIME (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,0,0,40,20,0,Enough to run the code / standard library,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,,Often,,,51-75% of projects,More external than internal,Standalone Team,,strict governance rules,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,100000,AUD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Julia,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,Very useful,,,,,,Very useful,,Very useful,,Somewhat useful,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Other",Other,0,10,0,0,0,90,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Other,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,1TB,CNNs,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Natural 
Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,Often,,Often,Often,,Sometimes,,,,,,,,,,Sometimes,Often,Rarely,,,,Rarely,Often,,,,,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,Sometimes,,Often,,,,Sometimes,,,Most of the time,,100% of projects,Entirely internal,Standalone Team,Can't say,Getting it labeled and getting enough of it,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,"115,000",USD,Other,6,,,,,,,,,,,,,,,,,, +Male,Greece,26,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Spark / MLlib,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,100MB,"Neural Networks,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Naive Bayes,Neural Networks,SVMs",,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,,,,,,80,10,5,2,3,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,Sometimes,,51-75% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,7200,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,R,Factor Analysis,Python,GitHub,"Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Unnecessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,Udacity",GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,50,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep 
learning,Python,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,0,30,20,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,"Emergent/Future Newsletter (Algorithmia),Partially Derivative Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,,,,,,, +Female,Pakistan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important, +Male,Chile,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,NoSQL,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,Very useful,,Somewhat useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,40,30,0,20,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Video data,Relational data",Most of the time,100GB,"CNNs,Neural Networks,Random Forests,RNNs,SVMs","Jupyter notebooks,Python,SQL,TensorFlow,Other,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,Often,Sometimes,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,Often,Most of the time,,Most of the time,Sometimes,Sometimes,Often,,,,,Sometimes,,Often,,,,Most of the time,Sometimes,,Often,,Most of the time,,Often,Sometimes,,Often,,,,55,20,5,10,10,0,Enough to refine and innovate on the 
algorithm,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database",,,,Sometimes,Often,,,,,,,,,,,,,Rarely,,,,,100% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,"51,428",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,Somewhat useful,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,,Self-taught,50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Python,R,SQL,Tableau",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs",,,,,,,Most of the time,Most of the time,,,,,,Often,,Most of the time,,Often,,,,,Most of the 
time,,,,,Often,,,,,,10,60,0,30,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Other,50,50,0,0,0,0,"Computer Vision,Time Series",Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,25,2,0,33,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Not Useful,Somewhat useful,Very useful,,Very useful,,,Very useful,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,3 to 5 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,40,10,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,,100MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,Often,,Often,,Rarely,,,,,,Sometimes,,Sometimes,,,Most of the time,Most of the time,Sometimes,,,,Most of the time,,,Often,Often,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Often,Often,Rarely,,Often,Sometimes,,Most of the time,Often,,,,,,Often,Most of the time,Most of the time,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Email,,Bitbucket,Never,1500000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Neural Nets,R,Google Search,"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,,Self-taught,75,20,5,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Always,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Often,Often,,,,,,,Most of the time,,,,,,,Often,,,,35,15,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,Sometimes,Rarely,,,,,,Rarely,,,51-75% of projects,Entirely internal,Other,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Bitbucket,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Text Mining,R,Other,"Kaggle,Textbook,YouTube Videos,Other",,,,,,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Other,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Ukraine,33,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,"Data Analyst,Predictive Modeler",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,36,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Python,Text Mining,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Reinforcement learning,Other (please specify; separate by semi-colon),Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Rarely,10GB,Bayesian Techniques,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Natural Language Processing,Text Analytics",,,Sometimes,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,30,0,40,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,Often,Often,Most of the time,,,Most of the time,,,,Most of the time,,,,100% of projects,Entirely internal,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,48000,TRY,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,25,65,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Telecommunications,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Orange,R,SQL,Other",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,,Often,,,,,,Sometimes,,Often,,Sometimes,,,,,Often,,,,,Rarely,Often,Sometimes,,,,75,3,2,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Most of the time,,Sometimes,,,,,,Sometimes,,,Often,,,Most of the time,Most of the time,,76-99% of projects,More internal than external,Other,geo location data,"Diry, inaccessible, silos of data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,95000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,53,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Social Network Analysis,R,"GitHub,Google Search","Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer,Statistician",Self-taught,80,2,10,8,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Microsoft SQL Server Data Mining,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,,Most of the time,,,,,,Most of the time,,,,,,Often,,100% of 
projects,More internal than external,Business Department,,Business making decision to depurate and clean data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,,,R,I collect my own data (e.g. web-scraping),Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,0,0,40,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,Bayesian Techniques,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes",,,Sometimes,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Vietnam,29,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Analyst,,Employed by a company that performs advanced analytics,Cloudera,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects",,Very useful,,,,,Somewhat useful,,,,,Very useful,,,,,,,"Data Stories Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Unsupervised Learning,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Always,100GB,Decision Trees,"Hadoop/Hive/Pig,IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,10,50,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input",Often,,,,,,,,,,Sometimes,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,2300000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Israel,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Scientist,Statistician,Other",Work,20,15,40,25,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Internet-based,500 to 999 employees,Increased significantly,6-10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",Often,,Sometimes,,,Sometimes,,Often,Often,Often,,,,,Often,Often,,Often,,,,,Often,,,,,,,Often,,,,10,10,60,0,20,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,54,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,0,0,50,50,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Sometimes,Most of the time,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Most of the time,,,,Sometimes,Most of the time,Often,Often,,,,Often,,,Often,Often,,,,Often,Sometimes,,Sometimes,Sometimes,,,,,Most of the time,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to 
data",,,,,,,Sometimes,,Sometimes,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hungary,44,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,Tableau,Neural Nets,R,Google Search,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Predictive Modeler",Kaggle competitions,40,0,0,0,60,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Government,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Other,Traditional Workstation,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,Often,,Most of the time,,,,Often,Often,,Often,,,,,Sometimes,Sometimes,Sometimes,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,the size of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",google drive,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,235000,HUF,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Other,Perfectly,"Employed by a company that doesn't perform advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Programmer,Other",Other,30,30,30,10,0,0,Adversarial Learning,Decision Trees - Random Forests,A bachelor's degree,Government,20 to 99 employees,Decreased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Researcher",Self-taught,50,20,10,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Sometimes,,,Often,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive 
Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,Most of the time,,Often,,Often,,,Often,Often,Often,Often,Most of the time,Sometimes,,,,Often,Often,Often,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations of tools",Often,,,,,,,,,Often,,,Often,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,31,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Bayesian Methods,Python,Google Search,"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX",GPU accelerated Workstation,40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language 
Processing,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,21,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,43,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"5,000 to 9,999 employees",Increased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic 
Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Often,Often,,,,,,,Often,,Sometimes,,,Sometimes,,Often,,,Often,,Sometimes,,Most of the time,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,Often,Often,,Often,,,,,,,,,Often,Often,,100% of projects,More external than internal,IT Department,"reuters, icis, cru, eurostat, comtrade, (quandl data sources)",lack of historical data at fine granularity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,"Azure, SAP BW / HANA",Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,37,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher",Self-taught,50,30,0,0,0,20,Time Series,Neural Networks - RNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",University courses,30,30,20,10,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,C/C++/C#,Google Search,"Friends network,Online courses",,,,,,Very 
useful,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Mathematics or statistics,,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Deep learning,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",< 1 year,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Electrical Engineering,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very 
Important,Very Important,Somewhat important,Very Important +Male,India,24,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,,,,Other,"Basic laptop (Macbook),Other",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Physics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,5,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,South Korea,32,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Genetic & Evolutionary Algorithms,Julia,"GitHub,Other","Arxiv,Newsletters,Personal Projects,Textbook",Somewhat useful,,,,,,,Very 
useful,,,,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Australia,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Tableau,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,Very useful,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100MB,Regression/Logistic Regression,"Amazon Web services,IBM Cognos,KNIME (free version),Microsoft Excel Data Mining,R,SAS Base,SQL,Tableau",,Rarely,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,87,3,3,0,7,0,Enough to tune the parameters properly,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Sometimes,96000,AUD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,,Less than a year,"Business Analyst,Data Analyst,Operations Research Practitioner,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,10,0,0,Recommendation Engines,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Malaysia,39,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Java,GitHub,Official documentation,,,,,,,,,,Very useful,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,,Necessary,,Necessary,Necessary,,,,,,,,,Workstation + Cloud service,,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,0,10,Supervised Machine Learning (Tabular Data),Other (please specify; separate by 
semi-colon),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,South Africa,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,Somewhat useful,,Not Useful,Very useful,Somewhat useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,50,25,5,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Internet-based,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Ensemble Methods,Neural Networks","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Often,,,,"Collaborative Filtering,Neural Networks,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Sometimes,,Often,Most of the time,,Often,Most of the time,,Often,,,Sometimes,,,,,,Most of the time,,,100% of projects,More internal than external,Other,Wordnet,Size,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,720000,ZAR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Statistician,Perfectly,Self-employed,Amazon Machine Learning,Survival Analysis,R,,"Kaggle,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Self-taught,100,0,0,0,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",No education,Other,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Minitab,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SQL,Tableau",,,,,,,,,Rarely,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,Most of the time,,,Often,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series 
Analysis",Often,Most of the time,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,60,20,12,3,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Often,,,Sometimes,,,,,Most of the time,,,Often,,,,Most of the time,,,,,,,100% of projects,More internal than external,Business Department,,,,Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,550000,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Australia,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,DBA/Database Engineer,Engineer,Programmer",Self-taught,40,5,0,40,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,20+,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,23,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,NA,Employed full-time,,,Yes,,Engineer,Poorly,Employed 
by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Software Developer/Software Engineer",University courses,20,10,0,70,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Java,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",,,,,Sometimes,Often,Often,Often,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,Often,Often,,,,,,Sometimes,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack 
of significant domain expert input",Most of the time,Most of the time,,,,Often,,Most of the time,Often,,Sometimes,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Never,40000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Argentina,49,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner",Work,40,10,30,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs",A bachelor's degree,Retail,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks","IBM Watson / Waton Analytics,NoSQL,Python,R",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",,Often,Often,,,Often,,Most of the time,,,,,,,,,,Often,Often,Often,,,Often,,,,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the 
organization",Often,Often,,,Most of the time,,,,Often,,,,,,,,,,,,,,10-25% of projects,,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform",,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Support Vector Machines (SVM),Python,,"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,,,A doctoral degree,Government,500 to 999 employees,,,"A friend, family member, or former colleague told me",Important,,Laptop or Workstation and local IT supported servers,Relational data,,,,"Java,Python,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,Entirely internal,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),,,,,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,Government website,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,The Analytics Dispatch Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Researcher,Other",Work,15,5,80,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A master's degree,Other,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,,,,,Often,Sometimes,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Prescriptive Modeling,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Often,,,,,,Sometimes,Most of the time,Often,Sometimes,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,20,10,0,30,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data 
science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Most of the time,,,,,Often,,,Most of the time,Often,Most of the time,,,,Most of the time,,Most of the time,Sometimes,,,,,26-50% of projects,,Business Department,Pricing data; google analytics; web analytics;,unstructured data sets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1600000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,25,30,0,35,10,0,Outlier detection (e.g. 
Fraud detection),Neural Networks - CNNs,A bachelor's degree,Technology,100 to 499 employees,Increased significantly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Traditional Workstation,Relational data,Sometimes,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Often,,,Often,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,5600000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by college or university,Julia,Link Analysis,Scala,GitHub,"Company internal community,Conferences,Kaggle",,,,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Researcher,Self-taught,10,15,10,10,10,45,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",I prefer not to answer,Government,"5,000 to 9,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Sometimes,1GB,"Evolutionary Approaches,Gradient Boosted Machines,SVMs","Java,Python,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Evolutionary Approaches",Most of the time,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,10,10,10,10,10,50,Enough to refine and innovate on the algorithm,"Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,Often,,Most of the time,,,,Often,,,,26-50% of projects,More internal than external,Central Insights Team,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Spain,35,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,0,0,70,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 
2 years,"Data Miner,Programmer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,R,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,20,0,50,30,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Other,,10GB,"GANs,Neural Networks","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Data Visualization,GANs,Neural Networks,Simulation",,,,,,,Often,,,,Rarely,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,20,50,0,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Often,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Most of the time,60000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,51,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Weka,Deep learning,C/C++/C#,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Researcher",University courses,0,0,40,60,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - GANs",Primary/elementary school,Academic,100 to 499 employees,Increased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,SVMs","C/C++,Java,Perl,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,SVMs,Text Analytics",,Sometimes,Sometimes,,,Often,,Often,,,,,,Sometimes,,Sometimes,,,Often,Most of the time,Sometimes,,,Often,,Often,,Sometimes,Most of the time,,,,,40,40,0,10,10,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,Often,,,Sometimes,,,,,Often,,Less than 10% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,1500000,TWD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website",Online courses,,,,,,,,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Workstation + Cloud service,2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",Kaggle competitions,20,20,20,10,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Sometimes,10GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Neural Networks",,,,Most of the time,,Often,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,Sometimes,Often,,,,Often,,Often,Sometimes,,,,Sometimes,,,,,,,51-75% of projects,More internal than external,Standalone Team,imagenet,annotating the data takes lot of time.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,1450000,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Personal Projects",,,,,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,,,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,20,5,5,40,0,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,1GB,"Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Most of the time,,,,,,,Often,,,,"Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Random Forests,Recommender Systems,Time Series Analysis",,,,,,Often,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,Sometimes,,,,,,Rarely,,,,25,15,20,15,25,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,,,,Sometimes,,,Often,,Sometimes,,,,Most of the time,,,,Most of the time,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Subversion",Sometimes,80000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,Very useful,,,,,,Somewhat useful,Very useful,,,,KDnuggets Blog,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,United States,61,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,6 to 10 years,"Business Analyst,Other",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,GitHub,"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,"Emergent/Future Newsletter (Algorithmia),Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,70,20,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,10MB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,RNNs,SVMs",,,Rarely,Most of the time,,,,,,,,Often,,,,Often,,Sometimes,,Most of the time,,,Sometimes,,Most of the time,,,Most of the time,,,,,,20,30,30,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,Often,,,,,,Often,Rarely,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,MINIST data sets,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,75000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,31,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",40+,PhD,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,5,0,80,15,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Denmark,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,"Data Stories Podcast,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Predictive Modeler,University courses,30,10,30,30,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,Python,R,SAS Base,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,Often,Sometimes,,,,60,15,5,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with 
IT",Sometimes,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Other",,312000,DKK,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Cloudera,Other,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Statistician",University courses,10,20,40,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10GB,"Decision Trees,Regression/Logistic Regression,SVMs","Java,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",Sometimes,,,,,Most of the time,,Often,,,,,,,Most of the time,Most of the time,,,,,Often,,Rarely,,,,,Often,,,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT",,Sometimes,,,,,,Often,Often,,,,,,Sometimes,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Rarely,220000,PEN,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Researcher",Self-taught,60,20,10,0,0,10,Time Series,"Bayesian Techniques,Logistic Regression",A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,Google Search,"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Programmer,Other",Work,40,0,60,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Stayed the 
same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,Other,"SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",,,Most of the time,,,,Sometimes,Often,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,,,75,5,5,10,5,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,80000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,,Somewhat useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),,Master's degree,Yes,Master's degree,Electrical Engineering,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Not important,Somewhat important,Somewhat important,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Random Forests,C/C++/C#,GitHub,Podcasts,,,,,,,,,,,,,Somewhat useful,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Miner,DBA/Database Engineer,Statistician",Self-taught,100,0,0,0,0,0,Natural Language Processing,"Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,10 to 19 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1GB,"Markov Logic Networks,Neural Networks","Amazon Machine Learning,Microsoft SQL Server Data Mining",Often,,,,,,,,,,,,,,,,,,,,,,,,Most 
of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,SVMs",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,80,0,0,20,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,Graph (e.g. GraphBase/Neo4j),Email,,Mercurial,,51000,EUR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,GitHub,"College/University,Company internal community,Kaggle,Non-Kaggle online communities,Online courses",,,Very useful,Very useful,,,Very useful,,Very useful,,Very useful,,,,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Machine Translation,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Other,52,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,25,50,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Decreased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,PhD,No,Professional degree,,Less than a year,I haven't started working yet,Other,1,9,0,0,0,90,Time Series,"Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Male,Poland,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,30,40,20,10,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important +Female,India,20,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Kaggle competitions,5,15,0,0,80,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important +Male,Romania,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Anomaly Detection,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Other",Workstation + Cloud service,0 - 1 hour,Online Courses and Certifications,No,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,10,30,0,0,Recommendation Engines,Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Male,South Korea,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Deep learning,R,GitHub,"Blogs,College/University,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Business Analyst,Other",Self-taught,40,30,5,20,5,0,,"Decision Trees - Random Forests,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Work,20,50,25,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Julia,Jupyter notebooks,Python,R",,,,Most of the time,,,,,Often,,,,,,,Often,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks",,,Often,,,,,Often,,,,Often,,,,Often,,,,Often,,,,,,,,,,,,,,30,5,5,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,Most of the time,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Other,Hadoop disks ,Git,Sometimes,78000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed part-time,,,No,Yes,,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,Engineering (non-computer focused),3 to 5 years,,University courses,10,30,0,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Financial,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,Spark / 
MLlib",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,Often,Often,,Most of the time,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,,,,Often,,,,,,,10-25% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"College/University,Kaggle,Official documentation,Online courses",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,30,20,10,30,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,100GB,"Decision Trees,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,90,10,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,14400,BRL,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Academic,I don't know,,,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,10MB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,20,0,20,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,,,,Less than 10% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Self-employed",Oracle Data Mining/ Oracle R Enterprise,Proprietary Algorithms,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Software Developer/Software Engineer",Work,0,0,60,0,40,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Insurance,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,SVMs","C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,Minitab,NoSQL,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Tableau,TIBCO Spotfire",,,,Sometimes,Often,,Sometimes,,Often,Often,,,,,Often,Often,Often,,,,Sometimes,,,,Often,Sometimes,Often,,,,Often,Often,Sometimes,,,,,Sometimes,Sometimes,Sometimes,Often,Often,,,Sometimes,,Sometimes,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,Often,Often,Often,Often,Often,Often,,,Often,,Often,,Often,,Often,Often,Often,Often,Often,Often,Sometimes,,,,Often,Often,Often,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,Often,Often,,,,Sometimes,,Often,Sometimes,Sometimes,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Mercurial",Sometimes,,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Brazil,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,,,,,"Data Elixir Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,Other,Self-taught,55,0,40,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Government,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,SVMs","Amazon Web services,Julia,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,Often,,,,,,Often,,Often,,Often,Most of the time,Sometimes,Often,,Often,Often,,,,Often,Most of the time,Most of the time,,,,20,40,0,15,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's 
decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Often,Often,,Rarely,Sometimes,,,Often,Most of the time,,Sometimes,,,,Sometimes,,Often,,,,Often,,76-99% of projects,More internal than external,Business Department,"NLTK corpus, others by need",,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,135000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Iran,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,R,Text Mining,Matlab,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Programmer",University courses,45,40,10,5,0,0,Natural Language Processing,Support Vector Machines (SVMs),A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,University courses,5,5,30,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SAS Base,SQL",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",Often,,,,Sometimes,Often,Most of the time,Sometimes,,,,,,,Often,Often,,,,,Sometimes,,,Often,,,,,Sometimes,,,,,50,10,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question 
to be answering or a clear direction to go in with the available data",Sometimes,Often,,Sometimes,,Rarely,,,Often,,Often,Often,,Sometimes,Sometimes,,,Sometimes,,Often,,,26-50% of projects,More internal than external,Business Department,,"A lot of data I work is confidential - such as pay data, HR data, and there is a central team that manages/owns it.Getting data is by far the biggest problem - can take upto 2-3 weeks to start a project because you're waiting for data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,92000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Python,,Python,Google Search,"Blogs,College/University,Company internal community,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,"Data Analyst,Predictive Modeler",Self-taught,50,25,0,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Military/Security,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SAS 
JMP,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,Often,,,Often,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,Sometimes,,Often,Often,,Often,,,,,Sometimes,Often,Often,,,,Sometimes,,,Sometimes,,,,Sometimes,,,Sometimes,,,,50,30,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,Sometimes,,,76-99% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Rarely,69000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,3 to 5 years,Researcher,Self-taught,70,30,0,0,0,0,"Survival Analysis,Time Series",Logistic Regression,A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Never,1GB,Regression/Logistic Regression,"Java,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,SVMs,Time Series Analysis",,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,Most of the time,,,,50,30,0,20,0,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Often,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Subversion,,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Brazil,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,Microsoft R Server (Formerly Revolution Analytics),Regression,R,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,20,60,10,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Miner,Engineer,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,20 to 99 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1GB,RNNs,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,TensorFlow,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,,,,Most of the time,"Data Visualization,kNN and Other Clustering,Neural Networks,RNNs,Time Series Analysis",,,,,,,Most of the time,,,,,,,Often,,,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,10,30,20,10,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,Often,,,Sometimes,Sometimes,,,Most of the time,,,,,Often,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,Standardization,"Flat files not in 
a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,France,25,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Very useful,,,,,,,,,,,,Data Stories Podcast,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Programmer",University courses,10,0,10,60,20,0,"Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,Not Useful,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",10,70,5,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,,"Java,Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,,,,,,,,Lift Analysis,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,10,80,10,0,NA,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,Sometimes,Sometimes,,Most of the time,,,Most of the time,,,Sometimes,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,None,Entirely internal,IT Department,,,"Column-oriented relational 
(e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Rarely,65000,PKR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Malaysia,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer",Kaggle competitions,0,0,10,0,90,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",,10GB,"CNNs,Ensemble Methods,Gradient Boosted Machines","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Gradient Boosted Machines",,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,10,90,0,0,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Always,,,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Researcher",Work,30,10,30,30,0,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Flume,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Often,,,,,Sometimes,,Often,,,,,,Often,,,,,,Sometimes,,Sometimes,,,,Often,,,Most of the time,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Often,Often,,Most of the time,,,,"A/B Testing,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,Text 
Analytics",Often,,,,,,Often,,,,,,Often,Often,,Often,,Often,,,Often,,,Often,,,,,Often,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,26-50% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Researcher",University courses,75,5,15,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,50,0,0,15,0,"Natural Language Processing,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Technology,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,Bayesian Techniques,"Amazon Web services,NoSQL,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Natural Language Processing,Neural Networks",,,Sometimes,,,,Sometimes,Often,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,15,20,40,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization",,,,Sometimes,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Subversion",Rarely,65000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Italy,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,25,10,10,50,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Evolutionary Approaches",A master's degree,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data",Never,1GB,Neural Networks,"C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Cross-Validation,Ensemble Methods,Evolutionary Approaches,Neural Networks",,,,,,Most of the time,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,20,40,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,,,,,,Often,,,Sometimes,,,,,,Most of the time,,Often,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Commercial Data Platform,,"Bitbucket,Git",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,I don't write code to analyze data,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,,,A doctoral degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,Russia,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,,Self-employed,Amazon Machine Learning,Deep learning,C/C++/C#,Google Search,"Arxiv,Blogs,College/University,Kaggle,Personal 
Projects,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,50,30,0,20,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important +Male,Italy,43,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",35,15,10,30,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Financial,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,QlikView,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,Rarely,,,,,Sometimes,,,,Often,Often,Often,,,,,,,,Rarely,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender 
Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,20,25,10,10,35,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,98000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Turkey,33,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,GitHub,"Kaggle,Online courses,Podcasts,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,"Data Analyst,Predictive Modeler,Statistician,Other",Self-taught,30,10,60,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,Regression/Logistic Regression,"Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction",Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,Often,Most of the time,,,,,Rarely,,,,,,,,,,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Privacy issues",,Sometimes,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Sometimes,10000,GBP,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,Pakistan,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,QlikView,Deep learning,Java,Google Search,"Blogs,Friends network,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,,University courses,0,10,0,70,20,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,38,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Reinforcement learning,Decision Trees - Random Forests,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Not Useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Operations Research Practitioner",University courses,30,10,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Financial,100 to 499 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Orange,Perl,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,Most of the time,,Often,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,Often,,Often,,Sometimes,,Sometimes,,Sometimes,Most of the time,,Often,,,Often,Often,Sometimes,,Most of the time,,,,50,10,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,76-99% of projects,Entirely internal,IT Department,Too many to list ,Integration ans consolidation ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Bayesian Methods,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,Not Useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,,University courses,30,3,40,25,2,0,,,A master's degree,Other,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,0,0,50,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Sometimes,,Often,Sometimes,,,,,,,,,,,Often,,,51-75% of projects,More internal than external,Other,Transcore;Chainalytics;FRED 
Data;Cass Information Systems;Key Bank,"Carving out time to clean, prep, and use external data.",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,58000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,50,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Regression,Python,GitHub,"Blogs,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts,YouTube Videos",,Not Useful,,,Somewhat useful,,Not Useful,Very useful,,,,Somewhat useful,Very useful,,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Time Series,Decision Trees - Random Forests,No education,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,10MB,Gradient Boosted Machines,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,Often,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,10,30,20,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Sometimes,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Biology,6 to 10 years,"Data Scientist,Other",University courses,25,0,50,25,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,Employed full-time,,,Yes,,Computer Scientist,Fine,,Python,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Researcher",Self-taught,50,20,10,0,20,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Sometimes,1GB,"CNNs,GANs,Neural Networks","Google Cloud Compute,Jupyter notebooks,Mathematica,TensorFlow",,,,,,,,Often,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Data Visualization,GANs,Logistic Regression",,,,Most of the time,,,Often,,,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,,,,,20,10,10,10,50,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Spark / MLlib,Neural Nets,R,Government website,"College/University,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,Very useful,,,,Very useful,"FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software 
Engineer,Statistician",Self-taught,20,5,40,30,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,Often,,Sometimes,,,,Rarely,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,Simulation,SVMs",,Often,,,,,Most of the time,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,Sometimes,Often,,,,,,50,20,5,25,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,Often,Most of the time,,,,Most of the time,,,Most of the time,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,National Student Clearing House; IPEDS; WealthEngine,Source data is entered to get the ERP system to work and the end product our the door. 
Little time is put in to making sure we re entering the data in the ERP system in a way to make it easy to get out again. Example of this is that master data is changed to get a process to run and then changed back to what is was originally. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,86000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,Neural Nets,R,"Government website,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Other,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Impala,R,Unix shell / awk",,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,"Association Rules,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,Sometimes,,,,,,Often,,,,,,,,Sometimes,,,,,Most of the time,,Rarely,,,,,Sometimes,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,Often,Most of the time,,,,,,Most of the time,,,Often,,,,Sometimes,Often,,,,51-75% of projects,Entirely internal,Other,UK gov data,data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Most of the time,790000,INR,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Belgium,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Data Scientist,Work,50,5,30,15,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods",Primary/elementary school,Technology,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Often,Often,Often,Often,Most of the time,,,,,,,,,,Often,,Sometimes,,Most of the time,Often,,,,,Often,Sometimes,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,Sometimes,Most of the time,Often,,,,,,,,,,Sometimes,,,,Often,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,48000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Iran,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer",University courses,20,20,30,25,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Don't know,10GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,Sometimes,,Most of the time,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,,,,Often,Often,Sometimes,,,,Often,,,Most of the time,Sometimes,,,,,45,20,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Often,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,24,"Not employed, but looking for 
work",,,,,,,,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Newsletters,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,Very useful,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,"Data Machina Newsletter,Jack's Import AI Newsletter,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,PhD,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,25,15,50,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,India,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Fine,Employed by non-profit or NGO,C/C++,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Siraj Raval YouTube Channel,< 1 year,,,,,,Unnecessary,,Necessary,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Researcher,Other",Work,20,20,60,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,,,Very Important,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,< 1 year,Necessary,,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Japan,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software 
Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,"Machine Translation,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data",Rarely,10MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R",Rarely,Rarely,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,GANs,RNNs",Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,50,50,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database",Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,Google Search,"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Engineer,Software 
Developer/Software Engineer",Self-taught,20,30,40,0,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Retail,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,Often,,,,Rarely,,,,,,,Rarely,,,,,,,,,Sometimes,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Often,Often,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Rarely,Often,,,Often,,Sometimes,,Sometimes,,Rarely,Often,,Sometimes,,Often,,,Sometimes,,,Sometimes,Often,,,,40,20,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult 
access to data",,Sometimes,Sometimes,Often,Most of the time,,,,,,Sometimes,,,,,,,,Often,Rarely,Often,,Less than 10% of projects,More internal than external,IT Department,,Dirty data and inconsistent data over the time as gathering process has been changing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",,140000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Stack Overflow Q&A",,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,,,A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,Less than one year,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Never,,,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Segmentation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,,50,10,30,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the 
time,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,Sometimes,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,55000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Business Analyst,Fine,Self-employed,Other,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,"FastML Blog,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter 
notebooks,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Often,Often,,,Often,,Sometimes,,Most of the time,,,Sometimes,Most of the time,Sometimes,,Often,,Often,Sometimes,,Often,,Often,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely internal,Standalone Team,Depends on task,Small volume of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,50000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,43,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Very useful,,,,,Very useful,,,,,Very useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"1,000 to 4,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,Tableau",,Sometimes,,,,,,,,,,,,,,,Often,,,,Sometimes,Sometimes,Most of the time,Often,Often,,,,,,Often,Sometimes,Often,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Recommender Systems,Text Analytics",,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,50,25,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science 
team",,,,,Often,Often,,Often,Often,,,,,,,Often,,,,,,,10-25% of projects,More internal than external,IT Department,,Gathering of data due to confidentiality. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Most of the time,24000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,60,"Not employed, but looking for work",,,,,,,,Tableau,Regression,R,GitHub,"Kaggle,Newsletters,Online courses,Textbook,YouTube Videos,Other",,,,,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,Coursera,Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),"Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Germany,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,KNIME (commercial version),Factor Analysis,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Arxiv,Very useful,,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,5,0,5,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,6-10 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service","Image data,Text data,Relational data,Other",Sometimes,1TB,Neural Networks,"Cloudera,Flume,Impala,Java,MATLAB/Octave,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,Often,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Segmentation,Time Series Analysis",,,,,Sometimes,Sometimes,Sometimes,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,35,20,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,Often,Often,,Often,Often,,,,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,Wikipedia,To get the data into our data lake.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Python,Neural Nets,R,I collect my own data (e.g. web-scraping),"Blogs,Kaggle,Stack Overflow Q&A,Other",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,"Data Analyst,Researcher",Self-taught,20,10,40,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,,"HMMs,Regression/Logistic Regression","R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics",,Most of the time,,,,Often,Most of the time,,,,,,Often,,,Most of the time,,,,,Most of the time,,,,,,,,Often,,,,,40,10,10,20,20,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Often,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,,,,,,60000,CNY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's 
degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Flume,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,Self-taught,70,25,0,0,5,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important +Male,United States,19,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle",,Very useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Engineer,Self-taught,70,20,0,0,0,10,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Random Forests","NoSQL,Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,90,5,0,2,3,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,Git,Rarely,60000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Text Mining,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",80,10,0,5,5,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,44,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,R,Time Series Analysis,R,Google Search,Blogs,,Very useful,,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher",Self-taught,90,10,0,0,0,0,"Time Series,Unsupervised Learning","Evolutionary Approaches,Support Vector Machines 
(SVMs)",High school,Academic,I don't know,Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Traditional Workstation,Workstation + Cloud service",Other,Always,1GB,"Evolutionary Approaches,GANs,Regression/Logistic Regression,SVMs","C/C++,Java,Mathematica,R",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Sometimes,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,Sometimes,,,,,Most of the time,,,,,,,,,Most of the time,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,in-house dataset,no advanced knowledge to find suitable tools.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,50000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Ukraine,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO",I don't plan on learning a new tool/technology,Neural Nets,Python,Google Search,"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Machine Learning Engineer,Researcher,Statistician",University courses,25,0,50,25,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Flume,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Perl,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,Often,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Cross-Validation,Data Visualization,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Simulation",,,,,,Most of the time,Most of the time,,Most of the time,,,,Sometimes,Most of the time,,Most of the time,,Most of the time,Often,,,,Most of the time,,,,Most of the time,,,,,,,70,10,10,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to 
data",Sometimes,Often,,,Most of the time,,,Often,Often,Often,,,,,,,,,,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,Common Crawl; Censys.io,"Understanding how the data was collected, what it is and isn't, logistics in moving it around and manipulating it.","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,135000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,University/Non-profit research group websites,"College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",University courses,0,10,10,60,0,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Female,United States,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,SVMs","IBM SPSS Statistics,Python,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Often,Often,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,30,30,10,20,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs 
advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,30,0,0,60,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,,Most of the time,Most of the time,,,,,Sometimes,,,,Often,,,,,Often,,,,,,,,Often,Sometimes,,,,40,20,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Often,,,Most of the time,,,,,,,,,,,,,,100% of projects,Approximately half 
internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Canada,32,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,30,0,70,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Weka,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Argentina,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Computer Scientist,Data Scientist,Programmer,Researcher,Statistician",Self-taught,50,20,20,10,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks",A doctoral degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data,Relational data",,1GB,"Markov Logic Networks,Neural Networks","C/C++,Java,Mathematica,MATLAB/Octave,NoSQL,Python,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Evolutionary Approaches,Markov Logic Networks,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, 
etc.)",0,70,10,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,R,Google Search,"College/University,Personal Projects",,,Not Useful,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,< 1 year,Necessary,,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer",Self-taught,50,50,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,35,35,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction",,Often,,,,Sometimes,Most of the time,Most of the time,,,,,,Often,,Often,,,,,Rarely,,,,,,,,,,,,,70,10,5,15,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,Most of the time,,,,,,,,,Sometimes,,Sometimes,,Often,,Sometimes,,,100% of projects,More internal than external,Standalone Team,Sales 
data,Inconsistent data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,85000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Republic of China,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Computer Scientist,,Employed by college or university,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Conferences,YouTube Videos",Very useful,,,,Very useful,,,,,,,,,,,,,Very useful,"Data Stories Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,Researcher,Work,30,10,40,10,5,5,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",,Academic,20 to 99 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,SVMs","MATLAB/Octave,Minitab,Python,R,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,Sometimes,,Often,,,,,,,,,,,Often,,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,PCA and Dimensionality Reduction,SVMs",,,Sometimes,,,Often,,,Sometimes,,,Often,,,,,,Sometimes,,,Often,,,,,,,Most of the time,,,,,,20,40,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,Often,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,50000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,0,10,25,60,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data,Other",Most of the time,10TB,"CNNs,Evolutionary Approaches,Random Forests,SVMs","Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,CNNs,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,SVMs",,Sometimes,,Rarely,,Most of the time,,,,Often,,,,Sometimes,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,,,20,40,5,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues",Most of the 
time,,,,Sometimes,Sometimes,,,Often,,Often,,,,Most of the time,,Rarely,,,,,,26-50% of projects,Do not know,IT Department,NCBI; Uniprot; Kegg,Request time over internet.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,"48,000",BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,30,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Traditional Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,Other,3 to 5 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",33,33,0,0,34,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,32,"Independent contractor, freelancer, or self-employed",,,No,Yes,Computer Scientist,Fine,Self-employed,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Other,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,,,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Canada,28,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,R,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Other,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Natural Language Processing,,"Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,90,0,0,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,RapidMiner (free version),SQL,Tableau",,,,,,,Rarely,,Rarely,,Rarely,,,,,,,,,,,,Often,Often,Rarely,,,,,,,,Most of the time,,Rarely,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,37,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,Julia,Text Mining,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,Natural Language Processing,Neural Networks - RNNs,,Internet-based,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,10,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,Often,,,,,,Often,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,,1 to 2 years,Other,University courses,0,5,0,90,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important +Male,India,25,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Random Forests,R,I collect my own data (e.g. 
web-scraping),"Blogs,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,,,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"edX,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series",Logistic Regression,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,Often,,Sometimes,,Sometimes,,Rarely,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Often,,Sometimes,Often,,,,Sometimes,Often,,,,Sometimes,Often,Sometimes,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,Often,,,Often,,,,Most of the time,,,,,Often,,Often,,,,,,,51-75% of projects,More internal than external,IT 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,105000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,South Africa,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,DBA/Database Engineer,Operations Research Practitioner",Self-taught,40,30,20,10,0,0,,,A master's degree,Technology,100 to 499 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,R,SQL,Tableau",,Sometimes,,,,,,Often,,Most of the time,Often,Often,Often,,,,,,,,,,Often,,Most of the time,,,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Text Analytics",Often,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,30,30,10,20,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Often,Often,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Czech Republic,36,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer,Other",Self-taught,80,0,15,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,"HMMs,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Natural Language Processing,Text Analytics",Sometimes,,,,,Often,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT",,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,geoip database;uiradr,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Bitbucket,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Military/Security,"1,000 to 4,999 employees",Stayed the same,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Researcher,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,30,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Perl,Python,R,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,FastML Blog,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,0,30,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",40,15,5,10,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Image data,Rarely,10GB,"CNNs,Ensemble Methods,Gradient Boosted Machines","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Segmentation",,,,Most of the time,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,Often,,,,,,Sometimes,,,,,,,,0,90,5,0,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,,Often,Sometimes,,100% of projects,Do not know,Standalone Team,,The images are very large and take a lot of memory.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,"17,500",EUR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Ireland,35,Employed full-time,,,Yes,,Data Analyst,,,Spark / MLlib,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,Very useful,,Very useful,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",60,30,0,10,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,100MB,"Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Rarely,,,,Sometimes,Often,Most of the time,Often,,,,,,,,Often,,,Often,,,,Often,Often,,Often,,,Often,Often,,,,10,10,5,70,5,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Mercurial,Sometimes,,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Germany,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,25,25,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",A doctoral degree,Retail,100 to 499 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,1TB,"Bayesian Techniques,Neural Networks","Jupyter notebooks,Python,QlikView,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Neural Networks,Time Series Analysis",Often,,Most of the time,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools",Sometimes,,,,Sometimes,Often,,,,Often,,,Often,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Sometimes,90000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,37,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Social Network Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,YouTube Videos",,,Not Useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,"FastML Blog,KDnuggets Blog,The Data Skeptic Podcast",3-5 years,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Yes,I prefer not to answer,Computer Science,,"Business Analyst,Data Analyst,Researcher,Software Developer/Software Engineer,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Very useful,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Scientist,Work,10,30,30,0,30,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Internet-based,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,,,,,Sometimes,Often,Sometimes,,,Often,,Often,,,,Often,Often,,Often,,Sometimes,,,,,,,,,,,40,20,30,0,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)","Email,Share Drive/SharePoint",,Git,Rarely,,INR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,65,0,35,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Sometimes,1GB,Regression/Logistic Regression,"Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,SVMs,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,Often,,,,,,Most of the time,,Most of 
the time,,,,,,,,,,,,Often,,Often,,,,55,20,10,5,10,0,Enough to run the code / standard library,"Limitations of tools,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Other",,,,,,,,,,,,,Most of the time,,,Often,,,Often,,,Sometimes,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Other,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Decreased slightly,Don't know,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,1GB,Other,"Microsoft Excel Data Mining,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,Sometimes,,,Most of the time,,Often,,,,,,,,,Most of the time,Most of the time,,51-75% of projects,Entirely internal,Other,None,Reach data on silos,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"35,000",,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,R,GitHub,"Conferences,Podcasts,Stack Overflow Q&A,Textbook,Trade book",,,,,Very useful,,,,,,,,Very useful,Very useful,Very useful,Very useful,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,Researcher",Self-taught,25,0,15,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,Bayesian Techniques,"Microsoft Excel Data Mining,Perl,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,SVMs",,,Most of the time,,,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,,,Most of the time,Most of the time,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,100% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,,Sometimes,105000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,35,60,0,5,0,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,Telecommunications,"10,000 or more employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,39,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,,No Free Hunch Blog,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Supervised Machine Learning (Tabular Data),,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,44,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist",University courses,20,20,30,30,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Mix of fields,20 to 99 employees,Increased significantly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,10TB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Often,,,,Most of the time,,,,Rarely,Often,,,Often,,,,,,Sometimes,,,,,,,,Often,Sometimes,Most of the time,,,,,,,,Often,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,,,,Sometimes,Most of the time,Most of the time,Most of the time,Sometimes,Sometimes,,Sometimes,,Often,Often,Often,,,Sometimes,Sometimes,Most of the time,,Most of the time,Sometimes,,Most of the time,,,Sometimes,Sometimes,,,,25,25,10,10,30,0,Enough 
to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,Often,,,,,,Often,Often,,Often,,,Often,Often,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United Kingdom,42,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,Very useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",Self-taught,70,10,0,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,Java,Jupyter notebooks,KNIME (free version),NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,Sometimes,Rarely,Often,,Most of the time,,Rarely,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",Often,,Often,,,,,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,,Often,Often,,,,Often,Often,,,,,50,30,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making 
process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Often,,Often,,,,,Often,Often,Often,,,Often,,Often,,Often,Often,,,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Most of the time,85000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,University courses,20,30,30,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,,,"IBM Watson / Waton Analytics,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,10,0,40,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Self-employed,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Very useful,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,45,10,5,0,0,"Computer Vision,Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Other,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service",Video data,Most of the time,100TB,"CNNs,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow",,Most of the time,,Often,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Often,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality 
Reduction,RNNs,Simulation",,,,Most of the time,,Often,Often,,,Often,,,,Sometimes,,Most of the time,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,,,,5,50,35,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,Often,,,,,,,Most of the time,,,Often,,,,Most of the time,,,,,Often,,10-25% of projects,Approximately half internal and half external,Other,Human3.6m; Cmu mocap; mpii poseprior; ms coco; other pose datsets,consistency of the labeling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,stored on cloud drives,Git,Sometimes,110,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Natural Language Processing,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,GitHub,"Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Very useful,,,,,,,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to 
have,,,,,Basic laptop (Macbook),,Online Courses and Certifications,Yes,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important +Male,United Kingdom,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,"Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Not at all important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Random Forests","C/C++,Julia,Jupyter notebooks,Mathematica,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,Stan,Tableau",,,,Often,,,,,,,,,,,,Rarely,Often,,,Rarely,,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Rarely,,Rarely,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Simulation",,,Often,,,,Often,Sometimes,,,,Sometimes,,,,Often,Sometimes,Sometimes,,,Sometimes,,,,,,Most of the time,,,,,,,20,20,30,20,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the 
organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database",,,,,,,,,Sometimes,,Sometimes,Often,Often,Sometimes,,,Sometimes,Sometimes,,,,,76-99% of projects,More internal than external,Standalone Team,,,Other,"Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,9,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,Python,Google Search,"Blogs,Kaggle,Online courses,Textbook",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Data Analyst,University courses,40,20,0,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Other,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","KNIME (free version),R",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text 
Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,Often,,,,Sometimes,,Often,,Most of the time,,,Sometimes,Rarely,Sometimes,,Sometimes,,,Often,,,Sometimes,Most of the time,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,,,,Often,,,Sometimes,,100% of projects,More internal than external,Business Department,Census; gnaf; ,Getting data in correct format to analyse,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,170000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Finland,29,Employed part-time,,,Yes,,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,Logistic Regression,High school,Pharmaceutical,"1,000 to 4,999 employees",Decreased significantly,Less than one year,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,,Regression/Logistic Regression,"Minitab,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Sometimes,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Often,Often,Most of the 
time,Sometimes,,Sometimes,Often,,Often,Often,Often,,Often,,Often,Often,Most of the time,Most of the time,,,10-25% of projects,Approximately half internal and half external,Other,,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,34800,EUR,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Management information systems,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,,,A bachelor's degree,Military/Security,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,,"MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,Yes,,Researcher,Poorly,Employed by professional services/consulting firm,Other,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A master's degree,Mix of fields,500 to 999 employees,Decreased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Relational data,Rarely,<1MB,"Bayesian Techniques,Regression/Logistic Regression","R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Often,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,,,45,10,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization",Sometimes,Sometimes,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,N/A,N/A,Other,"Commercial Data Platform,Other",,Other,Most of the time,65000,GBP,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,25,40,0,8,25,2,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Random Forests,Python,,"Company internal community,Conferences,Kaggle,Newsletters,Online courses,Textbook",,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Very useful,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Traditional Workstation,0 - 1 hour,Master's degree,No,Bachelor's degree,Management information systems,I don't write code to analyze data,Business Analyst,Work,5,30,65,0,0,0,,,A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat 
important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Java,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,40,10,0,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / 
MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,Often,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Most of the time,Often,Often,,Often,Most of the time,Most of the time,Often,,,,,,Often,Sometimes,Often,,,Often,Often,Often,Often,Often,Most of the time,Often,Most of the time,Often,Often,,Most of the time,,,,50,15,5,15,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,175000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Internet-based,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,,Regression/Logistic Regression,"MATLAB/Octave,Microsoft Azure Machine Learning,Python,SQL",,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,10,10,20,30,0,Enough to run the code / standard library,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,26-50% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,75000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Czech Republic,37,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,30,60,0,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Other",Most of the time,10GB,"CNNs,HMMs,Neural Networks","C/C++,MATLAB/Octave,Python,TensorFlow,Other,Other",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Often,Often,,"CNNs,Cross-Validation,HMMs,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,Often,,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,,Sometimes,,,Sometimes,,,,,,10,80,10,0,0,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from 
external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Often,Often,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,Switchboard;TIMIT;Fisher;Wallstreet Journal,to organize and preprocess data to be ready to run an NN training,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,500000,CZK,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Portugal,43,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,Trade book,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Survival Analysis,,A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Regression/Logistic Regression,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Prescriptive Modeling,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,30,10,10,40,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Sometimes,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,50000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,62,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Kaggle,Online courses,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast",1-2 years,,,,,Necessary,Nice to have,Nice to have,,,,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Doctoral degree,Computer Science,,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,Data Machina Newsletter,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,55,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Online courses",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,Other,Self-taught,25,75,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Other,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,Other,"Microsoft Excel Data Mining,QlikView,R",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,50,0,0,0,50,0,Enough to run the code / standard library,"Data Science 
results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Rarely,,,,,,,,,,,,,,,,,Often,,,26-50% of projects,Entirely internal,Business Department,None,"Getting the data from the databases, analyzing then finding it's use.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased significantly,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",,,Other,"Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,30,10,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,Sometimes,,,51-75% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Rarely,600000,KES,Other,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,54,Employed 
full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler",Self-taught,30,0,60,0,10,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased significantly,3-5 years,Some other way,Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,1GB,"CNNs,Neural Networks,RNNs,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,"CNNs,Ensemble Methods,Logistic Regression,Neural Networks,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Often,,,,,Often,,,,,,,Sometimes,,,,Most of the time,,,,,Often,,,Sometimes,Most of the time,Often,,,,30,30,10,20,10,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,Entirely external,Central Insights Team,,data authenticity ,Document-oriented (e.g. 
MongoDB/Elasticsearch),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,2500000,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Financial,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Other,Traditional Workstation,Relational data,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Often,,,,,Rarely,,Sometimes,,,,,Sometimes,,Often,,,,,,,Often,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about 
the potential impact of data science projects",,Sometimes,,,,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,,,,10-25% of projects,More external than internal,Business Department,Bloomberg,Costs,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,17000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by company that makes advanced analytic software,Other,Genetic & Evolutionary Algorithms,Python,"Government website,I collect my own data (e.g. web-scraping),Other","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Computer Scientist,Self-taught,40,50,10,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Very important,Other,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Never,100GB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Unix shell / awk",,,,Rarely,Sometimes,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,Often,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Recommender 
Systems,Simulation",,,,,,Sometimes,Often,Often,,,,Sometimes,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Often,,,,,,,20,50,0,20,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,51-75% of projects,Entirely external,Other,kaggle and public data sets as examples to demonstrate tools,"unlike the 'real' world, much of the data worked with is already in good shape","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Other",Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Miner",University courses,0,40,25,35,0,0,"Computer Vision,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational 
data",Sometimes,100MB,"Decision Trees,Neural Networks","IBM SPSS Modeler,Minitab,Python,R,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,"A/B Testing,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",Often,,,,,,,,,,,,,,,Often,,,,Often,Most of the time,,,,,,,,,,,,,50,15,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,,,Often,Most of the time,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,75,0,0,0,25,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,"Ensemble Methods,Random Forests","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Segmentation",,,,,,Often,Most of the time,Sometimes,Often,,,,,,,,,,,,,,Often,,,Often,,,,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Scaling data science solution up to full database",Sometimes,Often,,,Most of the time,,,,,,,,,,,,,Often,,,,,76-99% of projects,More internal than external,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,104000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Russia,18,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,20+,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important +Male,United States,29,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,40,0,30,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,,,,,"Collaborative Filtering,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,Rarely,,,Most of the time,,,,,,,,Often,,,Often,,,,Often,,,,,Sometimes,Most of the time,,,,,60,20,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,"140,000",,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Canada,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,Java,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Online courses,Personal Projects",,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician",Self-taught,40,30,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",I prefer not to answer,Non-profit,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,Often,Most of the time,Often,Often,,,,,Often,,Often,,,,,Often,,Often,,,Often,Often,,Sometimes,,,,,30,30,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Often,,,,,,,100% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Git,Never,,CAD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by government,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Engineer,University courses,30,0,70,NA,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Most of the time,10GB,"CNNs,Decision Trees,Neural Networks","C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Neural Networks,Random Forests",,,,Most of the 
time,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,20,50,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Git,Subversion",Rarely,103000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,R,Neural Nets,R,Google Search,"Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Not Useful,Somewhat useful,,Somewhat useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Predictive Modeler,Statistician",Work,60,10,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,10 to 19 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,,,,,,"Decision Trees,GANs,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,,Often,,,,,,,,Most of the time,,,,,,Often,,,,Often,,,,Often,,,,65,20,10,2.5,2.5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,Sometimes,,,Most of the time,,,,Often,,,,Sometimes,,,,Less than 10% of projects,Entirely internal,Central Insights Team,Credit bureau data,Values don't change frequently. Regulatory constraints.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,2000000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Italy,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Textbook",Very useful,,Very useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",University courses,40,0,30,20,0,10,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,500 to 999 employees,Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data,Other",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Julia,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SAS Enterprise Miner,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,,Rarely,,,,,Most of the time,,,,,Rarely,Rarely,,,,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive 
Bayes,Neural Networks,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Often,Often,,,Often,,Often,Often,,,Sometimes,,Often,Often,Often,,Sometimes,,Sometimes,,,Often,,,Often,Often,Sometimes,Sometimes,Sometimes,,,,30,50,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,,"Company Developed Platform,Email",,,Sometimes,,EUR,,9,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,20,10,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Not 
very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,RNNs,Segmentation",,,,Most of the time,,Sometimes,Often,,,,,,,,,,,,Often,Most of the time,,,,,Often,Often,,,,,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,Most of the time,,26-50% of projects,Entirely internal,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Always,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,C/C++/C#,Google Search,"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"Siraj Raval YouTube Channel,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Data Analyst,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,,,,,,,,,,,,,,,, +Male,Spain,21,Employed part-time,,,No,Yes,Other,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,NA,Employed full-time,,,Yes,,Other,Perfectly,Employed by 
a company that performs advanced analytics,NoSQL,Uplift Modeling,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,,Somewhat useful,,,,Very useful,,,Very useful,,Very useful,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",University courses,70,0,30,0,0,0,"Adversarial Learning,Machine Translation,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data,Other",Always,10TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Orange,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,Often,Often,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,,0,10,0,10,30,50,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data",Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Data Scientist,Engineer",Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,51,Employed part-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",3-5 years,,Necessary,Necessary,,,Necessary,,Necessary,,,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Master's degree,Psychology,Less than a year,I haven't started working yet,University courses,0,0,0,90,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,Very Important,Very Important,Very Important,, +Male,Portugal,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Researcher,University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular 
Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,22,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Management information systems,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by government,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,0,20,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,100 to 499 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Other,Workstation + Cloud service,Text data,,,Other,"Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Tableau,Other",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,Rarely,,,,Rarely,,,"CNNs,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis,Other",,,,Rarely,,,Most of the time,,Rarely,,,,,Sometimes,,,,,Sometimes,Rarely,,,,,,,,,Often,Often,Often,,,50,0,0,30,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Other,GDELT,Defining the question.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Brazil,35,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,,,,Coursera,Other,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,33,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by non-profit or NGO,Python,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Master's degree,Biology,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,30,70,0,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Text Mining,SQL,Google Search,"Blogs,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Somewhat useful,Very useful,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,A social science,I don't write code to analyze data,"Business Analyst,Researcher,Other","Online courses 
(coursera, udemy, edx, etc.)",20,30,30,20,0,0,,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Other,28,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,62,Employed full-time,,,No,Yes,Computer Scientist,Fine,Self-employed,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,,,,,Emergent/Future Newsletter (Algorithmia),5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Computer Scientist,Self-taught,60,0,0,20,20,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A master's degree,Manufacturing,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data 
Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Random Forests,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,60,10,10,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,Sometimes,,,Sometimes,,,,,,,Often,,,51-75% of projects,More external than internal,Business Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,48,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by college or university,Hadoop/Hive/Pig,Cluster Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,6 to 10 years,"Engineer,Researcher",Self-taught,40,20,20,10,10,0,Other (please specify; separate by semi-colon),Bayesian Techniques,A bachelor's degree,Academic,"1,000 to 4,999 employees",,,A tech-specific job board,Somewhat important,Research that advances the state of the art of machine learning,Other,"Image data,Text data",Sometimes,10GB,"Bayesian Techniques,CNNs,Neural Networks,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python",,,,Often,,,,,Sometimes,,,,,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Neural Networks",,,Often,,,Most of the time,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful 
datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,Often,Most of the time,,,Most of the time,,,Most of the time,,,,,,,51-75% of projects,Approximately half internal and half external,Other,,Learning new tech,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,"4,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Switzerland,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Software Developer/Software Engineer,I haven't started working yet",University courses,20,0,0,75,5,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web 
site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,People 's Republic of China,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Personal Projects",,Somewhat useful,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100GB,Bayesian Techniques,"Hadoop/Hive/Pig,Impala,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,,,,,Often,,,,,Most of the time,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Data Visualization,kNN and Other Clustering,Naive Bayes,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,Often,,,,10,10,20,40,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,signal data;documents crawled from network;,benchmark often changes,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Subversion,Rarely,300000,CNY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,YouTube Videos",,,Very useful,,,,,,,,,Somewhat useful,,,,,,Not Useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,,,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,,Less than a year,Other,University courses,10,0,0,89,1,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important +Male,Ukraine,25,Employed full-time,,,Yes,,Programmer,Poorly,Self-employed,Julia,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,Less than a year,"Predictive Modeler,Software Developer/Software Engineer",Self-taught,30,50,0,0,20,0,Natural Language Processing,Hidden Markov Models HMMs,High school,Financial,10 to 19 employees,Decreased slightly,Less than one year,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Rarely,<1MB,CNNs,"Amazon Machine Learning,Google Cloud Compute,Jupyter notebooks,NoSQL,R,TensorFlow,Unix shell / awk",Sometimes,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,GANs",,,Sometimes,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,60,5,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,Sometimes,,Often,,,,,Often,,,,Sometimes,,,Most of the time,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Rarely,48000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Canada,60,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Friends network,Official documentation,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,20,50,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,500 to 999 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data,Other",Rarely,1MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,IBM Cognos,Java,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (commercial version),SQL,Tableau,TensorFlow,Other,Other,Other",,Sometimes,,Often,,,,,,Rarely,,,,,Often,,,,,,Often,Rarely,,,,,Sometimes,,,,Most of the time,,Sometimes,Rarely,,,,,,,,Often,,,Rarely,Most of the time,,,Most of the time,Sometimes,Often,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random 
Forests,RNNs,Segmentation,SVMs,Time Series Analysis,Other,Other",,,,Often,,Most of the time,Often,Often,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Most of the time,,,Often,,Often,Sometimes,,Sometimes,,Often,Most of the time,Often,,30,20,0,20,10,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,Sometimes,,Sometimes,Most of the time,Often,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Other",Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,United States,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Operations Research Practitioner,Researcher",Work,25,20,30,20,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Conferences,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Very useful,,,,,,,,,Very useful,,,,Very useful,"Data Stories Podcast,KDnuggets Blog",3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Traditional Workstation",40+,PhD,No,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,South Africa,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,R,Neural Nets,SQL,"GitHub,Google Search,Government website","Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist",Self-taught,50,25,25,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,RapidMiner (free version),SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,Often,,,,,,,,,Sometimes,,Rarely,,,,Rarely,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,Often,,,,Most of the time,,,,Sometimes,,,,30,35,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,880000,ZAR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United Kingdom,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,,Kaggle competitions,33,0,33,0,33,1,,,A professional degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,1-2 years,Nice to have,,,Nice to have,Necessary,Necessary,Necessary,,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),,Online Courses and Certifications,No,Master's degree,Electrical Engineering,1 to 2 years,Data Scientist,University courses,12,8,0,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,23,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,10,10,5,70,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",High school,Technology,Fewer than 10 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Scientist,Predictive Modeler",Work,50,0,50,0,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,20 to 99 employees,Decreased significantly,,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,,,Often,Sometimes,Often,,,Often,,Sometimes,,Often,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,25,15,0,0,60,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues",Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Other,Sometimes,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Other,Python,Google Search,"College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",,,Somewhat useful,,,Not Useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Not Useful,Very useful,,,,Somewhat useful,,3-5 years,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,PhD,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,United States,47,Employed 
full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Blogs,Podcasts,Tutoring/mentoring",,Somewhat useful,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,0,0,100,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Julia,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,,Often,,Often,,Often,,,,Sometimes,,Often,,,,,10,30,60,0,0,0,Enough to run the code / standard library,Difficulties in deployment/scoring,,,,Sometimes,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,Census;American community survey,Feature engineering,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other,USB stick,Git,Rarely,"112,000",,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,Jupyter notebooks,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Stack Overflow Q&A",,,,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Female,India,33,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Deep learning,Matlab,Google Search,"Blogs,College/University,Friends network,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,10,50,5,30,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary 
Approaches,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Sometimes,1MB,Evolutionary Approaches,"Java,MATLAB/Octave",,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Simulation",,,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,30,20,10,20,20,0,Enough to tune the parameters properly,Explaining data science to others,,,,,,Sometimes,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Italy,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",Work,0,20,80,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Julia,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Online courses",Very useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Non-profit,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks","C/C++,Python,R,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Text Analytics",,,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,,Often,Most of the time,,,,,,,,,,Most of the time,,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly",Inability to integrate findings into organization's decision-making process,,,,,,,,Sometimes,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,Making sure the classes are optimal,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,0,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,21,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",16,22,44,8,2,8,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Rarely,100MB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Perl,Python,Spark / MLlib,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,Most of the time,Often,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,"Bayesian Techniques,Natural Language Processing,Neural Networks,Random 
Forests,RNNs,SVMs",,,Sometimes,,,,,,,,,,,,,,,,Most of the time,Often,,,Often,,Often,,,Rarely,,,,,,42,25,15,7,11,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,Rarely,,,,,,Most of the time,,None,Approximately half internal and half external,Standalone Team,OPUS bilingual corpora,Find bilingual data; extract text from pdfs ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,50000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,80,0,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Biology,3 to 5 years,"Data Scientist,Engineer",Self-taught,50,10,20,20,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,Tableau",,Often,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,,Often,,,,,,,Often,,Often,,,,,,,,Often,,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,,,,Often,Often,Often,Often,,,Often,,Often,Often,,,,,,Often,,Often,,,Often,Often,Often,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed 
full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,Very useful,,,,,,,,,Very useful,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,,Self-taught,70,10,10,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Statistics,R,SQL,Stan",,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,Often,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Simulation,SVMs,Text Analytics",Most of the time,,Most of the time,,,,Most of the time,Sometimes,Sometimes,,,,,,,Often,,Sometimes,Often,,,,Sometimes,,,,Often,Sometimes,Often,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,Scaling data science solution up to full database",,,,,,,,,,,,,,,,,Sometimes,Often,,,,,100% of projects,Entirely 
external,Other,data.gov; twitter; NOAA weather data;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,35000,USD,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Poland,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Talking Machines Podcast,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Support Vector Machines (SVMs),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,France,41,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,Spark / MLlib,Social Network Analysis,R,Government website,"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 
years,"Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,,,A doctoral degree,Government,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Important,,Traditional Workstation,Relational data,,,,"Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"PCA and Dimensionality Reduction,Text Analytics",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,50,0,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,,,,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,,Never,25000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,68,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Other,,,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Data Analyst,Other",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Technology,"5,000 to 9,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1PB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,6 to 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10TB,,"QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,10,40,40,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Python,Google Search,"Blogs,Company internal community,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",2 - 10 hours,Other,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient 
Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,16-20,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important +Male,Poland,26,Employed full-time,,,Yes,,Programmer,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,IBM Watson / Waton Analytics,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Miner",University courses,40,30,10,20,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,20 to 99 employees,Decreased slightly,Don't know,Some other way,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Sometimes,10GB,"HMMs,Neural Networks,RNNs,SVMs","MATLAB/Octave,Perl,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,,Most of the time,Sometimes,,,,,,Often,Sometimes,,,,,Sometimes,Most of the time,,,Often,,Often,,,Often,,Often,,,,65,10,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Often,,,,,,,,,,,,,,,Often,,,Often,,26-50% of projects,More internal than external,Standalone Team,LDC Speech-Related Corpus; Physionet Datasets,To guarantee the correctness in data annotations,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,360000,ARS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,Online courses,Podcasts,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Never,10MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,TensorFlow",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Decision Trees,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,RNNs,SVMs",,,,Often,,,,Sometimes,,,,,Sometimes,,,Most of the time,,Sometimes,Sometimes,,,,,,Sometimes,,,Sometimes,,,,,,50,50,0,0,0,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,,Often,,,,,,Often,,,,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in 
a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,75000,AUD,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,DataRobot,Neural Nets,Java,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Not Useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches",,Technology,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Rarely,1EB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,C/C++,IBM Cognos,IBM Watson / Waton Analytics,Impala,NoSQL,Perl,Python,QlikView,SAS Base,SAS JMP,Unix shell / awk",Rarely,,,Sometimes,,,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,Often,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Gradient Boosted Machines,HMMs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,Sometimes,,,,,,,,Often,Most of the time,,,,,,Most of the time,Often,,,Most of the time,,,,Most of the time,,Often,Most of the time,,,,60,20,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,Often,,,,,,Sometimes,Sometimes,Often,,,,Sometimes,,Most of the time,,,,,Often,,51-75% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Mercurial,Subversion",Sometimes,75000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Poland,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,Somewhat useful,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,60,20,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,Often,,,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text 
Analytics",,,,,,Rarely,Most of the time,Often,,,,,,Sometimes,Often,Often,,,,,,Sometimes,Often,,,Sometimes,Often,,Sometimes,,,,,40,10,0,10,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,Most of the time,,,,Often,,,,Sometimes,Most of the time,,,Sometimes,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,200000,PLN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Statistician,Poorly,Employed by college or university,SQL,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,,,Very useful,,,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,1 to 2 years,Researcher,Other,80,15,5,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Very important,Other,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Random Forests",,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,Rarely,,,,,,,,,,,5,20,10,20,10,35,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Sometimes,,,,,,,,,Often,,Often,,,,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,none,Using the data to distill meaningful business insights.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Most of the time,70000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Spain,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer",Work,25,25,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Other,70,30,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,University courses,30,10,30,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",High school,Academic,I don't know,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,100TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Python,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests",,,Often,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,30,0,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of funds to buy useful datasets from external sources",,,,,,Often,,,,Most of the time,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Git,Sometimes,70000,CHF,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Retail,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Often,,,,Often,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,,,Often,Often,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by a company that performs advanced analytics,,,,,"Arxiv,Conferences,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Very useful,,,,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,"Data Analyst,Data Scientist,Researcher,Statistician,Other",University courses,30,10,0,60,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,Rarely,,,,,Often,,,,Often,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics",Most of the time,,Rarely,Rarely,,Sometimes,Sometimes,Sometimes,Often,,,Often,,Sometimes,,Sometimes,,Rarely,,Rarely,,,Often,Most of the 
time,,,Rarely,Rarely,Often,,,,,60,20,10,5,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Need to coordinate with IT,Scaling data science solution up to full database",,Often,,,,,,,,,,,,,Sometimes,,,Often,,,,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Spain,47,Employed full-time,,,Yes,,Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,DBA/Database Engineer,Self-taught,20,10,40,20,0,10,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data,Other",Sometimes,10PB,"Evolutionary Approaches,Markov Logic Networks,Other","C/C++,Java,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,SQL,Tableau,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Association Rules,Natural Language Processing,Neural Networks,Text Analytics",Most of the time,Most of the time,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,10,10,50,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Anomaly Detection,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Company internal community,Conferences,Non-Kaggle online communities,Official documentation,Other",Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,The Data Skeptic Podcast,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,,Traditional Workstation,0 - 1 hour,Master's degree,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,10,0,80,0,0,Recommendation Engines,Bayesian Techniques,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,India,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Operations Research Practitioner,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",0,25,35,0,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Manufacturing,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Sometimes,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,Most of the time,Often,Sometimes,Most of the time,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Often,,,,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,10-25% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Git,Other",Most of the time,,INR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,IBM SPSS Statistics,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (free version),SAS Base,SQL,Tableau",Often,Most of the time,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,Sometimes,,Often,,,,Often,,Most of the time,,Sometimes,,,Often,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,,,,,,,Most of the time,,Most of the time,Rarely,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Most of the time,,,Most of the time,,,,Often,,,,Often,,,,,,Sometimes,,,,100% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Other,Bayesian Methods,Python,,"Arxiv,Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,0,40,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Never,1GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Neural 
Networks,RNNs,Other","C/C++,Julia,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,RNNs,Segmentation,Time Series Analysis",,,,Most of the time,,Often,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,Often,,,,20,40,0,30,10,0,Enough to refine and innovate on the algorithm,"Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,Often,Rarely,,100% of projects,Do not know,Other,,,,,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Female,Taiwan,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst",Other,30,5,20,0,0,45,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,QlikView,R,Tableau",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Most of the time,,,,,,,,Sometimes,Rarely,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,A/B Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,15,10,20,25,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the 
time,,Often,,,Sometimes,Often,Often,,Sometimes,,Often,Most of the time,Often,Most of the time,,,Sometimes,Sometimes,,100% of projects,More internal than external,Central Insights Team,Government Open Data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,"1,000,000",TWD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Other",Self-taught,70,5,15,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Regression/Logistic Regression","IBM SPSS Modeler,Java,Python,R",,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Naive Bayes,Natural Language Processing,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,Sometimes,,,Most of the time,,,,,10,10,30,10,40,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Scaling data science solution up to full database,The lack of a clear question to 
be answering or a clear direction to go in with the available data",,,,,Rarely,Sometimes,,,,,,,,,,,,Most of the time,,Sometimes,,,Less than 10% of projects,More external than internal,Business Department,NA,Truncated news articles; News articles in other languages,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,115000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Philippines,28,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,Very useful,Somewhat useful,,Very useful,,Very useful,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Programmer,Researcher,Statistician",University courses,30,5,30,30,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk,Other",,,,,,,,,,,Rarely,Rarely,Rarely,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,,Sometimes,,,Most of the time,,,Rarely,,Most of the time,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,Sometimes,,Sometimes,Often,,,Most of the time,,,,Most of the time,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,government official statistics,availability of updated data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1500000,PHP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,NoSQL,I don't plan on learning a new ML/DS method,C/C++/C#,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Company internal community,Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring",,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,"Data Machina Newsletter,DataTau News Aggregator,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,1,1,1,1,1,95,Speech Recognition,Neural Networks - RNNs,No education,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,47,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,R,Neural Nets,Python,"Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Other,Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support 
Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Often,,,,,Sometimes,,Often,,,,,,,,,,,20,70,0,0,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Often,,Often,,,,,,,,Often,,,,100% of projects,Approximately half internal and half external,Central Insights Team,images,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Kaggle,Online courses",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,60,10,0,30,0,0,"Recommendation Engines,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Markov Logic Networks",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,HMMs,Regression/Logistic Regression","C/C++,Google Cloud Compute,Java,Jupyter notebooks,Python,R,SQL",,,,Sometimes,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes",,Often,Most of the time,,Most of the time,Most of the time,Most of the time,Often,,,,,Often,Most of the time,,,Most of the time,Often,,,,,,,,,,,,,,,,30,30,5,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Sometimes,,,,,Often,Most of the time,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,360000,INR,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Kenya,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,0,40,10,0,,Bayesian Techniques,A master's degree,Financial,10 to 19 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,,,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Association Rules,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,Often,,,,Sometimes,,,Most of the time,,,,Often,,,Most of the time,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,20000,KES,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,Google Search,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,Italy,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,25,40,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Neural Networks - CNNs,A bachelor's degree,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",,10GB,Neural Networks,"C/C++,Java,Jupyter notebooks,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,Often,,,,,,"CNNs,Logistic Regression,Neural Networks",,,,Often,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,30,40,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Often,,,,Often,,,10-25% of projects,Entirely internal,Standalone Team,NOAA climate forecast data,Increase accuracy of noaa public data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,EUR,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,Iran,29,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Bayesian Methods,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,University courses,10,0,0,90,0,0,"Computer Vision,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Colombia,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by 
a company that doesn't perform advanced analytics,Python,Deep learning,R,I collect my own data (e.g. web-scraping),Blogs,,Very useful,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Business Analyst,University courses,30,0,40,30,0,0,"Recommendation Engines,Unsupervised Learning",Support Vector Machines (SVMs),A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,10MB,Regression/Logistic Regression,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Collaborative Filtering,Data Visualization,Segmentation",,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,60,10,0,20,10,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,Often,,,,,,Often,Often,,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Never,75000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Company internal community,Kaggle,Online courses",,Very useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,50,20,0,30,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important +Male,United States,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Company internal community,Friends network,Kaggle,Online courses,Podcasts,Trade book,YouTube Videos",,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,,Not Useful,,,Somewhat useful,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,25,50,10,0,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,,"Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Text Analytics",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,80,5,5,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,Often,,Most of the time,,,,,,Sometimes,,,Sometimes,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,Reliably instrumenting the application to provide proper data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,"135,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Newsletters,Online courses,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,,,Necessary,,,,Necessary,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,,,Very Important,Very Important,Somewhat important,,,, +Male,United Kingdom,NA,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,Very useful,,,,Very useful,,Very useful,,Somewhat useful,Very useful,,,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Other",Self-taught,60,10,20,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,100TB,"Bayesian Techniques,CNNs,Ensemble Methods,GANs,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,NoSQL,Python,Spark / MLlib,TensorFlow,Other,Other,Other",,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,Most of the time,Most of the time,Most of the time,"Bayesian Techniques,CNNs,Collaborative Filtering,Ensemble Methods,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,,,,40,25,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data,Other",Most of the time,,,,Often,,,,,Most of the time,,,,,,,,,,,Often,Most of the time,26-50% of projects,More internal than external,IT Department,-,-,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Most of the time,-,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,6 to 10 years,Programmer,Self-taught,80,0,5,10,1,4,Recommendation Engines,Bayesian Techniques,A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Very important,Other,Basic laptop (Macbook),"Image data,Text data,Other",Always,10MB,,"C/C++,Java,Mathematica,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Sometimes,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Data Visualization,Other",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,90,2,2,1,5,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Textbook",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Unnecessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer",Kaggle competitions,10,20,0,50,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,France,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,NA,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Statistician,Fine,Employed 
by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Business Analyst,Statistician",Self-taught,20,10,20,20,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Government,"1,000 to 4,999 employees",Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,67,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Conferences,Official documentation,Textbook",Somewhat useful,,,,Very useful,,,,,Very useful,,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,Self-taught,50,0,50,0,0,0,"Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Other,Rarely,100GB,"CNNs,Decision Trees,HMMs,RNNs,SVMs","MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,PCA and Dimensionality 
Reduction,RNNs,SVMs",,,,Often,,Often,Often,,,,,,Often,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,40,30,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",,,,Sometimes,Often,,,,,,,Often,Often,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,NA,data annotation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,"12,000,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,33,"Not employed, but looking for work",,,,,,,,DataRobot,Support Vector Machines (SVM),SQL,GitHub,Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,"Linear Digressions Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,Work,30,20,30,20,0,0,"Computer Vision,Survival Analysis",Support Vector Machines (SVMs),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts",,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Programmer",University courses,15,25,25,30,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,500 to 999 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,Sometimes,,,,Rarely,,Most of the time,,,,,,,,,Most of the 
time,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,Most of the time,Most of the time,Often,Sometimes,,,,Sometimes,,Often,,Often,Sometimes,Most of the time,Often,,Most of the time,,,,,,Rarely,Most of the time,,,,85,1,10,3,1,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects",Often,,,Sometimes,Most of the time,,,,,Rarely,,,,Sometimes,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,"Combining disparate teams' data sets, most of which were never meant for analysis.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,130000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,University courses,40,20,10,30,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,Python,R,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,Rarely,,,,,,Sometimes,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,"CNNs,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,Sometimes,,,Often,Often,,,,,,,,Often,,Often,Most of the time,Often,,,Most of the time,Sometimes,,,,Sometimes,Most of the time,,,,,30,40,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,Often,,,Often,,,Sometimes,,,,,,,,,,,,,Most 
of the time,,26-50% of projects,More internal than external,Business Department,,Too big and full of noise,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,90000,CAD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,Data Scientist,University courses,20,30,0,20,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire",,Sometimes,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Sometimes,Sometimes,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the 
time,Most of the time,,,,,,,Often,,,Often,,Often,Most of the time,Most of the time,,Sometimes,Often,,,Often,Often,,,,50,20,NA,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,Most of the time,,100% of projects,More internal than external,IT Department,prefer not to answer,unformatted data...handling data issues.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,4000000,INR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,,GitHub,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,University courses,15,30,5,20,30,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic 
Regression,RNNs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,GANs,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,,Often,,,Often,Often,Often,,Sometimes,,,,,,,,,Often,Sometimes,,Often,,Sometimes,,,,,,,,,30,15,5,15,35,0,Enough to refine and innovate on the algorithm,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,,300000,CNY,Other,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Textbook",Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Basic laptop (Macbook),Relational data,,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Most of the time,,Most of the 
time,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Often,,Often,Often,,,Often,,,,Often,,,,,,,Often,,,,,,,,,,,70,30,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Organization is small and cannot afford a data science team",Most of the time,Often,,,,Sometimes,,,,,,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,IT Department,None,Cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Sometimes,"80,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",Very useful,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher",University courses,10,20,0,60,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,1GB,"CNNs,Neural Networks,SVMs","IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Python,RapidMiner (commercial version)",,,,,,,,,,,,Often,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Often,,Most of the time,Often,,,,,,,Often,,,,Often,,Most of the time,Often,,,,,Most of the time,,Most of the time,,,,,,40,20,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",Most of the time,Often,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,Most of the time,,10-25% of projects,More external than internal,Standalone Team,Agricultural data; remote sensing data; public dataset,"Data collection is the biggest challenge I face in most of the 
problems, online dataset are usually limited and need a better or more focused dataset according to targeted problem.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,100000,PKR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Kenya,26,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Data Analyst,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,30,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,Not Useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,Financial,100 to 499 employees,,Don't know,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,1GB,Other,"Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",Most of the time,Often,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Subversion",Never,65000,EUR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","GPU accelerated Workstation,Traditional Workstation",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,70,0,10,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,50,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,Government,"10,000 or more employees",Increased slightly,6-10 
years,Some other way,Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Never,100GB,"Decision Trees,Gradient Boosted Machines","SAS Base,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",25,50,15,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Often,,Most of the time,,,Often,,Often,,Most of the time,,,Often,,,Often,Often,,,,20,35,0,20,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization 
is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Work,25,25,25,25,0,0,"Computer Vision,Natural Language Processing","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,R,Spark / MLlib,SQL,TensorFlow",,Sometimes,,Rarely,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,Often,,,,,Often,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,,,Often,Often,,,,,,,Often,,Often,,,Often,Often,Often,,,,Often,,,,Often,Often,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,I prefer not to answer,Internet-based,20 to 99 employees,Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,Regression/Logistic Regression,"Microsoft Azure Machine Learning,Python,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,Often,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,Sometimes,,,,50,5,5,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,Sometimes,,,,,,,76-99% of projects,Entirely internal,Other,google analytics,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,35000,GBP,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Argentina,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Neural Nets,SQL,Google Search,"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,40,30,30,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,Primary/elementary school,Mix of fields,Fewer than 10 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,Decision Trees,"C/C++,KNIME (free version),MATLAB/Octave,Microsoft SQL Server Data Mining,R,RapidMiner (free version),SQL,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,Most of the time,,Often,,,,Most of the time,,,,,,,,Sometimes,,Often,,,,,,,Most of the time,,,,,,Often,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Random Forests,Segmentation,Time Series Analysis",,Often,,,Often,Often,Most of the time,Often,,Often,,,,,,,,,,,,,Often,,,Most of the time,,,,Often,,,,30,20,10,25,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,Often,Often,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,600000,ARS,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Other,Python,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,DataTau News Aggregator,< 1 year,Nice to have,,Nice to have,,Nice to have,,,Nice to have,,Nice to have,,,,,Laptop or Workstation and local IT supported servers,,Master's degree,No,Master's degree,Computer Science,,"Data Analyst,Data Miner,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Gradient Boosting","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Denmark,34,"Not employed, but looking for work",,,,,,,,NoSQL,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,University courses,30,30,0,10,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important +Male,United Kingdom,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,35,Employed full-time,,,Yes,,Statistician,Poorly,Employed by college or university,TensorFlow,Deep 
learning,R,,"Arxiv,College/University,Official documentation,Online courses",Very useful,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Programmer,Statistician",Self-taught,30,20,0,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,MATLAB/Octave,R,RapidMiner (free version)",,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,,,Sometimes,,Often,Often,Often,Sometimes,,,,,Sometimes,,Most of the time,,,,Often,Often,Often,Most of the time,,,,,,,,,,,40,25,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,,,,,,Often,Most of the time,Sometimes,Sometimes,,51-75% of projects,Entirely internal,Central 
Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,18000,EUR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Argentina,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,30,40,30,0,0,0,"Computer Vision,Unsupervised Learning",Decision Trees - Random Forests,A bachelor's degree,Internet-based,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Decision Trees,Neural Networks,Recommender Systems,Text Analytics",Most of the time,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,Rarely,,,,,Rarely,,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,,,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,15,25,5,40,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Canada,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier 
detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,Most of the time,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,39,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Self-employed,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Other",University courses,10,40,10,0,40,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,Fewer than 10 employees,Stayed the same,6-10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Always,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SAS Enterprise Miner",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,Rarely,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,,,,Often,,,Often,Most of the time,,,,Often,,,Often,Often,,Often,,Often,,Sometimes,,,,55,20,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,,Often,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Standalone Team,regulatory economic data,data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Netherlands,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,50,10,20,20,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Evolutionary Approaches,Logistic Regression",A doctoral degree,Academic,100 to 499 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Other","Text data,Relational data",Sometimes,1TB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests","C/C++,Julia,Jupyter notebooks,Mathematica,Python,Spark / MLlib,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,,,,,Most of the 
time,,,,,,,,,,Sometimes,Often,,,,Most of the time,,,,,,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Prescriptive Modeling,Random Forests,Simulation,Text Analytics",,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,Sometimes,,,,Most of the time,,Often,,,,,30,30,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,26-50% of projects,Approximately half internal and half external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,27500,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Ukraine,27,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Programmer",Self-taught,30,20,30,0,20,0,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Telecommunications,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,RNNs","C/C++,Python,SQL,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Often,,,,,,Sometimes,,Most of the time,,,,Sometimes,Often,,Often,,Sometimes,,,,Sometimes,Rarely,,,,25,35,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Sometimes,,,,,,,100% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),Share Drive/SharePoint,,"Git,Mercurial",Rarely,36000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Singapore,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician,Other",University courses,20,20,30,20,10,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,10 to 19 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Very useful,,,,Very useful,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,50,20,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,Often,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Often,Most of the time,Often,Often,,,,,Often,,Often,,,,Often,Often,,Often,,,,,,Often,,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,Sometimes,,Often,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Rarely,35000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Data Analyst,Kaggle competitions,0,30,0,0,70,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),Primary/elementary school,Government,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,Other,"SQL,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,Often,,"Cross-Validation,Data Visualization",,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,25,13,5,7,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools",Most of the time,Often,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Never,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Stack Overflow Q&A",,Very useful,,,Somewhat useful,,,,,,,,,Very useful,,,,,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,Researcher,University courses,75,5,0,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Most of the time,,,,,Sometimes,,Most of the time,,Often,Most of the time,Sometimes,Most of the time,Often,Most of the time,,,,Often,,Most of the time,Most of the time,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science 
projects,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,100% of projects,Entirely internal,Central Insights Team,open source text,maintaining feeds,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Always,100000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,46,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by professional services/consulting firm,R,,R,GitHub,"Blogs,Official documentation,Online courses,Podcasts",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,,,"Data Stories Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,A humanities discipline,More than 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,60,20,0,0,20,0,Supervised Machine Learning (Tabular Data),Hidden Markov Models HMMs,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Singapore,33,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,Other,10,10,0,0,0,80,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,26,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,Turkey,28,Employed full-time,,,Yes,,Researcher,Fine,,Jupyter notebooks,Anomaly Detection,Python,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Time Series,"Decision Trees - Random Forests,Neural Networks - RNNs","Some college/university 
study, no bachelor's degree",,,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Rarely,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,0,0,0,0,0,0,,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,TRY,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,Not Useful,,Not Useful,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Not Useful,Somewhat useful,Very useful,Very useful,,Not Useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,Software Developer/Software Engineer,University courses,70,20,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,SVMs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,Often,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Naive Bayes,Natural Language 
Processing,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,Sometimes,Often,,,,,,,,Sometimes,Sometimes,Often,Sometimes,,,,40,30,14,14,2,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Privacy issues",Sometimes,,,,,,,,,Often,,,,,,,Sometimes,,,,,,100% of projects,Do not know,Other,,Not enough available data. ,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Other",Github,Git,Sometimes,,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Brazil,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Poorly,Self-employed,R,Association Rules,Python,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,15,20,40,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),A bachelor's degree,Other,,,,,Very important,Other,Traditional Workstation,Text data,Rarely,1MB,SVMs,"Jupyter notebooks,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,,,"Logistic Regression,Recommender Systems,SVMs",,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,,,,Most of the time,,,,,,50,10,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant 
domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Cloud Platform,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,21,Employed part-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",85,10,0,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Video data,,,,"C/C++,Java,MATLAB/Octave,Python",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks",,,,Rarely,,Rarely,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,33,0,0,37,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,,,,,Often,,Rarely,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Rarely,,,,7,,,,,,,,,,,,,,,,,, +A different identity,Other,99,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,I never declared a major,3 to 5 years,"Business Analyst,Data Analyst,Other",Work,60,0,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Non-profit,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,,,,,Most of the time,Often,Rarely,,,,Rarely,,Sometimes,Often,Sometimes,,,Sometimes,,Often,,,,,,,,,Often,,,,15,5,5,5,70,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Not Useful,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Operations Research Practitioner,Programmer,Other",University courses,30,10,0,50,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Japan,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,,"Arxiv,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,10,10,40,40,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's 
degree",Internet-based,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Java,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Most of the time,,Often,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,,Often,Most of the time,Rarely,Often,Often,Often,Often,,,Sometimes,Most of the time,Often,,Most of the time,Often,Most of the time,Most of the time,Most of the time,Most of the time,,Often,Rarely,Most of the time,Rarely,,Sometimes,Most of the time,,,,,0,60,20,20,0,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,Often,,10-25% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,80,5,10,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",No education,Government,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,<1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,Often,,Rarely,,,,,,,Sometimes,,,,80,5,0,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Most of the time,Most of the time,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Entirely 
internal,Standalone Team,education,"lack of documentation use to produce data such as code book variables, methodolgy and other useful documents to understand data ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"5,000,000",XOF,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,Mathematica,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,"Emergent/Future Newsletter (Algorithmia),Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,40,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Often,Most of the time,,,,,,,Sometimes,,,,Sometimes,,,,Often,,,,Often,Sometimes,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,Often,Often,,,,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics",,,Most of the time,,,Often,Most of the time,Often,Often,,,,,,,Often,Often,,Most of the time,Most of the time,,,Sometimes,Sometimes,,,,Most of the time,Often,,,,,40,30,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,Often,Often,,,,,,,,Often,,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,10,40,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Other,"10,000 or more employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Most of the time,,,Often,,,Often,,,,Often,,,,,Often,,,,,,,,,Most of the time,,,,40,20,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,None,More internal than external,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,36000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Military/Security,"10,000 or more employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Other,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Argentina,46,Employed full-time,,,Yes,,Researcher,Poorly,Employed by professional services/consulting firm,R,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Textbook",,,Very useful,,,,Very useful,Very useful,,Very useful,,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,1 to 2 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs",A master's degree,Technology,Fewer than 10 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company 
(e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Other,Other,Rarely,,,"C/C++,MATLAB/Octave",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,30,50,5,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,76-99% of projects,More internal than external,IT Department,,,,,,,,"20,000",USD,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Japan,42,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Survival Analysis,Python,Google Search,"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,Talking Machines Podcast,< 1 year,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Programmer,Self-taught,10,90,0,0,0,0,"Recommendation Engines,Survival Analysis","Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Spain,36,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Researcher",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,67,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,Julia,,"Blogs,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,DBA/Database Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician",Work,30,0,70,0,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",Primary/elementary school,Pharmaceutical,10 to 19 employees,Increased slightly,More than 10 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service",Other,Most of the time,<1MB,"Bayesian Techniques,Neural Networks,Random Forests","C/C++,Julia,Perl,R",,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,Most of the time,,Often,Often,,Sometimes,,,,,,,Sometimes,,,,10,30,40,0,20,0,Enough to refine and innovate on the algorithm,"Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Often,,Often,,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",,70000,GBP,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Chile,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Anomaly Detection,R,University/Non-profit research group websites,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Analyst,Researcher",University courses,30,10,10,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Most of the time,Often,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,20,20,0,40,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,,,,Often,,,,,,,Often,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,Census; CASEN survey,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,18000000,CLP,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"GitHub,Google Search","Conferences,Kaggle,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Somewhat important,,Somewhat important,Somewhat important,,,,,,,,,,,, +Male,United Kingdom,57,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Coursera,Traditional Workstation,2 - 10 hours,PhD,No,Bachelor's degree,Mathematics or statistics,,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Self-employed,IBM Watson / Waton Analytics,Deep learning,C/C++/C#,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Tutoring/mentoring",,,,,,,Somewhat useful,,Somewhat useful,,Not Useful,Very useful,,,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,10,30,0,0,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Neural Networks","C/C++,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau",,,,Often,,,,,,,,,,,Often,,,,,,Often,,Sometimes,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,Text Analytics",,,,,,,,Often,,,,,,,,Sometimes,,,,Often,,,,,,,,,Most of the time,,,,,30,20,10,30,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team",,Often,,,,,,,Sometimes,,Rarely,,,,Rarely,Sometimes,,,,,,,26-50% of projects,More internal than external,Other,"EDI standard data sets, Regulatory guidelines",cleaning ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"70,000",USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,Python,Deep learning,Python,"Google Search,Government website,Other","Arxiv,Friends network,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",18,37,20,20,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Video data,Text data",Most of the time,10GB,"Bayesian Techniques,CNNs,GANs,HMMs,Markov Logic Networks,Neural 
Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Orange,Python,RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow",,,,,Sometimes,,,,Sometimes,,,,Sometimes,,Often,,Often,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,Often,,Sometimes,,Most of the time,,,,Often,,,,,,Most of the time,,,,Often,Most of the time,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,Often,Sometimes,Often,Often,Sometimes,Often,,Most of the time,,Most of the time,Sometimes,,Sometimes,Sometimes,Often,Most of the time,Most of the time,Often,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,,Often,Most of the time,Most of the time,,,,20,60,10,5,5,0,Enough to run the code / standard library,"Explaining data science to others,I prefer not to say,Lack of data science talent in the organization",,,,,,Often,Often,,Often,,,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Company Developed Platform,I don't typically share data",,"Git,Other",Most of the time,700000000,IRR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,37,Retired,,,Yes,,DBA/Database Engineer,Fine,Employed by government,Oracle Data Mining/ Oracle R Enterprise,Social Network Analysis,SQL,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,"Data Machina Newsletter,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,30,40,10,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,15,20,0,50,0,Natural Language Processing,"Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Always,1GB,"CNNs,Gradient Boosted Machines,HMMs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most 
of the time,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,I never declared a major,I don't write code to analyze data,Other,Self-taught,0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,68,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Random Forests,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Newsletters,Non-Kaggle online communities",,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Software Developer/Software Engineer",University courses,75,0,0,20,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,More than 10 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10MB,"Ensemble Methods,Neural Networks,Other","Java,Python,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",Sometimes,,,,,Often,Sometimes,,Most of the time,Most of the time,,,,Often,,,,,,,Often,Most of the time,,,,,,,,Often,,,,30,20,20,10,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,,,,,Sometimes,,,Most of the time,Often,,10-25% of projects,Entirely internal,Business Department,kaggle,cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)","Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,France,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,Government website,"Official documentation,Online courses,Textbook",,,,,,,,,,Very useful,Somewhat useful,,,,Very useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Scientist",Work,30,30,30,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",Primary/elementary school,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,<1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,60,10,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,Geographic & country data ; currency & change data ; ,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,,,,,Somewhat useful,"Jack's Import AI Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,,Self-taught,50,30,20,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Cloudera,DataRobot,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,Rarely,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,Rarely,Most of the time,,Rarely,Rarely,Sometimes,Sometimes,,Most of the time,,Sometimes,,,,Rarely,Rarely,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain 
expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,Often,,Often,Often,Often,Often,Often,,Often,Often,Often,,Often,Often,Often,Often,,51-75% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,140000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Machine Learning Engineer",University courses,30,0,0,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",High school,Telecommunications,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Python,R,Spark / MLlib,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Rarely,,,,Often,Most of the time,Often,Sometimes,Sometimes,,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Most of the time,,,Most of the time,,,Sometimes,Sometimes,,,,80,10,0,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Often,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,,100% of projects,More external than internal,Standalone Team,US Census,Reliability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,134000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Work,70,0,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Insurance,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1TB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Python,Spark / MLlib,TensorFlow",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Text Analytics",Often,Often,,Often,Often,Most of the time,Most of the time,Often,,,,,,,,Often,,Often,,Often,,,Sometimes,Sometimes,,,,,Often,,,,,35,25,25,8,7,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,,Sometimes,,,,,,,,Most of the time,,,Often,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Canada,32,"Not employed, but looking for work",,,,,,,,Amazon Web services,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Uplift Modeling,Python,"Google Search,Government website","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Data 
Analyst,University courses,0,0,0,100,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",Primary/elementary school,Financial,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,<1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks",,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics",Often,Sometimes,,,,Often,Often,Often,,,,,,,Often,Sometimes,,,,,,Often,,,,Often,,,Sometimes,,,,,70,30,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,experian,none,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Ireland,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,I never declared a major,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,20,20,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Rarely,10GB,"CNNs,Evolutionary Approaches,Neural Networks,RNNs","KNIME (free version),MATLAB/Octave,Python,R,SAS Enterprise Miner,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Sometimes,Often,,Sometimes,,,,"CNNs,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,Often,,Often,,,,Sometimes,,,,Most of the time,,Often,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,40,20,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,76-99% of projects,Do not know,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Israel,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,"Kaggle,Non-Kaggle online communities",,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,0,40,0,10,0,,,,Technology,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,,,,"Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,SQL",,,,,Often,,,,Often,,,,,Rarely,Often,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Random Forests,Recommender Systems",Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,50,30,0,0,20,0,Enough to run the code / standard library,"Dirty data,I prefer not to say",,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,University courses,15,15,10,50,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, 
etc.)",30,30,5,30,5,0,"Adversarial Learning,Computer Vision,Time Series","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,A bachelor's degree,Other,20 to 99 employees,Stayed the same,,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Never,,,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Siraj Raval YouTube Channel",< 1 year,,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Canada,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Bayesian Methods,R,University/Non-profit research group websites,"Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,6 to 10 years,Researcher,University courses,50,25,0,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Other,Traditional 
Workstation,Relational data,Sometimes,,Regression/Logistic Regression,"MATLAB/Octave,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,"Association Rules,Decision Trees,Logistic Regression,Random Forests,Segmentation",,Rarely,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,Rarely,,,Rarely,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,,,,,,,,,65000,CAD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,,"Amazon Machine Learning,Amazon Web services,Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Python,Deep learning,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,"Data Stories Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,35,10,30,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100GB,"Decision Trees,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,Often,,,,,,Sometimes,Sometimes,Most of the time,,,,Rarely,Often,,,,,Most of the time,Sometimes,,,Sometimes,,,,40,20,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Sometimes,,,,Often,,Often,,,,Most of the time,,,,,Often,Often,,51-75% of projects,Entirely internal,Central Insights Team,Kaggle 
datasets,"volume of data, definition of columns","Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,2100000,INR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,"Employed by non-profit or NGO,Self-employed",Amazon Web services,Other,,University/Non-profit research group websites,"Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,,Self-taught,50,0,0,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Non-profit,,,,,Important,Other,Workstation + Cloud service,"Text data,Relational data,Other",Rarely,,,"Mathematica,MATLAB/Octave,NoSQL,Python",,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Data Visualization,kNN and Other Clustering,Naive Bayes,Neural Networks,RNNs,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,23,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Engineer",University courses,30,0,20,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,49,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Company internal community,Online courses",,,,Somewhat useful,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,Computer Scientist,University courses,30,50,0,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Retail,"10,000 or more employees",Increased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",Often,,,,,Most of the time,Sometimes,,Often,,,Most of the time,,Sometimes,,Rarely,,,,Sometimes,Sometimes,,Sometimes,,,Often,,,,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues",Sometimes,,,,Most of the 
time,,,,Most of the time,Often,,,,Sometimes,Often,,Sometimes,,,,,,10-25% of projects,More internal than external,Business Department,web crawled data,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",15,30,30,5,20,0,"Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",,Pharmaceutical,20 to 99 employees,Increased slightly,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Amazon Web services,Python,R,TensorFlow,TIBCO Spotfire",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,Sometimes,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Text Analytics",,Often,,Sometimes,,Often,Most of the time,Often,,,,,,Often,,,,,Often,Sometimes,,,Most of the time,,Often,Often,,,Often,,,,,30,30,30,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data 
science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,University courses,30,0,0,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,,,Sometimes,Most of the time,Often,Sometimes,,,Often,,,,Most of the time,,,,,Often,,Often,,,Often,Most of the 
time,Sometimes,,,,,,20,60,0,5,15,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Limitations of tools,Privacy issues,Scaling data science solution up to full database",,,,Sometimes,,,,,,,,,Sometimes,,,,Most of the time,Most of the time,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,115000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Other,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Jupyter notebooks,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,55,0,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not at all important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Text Analytics",,,,,,Most of the time,Most of the time,,Most of the time,,,,,Sometimes,,Sometimes,,Sometimes,,Often,,,Often,Sometimes,,,,,Sometimes,,,,,25,25,25,25,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,Often,,,,,,,Often,,,,,Sometimes,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,30000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,47,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A health science,6 to 10 years,Other,University courses,30,10,0,60,0,0,Survival Analysis,Logistic Regression,Primary/elementary school,Government,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,47,Employed full-time,,,Yes,,Predictive Modeler,Fine,Self-employed,Python,Social Network Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Personal Projects,Textbook",,,,,,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Miner,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler",Self-taught,50,20,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Support Vector Machines (SVMs),A master's degree,Financial,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,SVMs,"Java,SAP BusinessObjects Predictive Analytics,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,Most of the time,,,"kNN and Other Clustering,Lift Analysis,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,Most of the time,,Most of the time,,Often,,,,5,70,20,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring",Often,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,150000,TRY,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,40,10,20,30,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,TensorFlow,TIBCO Spotfire",,,,,Rarely,,,,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Rarely,Sometimes,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Simulation,SVMs,Time Series Analysis",Sometimes,,,,Sometimes,Often,,Sometimes,Often,,,Often,,,,Most of the time,,,,,,,Often,,,,Sometimes,Sometimes,,Most of the time,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",,260000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,Very useful,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Other",Self-taught,80,20,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Perl,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,,Sometimes,,,,,,Often,Often,Most of the time,,,,Sometimes,,,Sometimes,,,,,,Rarely,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,Sometimes,Often,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Text Analytics",Often,,,Sometimes,Often,Most of the time,Most of the time,Often,Often,,,Often,,,Sometimes,Often,,,Often,Often,Often,Often,Often,Often,Sometimes,,,,Often,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Sometimes,Often,,Often,,,,,,,,,,,Sometimes,,,,10-25% of projects,More internal than external,Other,Census; google analytics ,Warehouseing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,Other,Self-taught,20,5,25,50,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Logistic Regression,Markov Logic Networks",A professional degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,,10GB,Regression/Logistic Regression,"Amazon Web services,Other",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Logistic Regression,Time Series Analysis",,Often,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,10,20,30,10,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",,Very useful,,,,,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,75,0,0,5,0,Supervised Machine Learning (Tabular Data),,High school,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,10GB,"Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,Recommender Systems,Time Series Analysis",,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,50,10,0,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Online courses",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Miner,Predictive Modeler,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,QlikView,R,SAS Base,SQL",,Rarely,,,Often,,,,Sometimes,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics",,Often,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,,40,15,20,15,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Often,,Most of the time,,,,,Often,Sometimes,,Often,Most of the time,,,,,,Most of the time,,,,10-25% of projects,More external than internal,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,72000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,South Africa,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,Decision Trees,SAS,"Government website,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Official documentation,Stack Overflow Q&A",,Very useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Miner,University courses,15,0,35,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Always,10GB,"Decision Trees,Regression/Logistic Regression","Java,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,,,,,"Collaborative Filtering,Decision Trees,Logistic 
Regression,Random Forests,Time Series Analysis",,,,,Rarely,,,Sometimes,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Rarely,,,,25,25,25,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Other",,,,,,,,,,,,,Often,,,,,,,,,Most of the time,10-25% of projects,More external than internal,Other,Census data,Getting it in a format that everybody is comfortable using,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,400000,ZAR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Norway,43,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Government website,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A doctoral degree,Other,20 to 99 employees,Stayed the same,Don't know,A general-purpose job board,Important,Other,Workstation + Cloud service,Relational data,,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,50,20,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,10-25% of projects,Do not know,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Japan,31,Employed full-time,,,Yes,,Predictive Modeler,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Predictive Modeler,Software Developer/Software Engineer,Other",Self-taught,50,10,30,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,DataRobot,Hadoop/Hive/Pig,Python,R,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,Rarely,,Rarely,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Rarely,,,,Often,,,Rarely,Sometimes,,Sometimes,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted 
Machines,Logistic Regression,Random Forests,Segmentation",,,,,,Often,Most of the time,Often,Often,,,Often,,,,Often,,,,,,,Often,,,Often,,,,,,,,30,10,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Other",Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,Often,,,,,,Often,,,,,,,Most of the time,26-50% of projects,More internal than external,Business Department,,"We typically work on our customer's fairly sensitive data, and we cannot store the data in modern cloud environment such as AWS, GCP, Azure etc. This substantially limits the capacity and flexibility of our data analysis work.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,7500000,JPY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Other",Self-taught,10,35,50,0,5,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Spark / MLlib,Tableau,Other",,,,,Sometimes,,,,Often,,,,,Often,,,Most of the time,,,,Rarely,,Rarely,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,Sometimes,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Time Series Analysis",,Rarely,Often,,,Most of the time,Most of the time,Often,,Sometimes,,Often,,Often,,Sometimes,,Rarely,,,Most of the time,Often,Often,Often,,,,,,Most of the time,,,,40,20,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,Often,Most of the time,,,,,Often,,,,,,,26-50% of projects,More external than 
internal,Other,INEGI;Kaggle; datos Mexico,Finding the right business question to ask and find an answer with the available data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Most of the time,"10,000",,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,41,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Traditional Workstation,Other",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,3 to 5 years,Software Developer/Software Engineer,Kaggle competitions,20,0,0,0,80,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,I haven't started working yet,Work,5,35,60,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",Primary/elementary 
school,Internet-based,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,1TB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,QlikView,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,Often,,,,,,Often,,Often,,,,,,,,,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,,Often,Most of the time,,,Rarely,Rarely,,Often,,,,"A/B Testing,Association Rules,Data Visualization,Neural Networks,Random Forests,Segmentation",Most of the time,Sometimes,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,Rarely,,,Rarely,,,,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Most of the time,,Often,Most of the time,Sometimes,,Sometimes,,Rarely,Sometimes,,,,Sometimes,,,,Often,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Never,,EUR,Has increased 20% or more,,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Textbook",Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer",Self-taught,25,0,50,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,100GB,"Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",Sometimes,,,,,Most of the time,Most of the time,,,,,Often,,,,Most of the time,,,Most of the time,Most of the time,Sometimes,,,,Most of the time,,,,,,,,,49,30,1,15,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Git,Rarely,1500000,RUB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,57,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst",Self-taught,34,33,33,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Statistician",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,Less than a year,Other,Work,0,0,50,0,0,50,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Mix of fields,"10,000 or more employees",Increased 
slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Image data,Sometimes,,"CNNs,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,,TensorFlow,Text Mining,Python,"Government website,University/Non-profit research group websites","Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,,"Data Analyst,Programmer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R",,,,,,,,,,,,,,,,,Often,,,Rarely,Rarely,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,Sometimes,,,,Often,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Often,,Most of the time,,,,,,,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Most of the time,,,,Often,,,,,,Often,,Often,,Sometimes,,,,100% of projects,More internal than external,Standalone Team,SPARCS; Healthfacts,Extracting the right number of features,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,145000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Anomaly Detection,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician,I haven't started working yet",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,30,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Mix of fields,,,,,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Python,R,SQL,Tableau,Unix shell / awk",,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Rarely,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",,,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Sometimes,,,,,Most of the time,,Most of the time,Sometimes,,,,,,Sometimes,,,,40,20,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,,,,,Often,Most of the 
time,Often,Often,,,,Often,,,Most of the time,Most of the time,,,,51-75% of projects,Entirely internal,Standalone Team,,loading it on our laptops and running models on them. since there is no server cluster in place.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,172500,INR,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,GitHub,"Arxiv,Kaggle,Official documentation,Online courses,Personal Projects",Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,45,0,0,5,0,Reinforcement learning,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,24,Employed full-time,,,Yes,,Machine Learning 
Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,15,5,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Microsoft Azure Machine Learning,Python,SQL,TensorFlow",,Most of the time,,Often,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,,,,,Often,Most of the time,,Often,,,Most of the time,Most of the time,Often,,Sometimes,,,,,Often,Most of the time,,,,,10,50,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,Often,,,Most of the time,Most of the time,,Often,Most of the time,,Most of the time,Most of the time,,Often,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,"refining, transformation","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Company Developed Platform,I don't typically share data",,Bitbucket,Rarely,1600000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,6 to 10 years,"Data Scientist,Researcher",Self-taught,50,10,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Other,Always,10GB,"Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,Most of the time,,,,,,,Often,,,,,Sometimes,,Often,,,,30,40,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources",Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,Physionet Databases,Data cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,65000,CAD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Greece,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,DataCamp,Traditional Workstation,11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Germany,36,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Text Analytics",Sometimes,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,,,50,25,5,10,10,0,Enough to run the code / standard library,"Dirty data,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +,Other,25,"Not employed, but looking for work",,,,,,,,R,Regression,Python,University/Non-profit research group websites,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,PhD,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,20,10,15,10,45,0,"Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,20+,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Czech Republic,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,56,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"FlowingData Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,,Self-taught,50,50,0,0,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",Government,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,,,"Jupyter notebooks,NoSQL,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,,,,,"Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,,20,0,0,0,80,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Most of the time,Most of the time,,,Most of the time,,,,,,Most of the time,Often,,,Often,,,,76-99% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,,9,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Cluster Analysis,Python,University/Non-profit research group websites,"College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,Very useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Other",University courses,20,20,15,45,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,KNIME (commercial version),KNIME (free version),MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,Rarely,Rarely,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,,,30,20,10,15,25,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Most of the time,,,,,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,100% of projects,More internal than external,Standalone Team,none,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,"Business Analyst,I haven't started working yet",University courses,30,20,0,50,0,0,Adversarial Learning,Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Canada,34,Employed 
part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Survival Analysis,R,"Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Very useful,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",University courses,30,20,20,30,0,0,,Logistic Regression,High school,Other,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Google Cloud Compute,Python,R,SQL",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Decision Trees,Ensemble Methods",Rarely,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,10,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,,,,,,,,,,Often,,,,,Often,,Often,,10-25% of projects,More internal than external,Central Insights Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Company Developed Platform",,Git,Rarely,80000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,40,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,Most of the time,Often,Most of the time,,,,,,,Often,,,,,Often,,Often,,,,,,,,,,,35,5,10,20,10,20,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear 
direction to go in with the available data",,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,,Often,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Rarely,40000,EUR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Iran,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Tutoring/mentoring",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,United Kingdom,31,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,Statistica (Quest/Dell-formerly Statsoft),Deep learning,Python,Google Search,"Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,0,20,0,0,30,"Reinforcement learning,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Other,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10GB,Neural Networks,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Neural Networks",,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,0,70,0,10,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,,,Most of the time,,,,,,,Often,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,12000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Sweden,34,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",,10TB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,R,Spark / MLlib,SQL",,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Collaborative Filtering,Decision Trees,Logistic Regression,Recommender Systems,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,10,0,10,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,132000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Chile,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Online courses",,,,,,,Not Useful,,,,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Engineer,University courses,30,30,10,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100MB,Decision Trees,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,30,10,0,60,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Git,Rarely,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Brazil,50,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Amazon Machine Learning,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher,Other",Self-taught,40,60,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Important,Other,"GPU accelerated Workstation,Traditional Workstation",Image data,,100MB,CNNs,"Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,SVMs",,,,Most of the time,,Most of the time,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,30,50,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,,Sometimes,Often,Often,,Often,,,,,,,Often,Most of the time,,Less than 10% of projects,Do not know,Other,Kaggle,Usually is to find the dataset available,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,"Researcher,Other",Work,10,20,70,0,0,0,"Natural Language Processing,Reinforcement learning,Time Series",Logistic Regression,A bachelor's degree,Academic,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R",,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,Text Analytics",,,,,,,Most of the time,,,,,,,Rarely,,,,,Often,,,,,,,,,,Often,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,,Most of the time,Most of the time,,Often,Most of the time,,,,,,,Most of the time,,,Sometimes,Often,Most of the time,,100% of projects,More 
internal than external,Business Department,,Organization wanting to use insights,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),,"60,000",,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Impala,Neural Nets,Python,Google Search,"Arxiv,Friends network,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,,Very useful,"FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,Business Analyst,Work,10,10,10,70,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics",,,,,,,Often,Sometimes,,,,,,Often,,Often,,,Most of the time,Often,,,,,Sometimes,,,Often,Often,,,,,10,10,0,10,30,40,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Often,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,More external than internal,Business Department,none.,labels,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,120000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Argentina,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,15,10,50,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A professional degree,Retail,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,,,Sometimes,Rarely,,Rarely,,,Rarely,,Sometimes,Often,,Often,,,Sometimes,Sometimes,,,,20,30,10,15,15,10,Enough to tune the parameters properly,"Company politics / 
Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,Organization is small and cannot afford a data science team",Sometimes,Often,Sometimes,,Often,,,,,,,,,,Most of the time,Often,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,390000,ARS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +A different identity,United States,70,Retired,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Other",University courses,15,0,15,70,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Natural Language Processing,Evolutionary Approaches,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,39,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Researcher,Other",University courses,10,50,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Internet-based,10 to 19 employees,Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,Sometimes,Often,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,Often,,,,,,,,,Often,,,,,,,Often,,,,30,30,0,15,25,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,Often,,,Sometimes,,Sometimes,Often,,,Sometimes,,,51-75% of projects,Entirely 
internal,Standalone Team,Twitter; Facebook; Wikipedia; OpenStreetMap,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,"FastML Blog,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Natural Language Processing,,"Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Lift Analysis,Natural Language Processing,Prescriptive Modeling,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,,,,,,,,,,,Often,,,,Often,,,Sometimes,,,,,,,Often,Often,,,,40,20,0,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,Sometimes,,,Often,,Often,,,,Sometimes,,,,,Often,,,100% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,62000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by non-profit or NGO,SAS Base,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,50,0,25,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,1GB,"Decision Trees,Regression/Logistic Regression,SVMs","Google Cloud Compute,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,Rarely,,,,,,,,,,,,,,Often,Most of the time,Sometimes,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Often,Sometimes,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,45,10,5,10,30,0,,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,Often,,,,,,,,,,,Sometimes,,,,76-99% of projects,More internal than 
external,Standalone Team,Real estate; weather,Access,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,160000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects",,,,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",CRM/Marketing,"5,000 to 9,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,,,"Google Cloud Compute,Java,NoSQL,Python,SAS Base,SQL",,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,90,0,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,,Less than 10% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,"Data Elixir Newsletter,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,Other,University courses,30,20,20,20,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,DataRobot,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Most of the time,,,,Rarely,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,Often,,,,,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Often,Most of the time,,Most of the time,Most of the time,Often,Most of the time,,Sometimes,Often,Sometimes,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Often,Most of the time,Most of the time,Sometimes,Often,Most of the time,Sometimes,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run 
slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,Often,Often,Most of the time,,,,Often,,,,Often,Often,Often,,Most of the time,,,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,365000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",11 - 39 hours,PhD,Yes,Master's degree,"Information technology, networking, or system administration",,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Japan,34,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Python,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,Textbook",Very useful,Very useful,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,70,0,20,0,10,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",,Academic,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Sometimes,100MB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Sometimes,,,,"CNNs,Natural Language Processing,Neural Networks,RNNs",,,,Often,,,,,,,,,,,,,,,Often,Most of the time,,,,,Often,,,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Australia,34,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Matlab,University/Non-profit research group websites,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX","Traditional Workstation,Workstation + Cloud service",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,Software Developer/Software Engineer,Other",Self-taught,50,10,0,0,40,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Romania,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,Google Search,"Arxiv,Blogs,Newsletters,Non-Kaggle online communities,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"Jack's Import AI Newsletter,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,90,10,0,0,0,0,"Adversarial Learning,Computer Vision","Neural Networks - CNNs,Neural Networks - GANs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,Finland,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Bayesian Methods,Python,Google Search,"Arxiv,College/University,Personal Projects",Very useful,,Very useful,,,,,,,,,Very useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,,University courses,5,0,85,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised 
Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I prefer not to answer,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Other",,1GB,Bayesian Techniques,"Jupyter notebooks,MATLAB/Octave,Python,R,Stan,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,Often,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Most of the time,,,Often,,,,,,,,,,Often,,,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,,10,80,0,10,0,0,Enough to refine and innovate on the algorithm,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,,Nice to have,,,Necessary,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,Very Important,,,Very Important,,,,,Somewhat important,Somewhat important,,, +Male,Canada,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Online courses",Somewhat useful,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,GPU accelerated Workstation,0 - 1 hour,Master's degree,No,Bachelor's degree,A humanities discipline,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,25,0,75,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,25,15,0,50,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,"5,000 to 9,999 employees",Increased significantly,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,HMMs,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,TensorFlow,Other",,,,Rarely,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,Often,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Simulation,SVMs",Rarely,,Sometimes,,Often,Often,Often,Sometimes,Sometimes,,,,Often,Often,,Often,,Often,Often,,,,Sometimes,Often,,,Often,Often,,,,,,25,30,20,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,Less than 10% of projects,More external than 
internal,Standalone Team,We get data from clients,The sheer size of the data,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Rarely,0,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",University courses,35,0,5,25,35,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Never,,"Ensemble Methods,Gradient Boosted Machines,Random Forests","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,Sometimes,Most of the time,,Sometimes,,,,,Sometimes,,Rarely,,,,,,,Sometimes,,,,,,,,,,,50,10,0,35,5,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,,Never,47000,USD,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,57,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,,,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,"Operations Research Practitioner,Statistician,Other,I haven't started working yet",University courses,10,0,10,80,0,0,Time Series,"Logistic Regression,Neural Networks - GANs",A doctoral degree,Non-profit,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Text data,Rarely,,Regression/Logistic Regression,"C/C++,IBM SPSS Statistics,R,Tableau",,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,20,0,0,0,50,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Other",,,,,,,,,,,,,,,,Sometimes,,,,,,Often,Less than 10% of projects,Entirely internal,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler",Work,20,50,20,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,,Often,,,,Most of the time,Rarely,Most of the time,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive 
Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics",Often,Rarely,Sometimes,Sometimes,Often,Sometimes,Often,Often,Sometimes,,,Sometimes,,Often,,Most of the time,,Most of the time,Sometimes,Sometimes,Often,Sometimes,Often,Most of the time,,,,Sometimes,Often,,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Often,Sometimes,,Most of the time,Most of the time,,Sometimes,Most of the time,Most of the time,Most of the time,,Often,Often,Sometimes,Often,,,Often,Often,Sometimes,,51-75% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,2500000,INR,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Russia,26,"Not employed, but looking for work",,,,,,,,Julia,Survival Analysis,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Official documentation,Online courses,Textbook,Trade book",Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,,,,Very useful,Somewhat useful,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,DataCamp,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Other,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Somewhat important,Somewhat important,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,Python,Other,"Blogs,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,More than 10 years,Data Analyst,Self-taught,60,30,10,0,0,0,Natural Language Processing,Bayesian Techniques,A master's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company 
(e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Most of the time,100MB,Bayesian Techniques,"Java,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,NoSQL,Python,QlikView,R,Spark / MLlib,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,,,Most of the time,,,,Most of the time,Most of the time,Often,,,,,,,,Most of the time,,,,,Often,Sometimes,,,,,"Data Visualization,HMMs,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,,Most of the time,,,,,,Often,,,,,Often,Most of the time,,Most of the time,,,,,,,Sometimes,Most of the time,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,Sometimes,Often,,Often,Often,,,,,Often,,,,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Subversion,Sometimes,80000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Other",Work,50,5,20,20,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Stayed the same,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,Most of the time,,,,,Often,,Often,,Most of the time,Often,Often,Often,,Sometimes,,,Sometimes,Sometimes,Most of the time,Often,Most of the time,,,,50,20,5,20,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of 
tools",,,,,Most of the time,Often,,,Often,,,Often,Often,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Siraj Raval YouTube Channel,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Predictive Modeler",University courses,80,0,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Female,United States,65,"Not employed, and not looking for work",No,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,"Data Scientist,Researcher",Self-taught,70,10,10,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Other,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,23,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Neural Nets,R,Google Search,"Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Reinforcement 
learning,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,Python,University/Non-profit research group websites,"College/University,Newsletters,Personal Projects",,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",70,0,0,30,0,0,Recommendation Engines,Neural Networks - CNNs,High school,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,Neural Networks,"C/C++,Julia,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,,Rarely,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,kNN and Other Clustering,Neural Networks",,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,10,20,30,10,30,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone 
Team,,,,Email,,,Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,20,10,40,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Financial,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,Tableau",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,Rarely,,,,,,Sometimes,,,,Sometimes,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,Sometimes,Often,Most of the time,Sometimes,Most of the time,,,Most of the time,,,,Often,,,Often,,,,Often,Sometimes,,Sometimes,Often,,Often,Often,,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,,,,,,,,,,,,Rarely,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,Twitter; Intrinio,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,170000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Software Developer/Software Engineer,Other",University courses,25,30,20,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Tableau,Other,Other",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,Often,Most of the time,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Other",,Sometimes,Often,,,,Sometimes,Often,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,,Often,,,30,15,15,20,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Other,Kaggle; Data.gov,Cleaning and deciding the question to be answered from that data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,,,Other,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,"Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,I prefer not to answer,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,10GB,,"Amazon Machine Learning,C/C++,Java,Microsoft Excel Data Mining,NoSQL,Python,R,SQL",Sometimes,,,Often,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,30,20,15,30,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Most of the time,,,,,,,Often,,,,,,,Less than 10% of projects,,,,,,Commercial Data Platform,,"Bitbucket,Git",,,INR,,6,,,,,,,,,,,,,,,,,, +Male,France,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Friends network,Kaggle,Tutoring/mentoring",,,,,,Very useful,Very useful,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Other,0,0,0,0,0,100,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Retail,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,Often,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Simulation,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Sometimes,,,,,,Rarely,,Often,,,,,,,Often,,,,Often,,Sometimes,Sometimes,,,,55,20,5,10,10,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,Often,,,,,,Most of the time,,Often,,,,,100% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,"110,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Poland,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,30,30,10,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,10 to 19 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,Often,,,,Often,,Often,,,,Often,Often,,Often,,,,,Often,,Most of the time,,,,60,10,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by 
business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,Sometimes,,,Often,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,36000,PLN,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Friends network,Official documentation,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,3 to 5 years,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,0,0,0,50,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,10 to 19 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Relational data,Sometimes,,,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the 
organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Never,50000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Google Search,"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Miner,DBA/Database Engineer,Programmer",Self-taught,30,0,30,10,30,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Always,10GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Often,Often,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests",,,Often,,,Often,,,,,,Often,,,,Often,,Often,,,,,Often,,,,,,,,,,,15,45,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Privacy issues",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,10-25% of projects,Entirely internal,IT Department,opendata,Data cleaning,"Column-oriented relational 
(e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Rarely,"60,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,United States,52,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Google Search,Government website",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,Indonesia,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Israel,33,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,YouTube Videos,Other",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,Self-taught,40,10,20,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Pharmaceutical,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,RNNs","Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series Analysis",,,Often,,,,Sometimes,Often,,,,,Sometimes,Rarely,,Sometimes,Rarely,Often,Sometimes,Rarely,Sometimes,,,Sometimes,,,,,Often,Sometimes,,,,40,40,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,360000,ILS,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,College/University,,,Not Useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Programmer,Statistician",Self-taught,50,30,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,DataRobot,IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,R,SAS Base,SQL",,,,Rarely,,Rarely,,,,,Rarely,Rarely,,,,,,,,,Rarely,,Most of the time,,Rarely,Rarely,,,,,,,Sometimes,,,,,Rarely,,,,Rarely,,,,,,,,,,"Cross-Validation,Logistic Regression",,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,10,10,10,40,30,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,,Business Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,YouTube 
Videos,,,,,,,,,,,,,,,,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Bayesian Methods,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer",Work,15,70,15,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and private datacenters",Other,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic 
Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Random Forests",Often,,,,,Often,,Often,Often,,,Most of the time,,Sometimes,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,7,3,0,0,90,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Sometimes,,,Sometimes,Sometimes,,Often,,,,,,,,Sometimes,,Often,,None,Entirely internal,Standalone Team,Credit history; Identity; ,Censorship; Low variance; unclear any inefficient definitions;,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,2400000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Spark / MLlib,Text Mining,Python,Google Search,"Arxiv,Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",Very useful,Somewhat useful,,Somewhat useful,,,Very useful,,,Not Useful,Somewhat useful,Very useful,,Very useful,Not Useful,Somewhat useful,,,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,40,10,10,0,25,15,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Other,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data,Other",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Often,,,,,Rarely,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the 
time,,,,,,,,Rarely,Often,,,Sometimes,Sometimes,,Often,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,Sometimes,Most of the time,Most of the time,Often,Often,,Sometimes,Often,,Sometimes,,Rarely,,,Sometimes,Sometimes,Sometimes,Sometimes,Often,Sometimes,Often,,Sometimes,Often,Often,Often,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Sometimes,Most of the time,Often,,,Sometimes,,Sometimes,,,,Often,,Most of the time,,,Often,Most of the time,,76-99% of projects,More internal than external,IT Department,US Census Bureau; NOAA; SSA; data.gov; OpenStreetMap; Maryland Open Data Portal; Center for Transit-Oriented Development; Twitter; NIH; NLM; CMS,Inconsistency,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,"130,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Podcasts,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Talking Machines Podcast",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",60,15,5,10,10,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,35,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other","Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Business Analyst,Software Developer/Software Engineer",University courses,5,20,5,60,10,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Finland,56,"Not employed, but looking for work",,,,,,,,TensorFlow,I don't plan on learning a new ML/DS method,Matlab,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,Self-taught,80,0,0,0,20,0,Natural Language Processing,Other (please specify; separate by semi-colon),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific 
job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,,,,,, +Female,Netherlands,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,10,10,0,20,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Google Search,"Friends network,Official documentation,Stack Overflow Q&A",,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Statistician,Self-taught,50,0,40,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more 
employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,Rarely,Rarely,,,,,,,,,Sometimes,,,,"Decision Trees,Ensemble Methods,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,,Often,Sometimes,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,20,30,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,,Most of the time,,,Sometimes,,Often,,,,,Most of the time,Sometimes,,,100% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,"137,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook,Other,Other",Somewhat useful,Very useful,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,18,0,0,80,2,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A master's degree,Other,500 to 999 employees,Stayed the same,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Other",,,,,,,,,Most of the time,,,,,,Sometimes,,Often,,,,,,,,,,Rarely,,,,Often,,Rarely,,,,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,"Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,,,,,,,,Rarely,,Rarely,,,,,Sometimes,,Most of the time,,,,,Often,,,,,,60,15,0,5,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability 
of/difficult access to data",Rarely,,,,Most of the time,Sometimes,,,Often,,,,,,,,,,Most of the time,Sometimes,Rarely,,Less than 10% of projects,More internal than external,Other,"Insee, IGN",,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,61,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,Programmer,Self-taught,65,0,20,15,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",50,10,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,Telecommunications,"10,000 or more employees",Increased significantly,Don't know,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,Rarely,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Rarely,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Random Forests",,,,,,Often,Often,Often,Often,,,Often,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,20,10,50,10,10,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,32000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,47,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Python,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,Not Useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,90,10,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Neural Networks","Amazon Web services,IBM SPSS Statistics,Java,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,TensorFlow",,Rarely,,,,,,,,,,Often,,,Rarely,,Most of the time,,,,,,,,Sometimes,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,Sometimes,,,Most of the time,,Often,,,,,Sometimes,,,Most of the time,,,,,20,10,10,5,20,35,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential 
impact of data science projects,Organization is small and cannot afford a data science team",,Sometimes,,,Sometimes,Most of the time,,,Most of the time,,Often,,Often,Most of the time,,Most of the time,,,,,,,100% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,75000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,No,Yes,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,,Professional degree,,More than 10 years,Software Developer/Software Engineer,Self-taught,50,25,25,0,0,0,Recommendation Engines,Bayesian Techniques,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,51,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Internet-based,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Self-employed,DataRobot,Social Network Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,KDnuggets Blog",< 1 year,Necessary,,,,,Necessary,,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Engineer",Self-taught,50,30,0,10,10,0,,,High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Not important,,,,,,,,,,,Very Important,Very Important,,,Somewhat important +Male,United States,45,Employed full-time,,,No,Yes,Researcher,Fine,Employed by non-profit or NGO,Python,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,"FastML Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,A health science,1 to 2 years,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",50,30,5,0,10,5,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Spain,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Work,10,5,25,45,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,20 to 99 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,Often,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,,,,Most of the time,,Often,,,,,,Sometimes,,Sometimes,,,,Most of the time,,,Most of the time,,Most of the time,,,,,Often,,,,50,30,5,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,"25,000",EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Python,Neural Nets,Python,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A health science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,0,10,,,A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,C/C++/C#,Google Search,"Personal 
Projects,Textbook,YouTube Videos",,,,,,,,,,,,Not Useful,,,Very useful,,,Very useful,KDnuggets Blog,3-5 years,Necessary,Nice to have,Nice to have,Necessary,,Nice to have,,,Necessary,Necessary,Nice to have,,,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,No,Master's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Unsupervised Learning,Neural Networks - CNNs,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +Male,India,22,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Google Search,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Workstation + Cloud service,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,26,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Not Useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,Very useful,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst,Engineer,Software Developer/Software Engineer",University courses,20,0,80,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,Rarely,,Often,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Text Analytics,Time Series Analysis",Rarely,,Sometimes,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,,,Sometimes,Often,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need 
to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,Often,,,Often,,,76-99% of projects,Approximately half internal and half external,Business Department,geospatial data; population data; distance data; macro-economic data;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,25000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +A different identity,United States,34,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,Other","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,75,25,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Neural Networks - RNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very 
Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important +Male,France,63,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Monte Carlo Methods,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook,Tutoring/mentoring",,,,,,,,,,,,Very useful,,,Very useful,,Somewhat useful,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),"Evolutionary Approaches,Other (please specify; separate by semi-colon)",A master's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10MB,"Evolutionary Approaches,Neural Networks","C/C++,Microsoft Excel Data Mining,SAS Base,SAS Enterprise Miner,SQL",,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,,Often,,,,,,,,,,"Evolutionary Approaches,Neural Networks",,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,30,30,0,0,0,40,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",,,,,,Sometimes,,Often,Most of the time,,,,Often,,,,,,,,,,None,Approximately half internal and half external,Standalone Team,,Predictive Modeling validation - Efficiency over time,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Employed by college or university,TensorFlow,Genetic & Evolutionary Algorithms,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Friends network,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Machine Learning Engineer,Operations Research Practitioner,Researcher",Work,25,25,30,0,20,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data",Rarely,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,Often,,,,Most of the time,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,Sometimes,Most of the time,,Most of the time,,,,"A/B Testing,Data Visualization,Natural Language Processing,Neural Networks",Sometimes,,,,,,Often,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of 
tools,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,Often,,,Often,,,,,,,,Often,,10-25% of projects,Do not know,IT Department,linkedin,human labour,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,75000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,R,Time Series Analysis,R,Government website,"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Textbook,Other",Somewhat useful,Very useful,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,Not Useful,,,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,60,0,30,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Bayesian Techniques,Random Forests","Java,Mathematica,MATLAB/Octave,NoSQL,Python,R,SQL,Stan",,,,,,,,,,,,,,,Rarely,,,,,Rarely,Sometimes,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,Sometimes,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of 
the time,Most of the time,,Often,,,Rarely,,Rarely,,Often,,,,Sometimes,Most of the time,,Often,,,,,,,Most of the time,,,,50,25,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,,,,,,,,,,Most of the time,,,,Often,,,,10-25% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,500000,BRL,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Biology,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Fine arts or performing arts,1 to 2 years,Other,University courses,50,0,0,50,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,0,10,30,0,,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important +Female,United States,32,Employed 
full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Textbook",Somewhat useful,,,,Somewhat useful,,,,,,,,,,Very useful,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,40,0,60,0,0,0,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,"Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Python",,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",Often,,,,,Often,,Rarely,Rarely,,,,Rarely,Sometimes,,Sometimes,,Often,,,Sometimes,,Often,,,,,Often,,,,,,70,10,10,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",,,,,Most of the time,Often,,,Often,,,,,,,Often,Often,,,,,,100% of 
projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,I don't typically share data",,Git,Rarely,70000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,No,Yes,Statistician,,Employed by college or university,R,Factor Analysis,R,GitHub,"Blogs,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Necessary,,,,,,,,,,,,,PhD,No,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Markov Logic Networks",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE 
...)","Image data,Video data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Time Series Analysis,SAS,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler",Self-taught,60,10,10,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression","Angoss,IBM Cognos,IBM SPSS Statistics,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise 
Miner,SQL,Tableau",,,Rarely,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,Rarely,,,,Sometimes,,Rarely,,,,,Most of the time,Most of the time,,,Often,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,Rarely,Rarely,Most of the time,,,,Sometimes,,,,,,Rarely,,,Often,Sometimes,,,,65,8,2,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,,Sometimes,Often,,Often,Often,Often,Most of the time,,Sometimes,,Sometimes,,,Often,Often,Often,,,Less than 10% of projects,More internal than external,Central Insights Team,n/a,cleanliness ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,100000,CAD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,India,24,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Social Network Analysis,Java,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Friends network",,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,< 1 year,Nice to have,Necessary,,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Other,Yes,Master's degree,Computer Science,Less than a year,,University courses,40,20,0,40,0,0,Natural Language Processing,"Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Colombia,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Text Mining,R,"GitHub,University/Non-profit research group websites","Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,University courses,30,50,0,0,20,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational 
data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks,Random Forests,Recommender Systems",,,,,,,,,,,,,,,,Often,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,Often,,,Most of the time,Often,,,,,,,,,,,Sometimes,,51-75% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,42000000,COP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Spain,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,Not Useful,,,Somewhat useful,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Researcher",University courses,50,20,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Impala,NoSQL,Python,R,SQL",,Sometimes,,Sometimes,Sometimes,,,,Often,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics",Rarely,Rarely,Rarely,,Sometimes,Most of the time,Most of the time,Often,Sometimes,,,Often,,Often,Often,Sometimes,,Rarely,Sometimes,,Often,,Often,Often,,Often,Rarely,Sometimes,Often,,,,,50,15,10,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,Sometimes,,Sometimes,Most of the time,,,,,Often,,,,Most of the time,Most of the time,,Sometimes,,,Often,Most of the time,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,41,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Fine,Self-employed,Microsoft Azure Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,"Business Analyst,Data Miner,Predictive Modeler",Self-taught,50,30,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Financial,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Other",Sometimes,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Enterprise Miner,SQL,TensorFlow",,,,Rarely,,,,,,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,Often,Sometimes,,,,,,Sometimes,,Most of the time,,,Often,,,,,,Sometimes,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,RNNs,Simulation,Time Series Analysis",,,,Sometimes,,Sometimes,Often,Often,Often,,,Often,,,,Often,,,,Sometimes,,,Often,,Sometimes,,Often,,,Sometimes,,,,50,30,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,AUD,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,30,10,0,20,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,QlikView,RapidMiner (free version),Tableau,TensorFlow",,,,,Sometimes,,Sometimes,,Often,,,,,Sometimes,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,Rarely,,,Sometimes,,,,,,,,,,Sometimes,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,Often,Often,,Often,,,,,,,,,,,30,20,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,,,,,,,,Often,,,,,,,,Often,,,,Often,,26-50% of projects,More internal than external,Standalone Team,Bimbo; Kaeser; Kaggle industrial datasets,Organizing and standarizing it.,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,28400000,COP,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Egypt,49,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Factor Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,,,Nice to have,Unnecessary,Nice to have,Unnecessary,,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,40,30,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Not important +Male,Italy,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,R,Survival Analysis,R,Google Search,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Engineer,Programmer",Self-taught,30,20,30,10,0,10,"Survival Analysis,Time Series",Evolutionary Approaches,High school,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to 
influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Always,1GB,Decision Trees,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,Prescriptive Modeling",,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,30,30,5,10,25,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,,Business Department,,,,"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,Brazil,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by college or university,TensorFlow,Genetic & Evolutionary Algorithms,R,GitHub,"College/University,Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Somewhat useful,,,,,,,,Very useful,,,Very useful,Very useful,"Data Stories Podcast,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Engineer,Self-taught,40,10,0,40,10,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Relational data,Sometimes,1GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks,SVMs","Amazon Machine Learning,C/C++,Cloudera,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),Spark / MLlib,Tableau,TensorFlow",Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,Rarely,,Often,,,,,,,Often,,Most of the time,,Often,,,,,,Often,,,,Sometimes,Often,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,SVMs",,Often,Sometimes,,,,Often,Often,,,Sometimes,,,Often,,,,Often,,Often,,,Often,,,,,Often,,,,,,0,30,10,40,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Often,,,,Sometimes,Sometimes,,Often,Sometimes,,,Sometimes,,,,Sometimes,,,10-25% of projects,Do not know,Standalone Team,uci,size challanges,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,50000,RSD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,55,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Random Forests,R,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Miner,Other","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Retail,"10,000 or more employees",Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,Time Series 
Analysis",,,,,,,Sometimes,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,Often,,,Often,,,,60,5,5,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Most of the time,,,,Most of the time,,Often,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,Most of the time,Often,,Often,,51-75% of projects,Approximately half internal and half external,Other,mostly proprietary data,"size, no cloud services allowed so far","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Social Network Analysis,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Official documentation,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,Not Useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,40,10,20,20,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - GANs",Primary/elementary school,Internet-based,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Impala,Java,Mathematica,NoSQL,Python,R,Spark / MLlib,Unix shell / awk,Other,Other,Other",,Sometimes,,,,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,Sometimes,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,Most of the time,Sometimes,Sometimes,Sometimes,"Bayesian Techniques,Data Visualization,Decision Trees,GANs,Logistic Regression,Markov 
Logic Networks,Naive Bayes,Random Forests,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,,,Rarely,,,,,Rarely,Rarely,Rarely,,,,,Sometimes,,,,,,,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,Sometimes,,Often,Sometimes,,Sometimes,Most of the time,,,,Often,,,Most of the time,Often,Often,Often,,Sometimes,,51-75% of projects,Entirely external,IT Department,enterprise clients,Multiple data sources; scalability; properly analyse;,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform",,"Git,Other",Most of the time,"7,000,000",VEF,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,"Researcher,Statistician",Self-taught,70,0,0,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Java,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,,Very useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Analyst,Programmer",Self-taught,50,10,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression",Mathematica,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,Often,Often,,,Sometimes,,,,Often,,Often,Often,,,,25,25,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,Sometimes,,,Often,,,Sometimes,,,,Often,,Rarely,,,,,,,Sometimes,,76-99% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,,Sometimes,75000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Google Cloud Compute,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Official documentation,Stack Overflow Q&A,Textbook",,,,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,30,10,60,0,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Decision Trees - Random Forests",Primary/elementary school,Manufacturing,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Decision Trees,Random Forests,RNNs","Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,Sometimes,,,,Rarely,,,,,,Often,,,,,Rarely,,,Sometimes,,,,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Text Analytics,Time Series Analysis",,,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,30,10,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Maintaining 
responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Most of the time,,Sometimes,,,Often,,,Often,,,,,Most of the time,Most of the time,,Often,Most of the time,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,240000,MXN,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Official documentation",Very useful,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",I don't know/not sure,Financial,Fewer than 10 employees,Increased slightly,Less than one year,A tech-specific job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Python,SQL",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing",,,,,,Most of the time,Often,Sometimes,Sometimes,,,Sometimes,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,,30,30,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Rarely,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,48000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher",Self-taught,95,0,5,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased significantly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Somewhat useful,,,,,Very useful,,,,,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Kaggle competitions,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Computer Scientist,Researcher",University courses,35,35,0,25,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Increased slightly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,,10GB,Random Forests,"C/C++,Python,R,TensorFlow,Other",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Sometimes,,,Often,,,,Often,,,Most of the time,,Most of the time,,,Sometimes,,Most of the time,,,,60,10,0,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Often,Often,,26-50% of projects,Entirely internal,Other,,Noise ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,"75,000",USD,Other,8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"DataTau News Aggregator,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",5,NA,0,0,95,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,38,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,SAS JMP,Bayesian Methods,R,Google Search,"Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 
years,Statistician,University courses,80,0,0,15,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,1MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,Rarely,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Simulation",,,,,,Often,,Sometimes,Sometimes,,,,,,,Often,,,,,,,Sometimes,,,,Often,,,,,,,40,25,5,5,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,,"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,45000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,United States,59,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,Other,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,Very useful,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher,Other",University courses,60,10,0,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Evolutionary Approaches",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,Don't know,Some other way,Very important,Other,Traditional Workstation,Text data,Never,10MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Minitab,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,Often,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Simulation,Time Series Analysis",,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,40,30,0,30,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,100% of projects,More internal than external,Other,NA,Separating the usable data from the noise.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"100,000",USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,75,25,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important 
+Male,United States,57,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by non-profit or NGO,MATLAB/Octave,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Friends network,Kaggle,Online courses,YouTube Videos",Somewhat useful,,,,,Very useful,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Engineer,Programmer,Researcher",Self-taught,80,10,0,0,10,0,"Survival Analysis,Time Series",Decision Trees - Gradient Boosted Machines,A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Friends network,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,Less than a year,I haven't started working yet,Self-taught,20,30,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision 
Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,Podcasts",,,Not Useful,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,"Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,40,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Neural Nets,Python,"Google 
Search,Government website","Kaggle,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Not Useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,Often,Rarely,Rarely,,,,,,,,,Most of the time,,,Often,Sometimes,,,,,,"A/B Testing,Data Visualization,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems",Often,,,,,,Most of the time,,,,,,,,,,,Sometimes,Sometimes,Rarely,,,,Sometimes,,,,,,,,,,40,20,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,Most of the time,Sometimes,,,,,Sometimes,Most of the 
time,,Sometimes,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,IT Department,worldbank,data cleanup,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Other",Never,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,40,20,10,20,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,,"CNNs,Evolutionary Approaches,GANs,Neural Networks,RNNs","C/C++,Java,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,,,Often,,,,"CNNs,Evolutionary Approaches,GANs,RNNs",,,,Sometimes,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,10,10,10,50,20,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,Udacity,Other","Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Not important,,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,India,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Self-employed,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,Google Search,"Arxiv,Blogs,Kaggle,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,,,Very useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Programmer,Self-taught,70,20,0,0,10,0,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - 
CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10GB,"CNNs,GANs,Neural Networks,RNNs,Other","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,GANs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",,,,Most of the time,,,Often,,,,Often,,,,,Sometimes,,,,Most of the time,Sometimes,,,Sometimes,Often,,,,Rarely,Rarely,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Often,,,,Often,Often,,Most of the time,,,,Sometimes,,,,Sometimes,,,10-25% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Other",Sometimes,650000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Canada,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,20,20,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Orange,Python,SQL,TensorFlow,Unix shell / awk,Other,Other",,,,,,,,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,,,Often,,Sometimes,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,Sometimes,Often,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data 
Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,,Often,,,Most of the time,Most of the time,Often,Often,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,,Often,,,Sometimes,,Often,,Often,,,,40,10,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,,,,Most of the time,,,,Often,Sometimes,Often,,,,Most of the time,Most of the time,Most of the time,,,,,,76-99% of projects,Entirely external,Standalone Team,,Obtaining and cleaning source data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,100000,CAD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Colombia,48,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Poorly,Self-employed,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Master's degree,,Less than a year,"Programmer,Software Developer/Software Engineer,Other",Self-taught,80,0,0,0,20,0,Computer Vision,Logistic Regression,A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,6 to 10 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data 
Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Time Series Analysis",Often,,Sometimes,,,Most of the time,Often,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Random Forests,,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Researcher",University courses,35,5,40,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression,Markov Logic Networks",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,Rarely,,,Rarely,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,Segmentation,Simulation,Text Analytics,Time Series Analysis",Rarely,,,,,Rarely,Most of the time,,,,,,,,,Often,Sometimes,Rarely,,,,Often,,,,Often,Often,,Sometimes,Sometimes,,,,10,10,0,10,10,60,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Sometimes,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,,Sometimes,175000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,Python,Government website,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Very useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Other,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests","C/C++,Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Perl,Python,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,Often,Sometimes,,Often,,Rarely,,,Rarely,Often,,,,,,,,,,,Often,,,,,,Often,,,,"Bayesian Techniques,Decision Trees,Naive Bayes",,,Often,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,,,Sometimes,,,,Often,Sometimes,Most of the time,,Rarely,,,,51-75% of projects,Do not 
know,Standalone Team,UK Gov NHS datasets,Getting anonymised data - reluctance to make public due to privacy concerns,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,100000,GBP,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,R,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Podcasts,Stack Overflow Q&A",,,,,,,,,,,,,Somewhat useful,Very useful,,,,,"Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst",Work,45,15,40,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,Often,,Rarely,,,,Often,Often,Sometimes,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,35,5,40,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Sometimes,Often,,,Most of the time,,,,Sometimes,,,,,Often,Often,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,"Some ecommerce data, ",inconsistant and dirty. Cleaning and then confirming the data takes forever ,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Other,Rarely,83000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Other,22,Employed part-time,,,Yes,,Data Analyst,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,SQL,Government website,"College/University,Kaggle,Official documentation,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,Very useful,Somewhat useful,,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Other,Self-taught,15,25,50,10,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Mix of fields,"5,000 to 9,999 employees",Decreased slightly,1-2 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Always,10GB,Decision Trees,"QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,Often,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,Often,,,,,,,,,,Often,,,,,50,25,5,5,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science 
talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Most of the time,,,Most of the time,Sometimes,,Most of the time,Often,,,,,,,,,,Often,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Other,Most of the time,105000,CAD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Other,90,0,0,0,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",High school,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,Random Forests,"Python,QlikView,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,"GitHub,University/Non-profit research group websites,Other","Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,< 1 
year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Other,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by government",Mathematica,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,,,,,,,Very useful,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,0,50,15,29,0,6,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Academic,,,,,Important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Other,Always,1GB,"Bayesian Techniques,Neural Networks","Java,Mathematica,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,Sometimes,,,,,,,Often,,Often,,Rarely,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Prescriptive Modeling,Segmentation",,Often,Often,,,Sometimes,,,,,,Sometimes,,,,Often,,Often,,,,Most of the time,,,,Most of the time,,,,,,,,30,40,10,10,10,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,Sometimes,,,,,,,,,,,,,,,Often,,,Often,,76-99% of projects,More external than internal,Business Department,Dataset of public depertment; markeing open dataset,Test the model predictive,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,"Bitbucket,Git",Sometimes,15000,PEN,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Other,Time Series Analysis,Python,Other,"Kaggle,YouTube Videos,Other",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series",Other (please specify; separate by semi-colon),A master's degree,Internet-based,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,NoSQL,Perl,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,Rarely,Sometimes,,,Rarely,Sometimes,,Often,,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"Data Visualization,Random Forests,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,Most of the time,,,,10,15,15,30,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,76-99% of projects,Entirely internal,Business Department,none,Lack of understanding on how to build useful models,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,225000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,49,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Other,Link Analysis,R,"Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos,Other",,,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,,Not Useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Other","Java,KNIME (free version),Microsoft Excel Data Mining,Python,QlikView,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic 
Regression,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,Sometimes,Often,,,Often,,Often,,,,,,,,Most of the time,Most of the time,,,,30,10,10,10,10,30,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,Most of the time,,,,,,Sometimes,,,Often,,Often,,Often,Most of the time,,Often,,,76-99% of projects,Approximately half internal and half external,Business Department,"Weather data, census data, GIS.",Joining data from different sources about the same thing,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other",University courses,10,20,20,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,Most of the time,,,,,,,,Often,,,,,Rarely,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Often,,Often,,,,Most of the time,Most of the time,Often,,,Sometimes,,Sometimes,Often,Often,,Sometimes,,Rarely,,,Often,Most of the time,,,,Sometimes,,Most of the time,,,,35,15,5,5,5,35,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis 
and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Sometimes,Often,,,Often,Often,Most of the time,,,Most of the time,Most of the time,Often,Often,,10-25% of projects,More internal than external,Business Department,none,incomplete and unclear,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Google Sheets,Bitbucket,Rarely,100000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Ukraine,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,,,,,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,,,High school,Pharmaceutical,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,,,"Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,DBA/Database Engineer,Programmer",Self-taught,45,30,20,5,0,0,Time Series,Logistic Regression,Primary/elementary school,Financial,"10,000 or more employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,,,,,,Most of the time,Rarely,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Often,Often,,Most of the time,Most of the 
time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,NA,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed part-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,,Self-taught,75,10,0,15,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,United Kingdom,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,I haven't started working yet,Self-taught,50,50,0,0,0,0,Survival Analysis,,A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Other,Fine,Employed by government,R,Random Forests,Python,"GitHub,University/Non-profit research group websites","College/University,Online courses,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,,University courses,10,15,25,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Government,"10,000 or more employees",Decreased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,1MB,"Bayesian Techniques,Regression/Logistic Regression,Other","Julia,MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Simulation,Time Series Analysis,Other",,,Sometimes,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,Often,,,5,10,10,5,10,60,Enough to refine and innovate on the algorithm,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,10-25% of projects,More external than internal,Other,,,Column-oriented relational (e.g. KDB/MariaDB),"Email,Share Drive/SharePoint",,Git,Rarely,135000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,70,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Rule Induction,Julia,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Podcasts,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,60,0,20,20,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",High school,Hospitality/Entertainment/Sports,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Most of the time,10MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks","C/C++,NoSQL,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing",Often,,Often,Often,,Often,Often,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,40,30,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools",,,,,Often,,,,,,,Sometimes,Rarely,,,,,,,,,,100% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Other,Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Female,Other,20,Employed part-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,I haven't started working yet",Other,0,50,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,25,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Other,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Bayesian Methods,R,GitHub,"College/University,Online courses,Personal 
Projects,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,90,0,5,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,,,,,,, +Male,Other,24,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,R,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Not Useful,,,,,,,Not Useful,,3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,40,0,0,50,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,Spain,44,"Not employed, but looking for work",,,,,,,,,Cluster Analysis,R,,"Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,30,15,20,5,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,United States,70,Retired,,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,60,0,10,0,0,30,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Self-employed,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Other",,,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,,Less than a year,Statistician,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",United States,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,I don't plan on learning a new ML/DS method,R,Google Search,"Blogs,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,Very useful,,,,,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,More than 10 years,"Data Scientist,Researcher",Self-taught,30,0,65,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A 
doctoral degree,Pharmaceutical,"10,000 or more employees",Decreased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Rarely,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,Rarely,Often,,,Often,,Often,,Often,,,,Often,,Often,Often,,,,40,20,2,18,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",,,,,Often,,,Often,Sometimes,,,,,,Often,,Most of the time,,,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Other",google Drive,Git,Rarely,235000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,Researcher,Self-taught,80,0,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,Most of the time,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,Most of the time,,,,,Often,,Sometimes,,,,,,,,,,,60,20,0,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,Most of the time,,,,,Sometimes,,,,,,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,SAS Base,,R,Google Search,"Blogs,Textbook,Tutoring/mentoring,YouTube Videos",,Not Useful,,,,,,,,,,,,,Somewhat useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Electrical Engineering,I don't write code to analyze data,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,India,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Technology,,,,,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,HMMs,Markov Logic Networks,Regression/Logistic Regression","Java,Julia,NoSQL,Orange,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Association Rules,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,Most of the time,,,,,,,,,,,Often,Most of the time,,,Sometimes,Often,Most of the time,,Sometimes,,,Often,,,,,Most of the time,,,,,10,60,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,Often,Most of the time,Often,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Flume,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Professional degree,,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,Russia,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, 
etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,No,Master's degree,Computer Science,,"Computer Scientist,Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Kenya,27,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,GitHub,"Blogs,College/University,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Researcher,Statistician","Online courses 
(coursera, udemy, edx, etc.)",50,10,10,30,0,0,Time Series,"Bayesian Techniques,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,France,28,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Fine arts or performing arts,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Somewhat useful,,,,,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Researcher",Self-taught,90,0,0,5,5,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","DataRobot,Mathematica,MATLAB/Octave,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),SQL,TensorFlow,TIBCO Spotfire",,,,,,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Sometimes,Sometimes,Most of the time,Sometimes,Most of the time,,,,,,,Most of the time,,,,Rarely,Sometimes,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis",,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,20,20,20,20,20,0,Enough to tune the parameters properly,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,Other tabular 
(e.g. Cassandra/BigTable/BigQuery/Redshift),Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not 
important,Not important,Very Important,Somewhat important +Male,Russia,24,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Other,Self-taught,50,20,10,15,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Logistic Regression",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed part-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,University courses,0,0,20,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Factor Analysis,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Other,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,University courses,0,20,0,80,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,Belarus,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Software Developer/Software Engineer",University courses,0,20,10,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10MB,"Neural Networks,SVMs","Google Cloud Compute,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,SVMs,Time Series Analysis",,,,,,Often,Often,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,20,10,70,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,Bloomberg,Old and inconvenient API,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Never,1080000,RUB,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",10-15 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,40,30,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Very Important,Not important,Not important +Male,Argentina,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,30,40,10,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL",,Most of the time,,,,,,,,,,,,,Often,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Often,Often,Sometimes,,,,,,Often,,Often,,,Most of the time,,Sometimes,,Often,,,,,,Most of the time,,,,,0,40,20,20,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,Often,,,,,,Often,,,26-50% of projects,More external than internal,Central Insights Team,Twitter data,"Twitter data is messy, lot's of SPAM and difficult to decide how to handle retweets","Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,60000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Newsletters,Online courses,Textbook,YouTube Videos",,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,30,40,30,0,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),A doctoral degree,Mix of fields,500 to 999 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,,100MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,TensorFlow,Other,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,,,,,,Rarely,,,Rarely,Often,,"Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,40,20,20,0,20,0,Enough to run the code / standard library,"Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,,,,,,,,Often,,Sometimes,,,,Less than 10% of projects,Entirely internal,Standalone Team,-,Too dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,"864,000",RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,,,A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,"Google Search,University/Non-profit research group websites",Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,,,,"Coursera,DataCamp,Udacity",Other,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,38,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,SQL,GitHub,"College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,Very useful,,,Very useful,"FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",33,65,2,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,"Some college/university study, no bachelor's degree",Insurance,"5,000 to 9,999 employees",Increased significantly,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,100TB,Bayesian Techniques,"Amazon Web services,Hadoop/Hive/Pig,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,Most of the time,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,,10000,AED,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,DBA/Database Engineer,Researcher,Statistician",University courses,10,0,20,70,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",A professional degree,CRM/Marketing,"10,000 or more employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,NoSQL,Python,R,SQL",,,,Often,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,Often,,Often,,,Rarely,,,,,,,,Most of the time,Most of the time,,,,40,20,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,,Sometimes,,,,,,Often,,,,,Often,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,Census data ; social media,IT information security policies ; Many layers between DS team and data providers,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,90000,BRL,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Regression,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Time Series Analysis",Rarely,,,,,,Most of the time,Rarely,,,,Rarely,,,,Rarely,,,,,,,,,,,,,,Often,,,,9,5,5,41,40,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Most of the time,,,,,,,,,Often,Sometimes,,,,Often,,Often,,100% of projects,More external than internal,Standalone Team,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,70000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,,Work,0,0,100,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,,,CNNs,"C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,Often,Most of the time,,Often,Often,Often,,,,,,Often,,,,,,,Often,,Often,,,,Often,Often,,Most of the time,,,,30,20,40,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by non-profit or NGO,Other,Neural Nets,,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Work,50,40,5,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Non-profit,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,100MB,"Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Python,TensorFlow,Unix shell / awk,Other,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,Often,Sometimes,,"Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,,,,Rarely,,,,,,,,,Sometimes,,,Most of the time,Often,,,,,Often,,,,Often,,,,,0,0,20,0,0,80,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,Often,Sometimes,,,,,,,,Sometimes,,26-50% of projects,Do not know,Other,wikipedia; various nlp datasets,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,s3,Git,Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Colombia,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",University courses,0,30,50,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,100GB,"GANs,Neural Networks,RNNs","Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Often,,,,,,"Association Rules,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,Rarely,,,,,Often,Sometimes,,,Often,,,Often,,,,,,Often,Often,,,,Often,Sometimes,Often,Sometimes,,Most of the time,,,,20,40,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,Often,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,100% of projects,Entirely internal,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,"Git,Mercurial",Always,"84,000,000",COP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,45,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by government,TensorFlow,Deep learning,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses",,,Very useful,,Very useful,,Very useful,,Somewhat useful,,Very useful,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,Computer Scientist,University courses,50,0,0,50,0,0,"Machine Translation,Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning",Other (please specify; separate by semi-colon),Primary/elementary school,Government,I prefer not to answer,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Not at all important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100PB,Regression/Logistic Regression,"Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,R,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Most of the time,,Rarely,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Time Series Analysis",,,,,,,Most 
of the time,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,25,25,25,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,100% of projects,Approximately half internal and half external,IT Department,Other law enforcement agencies ,Integration ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",,100000,,Other,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,70,0,20,10,0,Time Series,Logistic Regression,High school,Other,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SAS Base,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,,,,Most of the time,,Rarely,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,50,10,0,20,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT",Often,,,,,,,Often,,,,Often,,,Often,,,,,,,,100% of projects,Entirely internal,Other,Weather station data,Understanding what each column means,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,72000,SDG,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,"Ensemble Methods (e.g. 
boosting, bagging)",Python,University/Non-profit research group websites,"College/University,Conferences",,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Researcher,Self-taught,10,10,0,80,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Relational data",Never,1TB,"Bayesian Techniques,CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,Often,Sometimes,,Most of the time,,,,,,,Sometimes,,,Sometimes,,Sometimes,Rarely,Sometimes,Often,,,,Sometimes,,,Most of the time,,Often,,,,20,60,0,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Java,I collect my own data (e.g. 
web-scraping),"College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,Self-taught,20,40,0,10,30,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Pakistan,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Necessary,,,,Necessary,,,,,Necessary,,,,Coursera,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,Norway,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Neural Nets,Python,GitHub,"Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,Very useful,,Very useful,,,Very useful,,Very useful,,Very useful,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Neural Networks - GANs",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Australia,30,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,Java,Time Series Analysis,Python,"Google Search,Government website","College/University,Official documentation,Online courses,Stack Overflow Q&A",,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,20,20,40,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,10 to 19 employees,Increased slightly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,30,0,0,20,10,40,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Often,,,Sometimes,,Rarely,,,Sometimes,Sometimes,Rarely,Sometimes,,Most of the time,,Often,,Often,Often,Most of the time,,Most of the time,100% of projects,Entirely external,Business Department,Demographics; weather; commodity trading,Obtaining the right data from external stakeholders,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Google Drive,Bitbucket,Rarely,70000,AUD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Natural Language Processing,Hidden Markov Models HMMs,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important +Female,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Mathematica,Social Network Analysis,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,0,0,80,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Conferences,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,"Linear Digressions Podcast,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,70,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Podcasts,Stack Overflow Q&A",,,,,,Very useful,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Other,University courses,20,5,25,40,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,1MB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,50,0,0,0,50,0,Enough to refine and innovate on the algorithm,"Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Most of the time,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,Less than 10% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Russia,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,30,20,10,10,10,Time Series,"Gradient Boosting,Logistic Regression",A master's degree,Manufacturing,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,Other,"Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Rarely,,,Sometimes,"A/B Testing,Cross-Validation,Data Visualization,Random Forests,Text Analytics",Often,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,30,10,10,30,15,5,Enough to run the code / standard library,"The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,,,,,,,,,,Often,,Often,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,Researcher,Self-taught,50,30,19,1,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,,"Java,Perl,Python,R",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Sometimes,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Analyst,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,College/University,,,Very useful,,,,,,,,,,,,,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Data Analyst,Programmer",University courses,15,30,10,45,0,0,"Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Other,20 to 99 
employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Rarely,10GB,Decision Trees,"Amazon Web services,Python,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Often,Often,,,Most of the time,,,Sometimes,,,,,,,"CNNs,Decision Trees,Gradient Boosted Machines,Natural Language Processing,Segmentation,Text Analytics",,,,Often,,,,Often,,,,Often,,,,,,,Often,,,,,,,Often,,,Often,,,,,40,15,10,20,15,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,Often,,,,,,Often,,,,Often,,,26-50% of projects,More internal than external,Business Department,,,Graph (e.g. 
GraphBase/Neo4j),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,33,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,University courses,0,0,0,100,0,0,Unsupervised Learning,"Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,<1MB,Regression/Logistic Regression,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Simulation",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,50,25,25,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,6 to 10 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Statistician",University courses,35,35,10,15,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Rarely,100MB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,Rarely,Sometimes,,Most of the time,Sometimes,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,,,Often,,,,Often,,,,,5,85,5,5,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Less than 10% of projects,Entirely external,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Data 
Analyst,DBA/Database Engineer",Work,30,40,30,0,0,0,,,High school,Non-profit,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,,,,Very useful,,,3-5 years,,,,,Nice to have,Nice to have,,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Not important,Very Important,,,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,"Data Scientist,Other",Self-taught,30,20,42,5,3,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Never,,,"Minitab,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,,,,Most of the time,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Time Series 
Analysis",,,,,,,Often,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,,70,20,0,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,Sometimes,,,Sometimes,,,,,,,,,Rarely,,,Often,Rarely,,76-99% of projects,Entirely internal,Other,,,,,,,Rarely,73000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Israel,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,,Somewhat useful,"Data Stories Podcast,No Free Hunch Blog,Partially Derivative Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,77,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Work,0,0,100,0,0,0,Unsupervised Learning,Bayesian Techniques,A bachelor's degree,Other,10 to 19 employees,Stayed the same,Less than one year,A tech-specific job board,Important,Other,Basic laptop (Macbook),Relational data,Most of the time,100MB,Bayesian Techniques,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Bayesian Techniques,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,100,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,None,More external than internal,IT Department,Facebook Analytics; Fabric,Dirty Data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,66000000,IDR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer",Kaggle competitions,30,50,0,0,20,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Financial,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Cognos,Java,Jupyter notebooks,Python,QlikView,SAS Base,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,Sometimes,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,Often,,,,Often,,,,Most of the time,,Most of the time,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,Random Forests,Recommender 
Systems,RNNs",,Often,,Often,Often,Often,Often,Often,Often,,Often,Often,,,,Often,Often,,Often,Often,,,Often,Often,Often,,,,,,,,,10,20,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,Na,Compliance ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","I don't typically share data,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Italy,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Association Rules,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses",,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,10,10,50,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,Retail,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Video data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly 
Revolution Analytics),Python,R,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,Rarely,,Rarely,,,,,,,Rarely,,Often,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Rarely,Sometimes,Sometimes,,,,Often,,,,,,,,,Often,,,Sometimes,,,,,,,,,,Sometimes,Often,,,,30,30,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,,Often,,,,,,,,Sometimes,,,,,Often,,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,,Never,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Turkey,28,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,10,15,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Retail,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data",Most of the time,1TB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs,Other","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,Sometimes,,Rarely,,Often,,,,,Sometimes,Often,,Most of the 
time,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Most of the time,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,,,,Most of the time,Sometimes,Often,,,,,,,Often,,Sometimes,,,,Sometimes,Often,,,Most of the time,,Sometimes,Sometimes,Sometimes,Most of the time,Most of the time,,,,25,40,10,10,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Often,,,,Often,,,Sometimes,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,Never,"100,000",TRY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Turkey,35,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,Hadoop/Hive/Pig,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Personal Projects,Textbook",,,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Other,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,80,10,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Female,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Regression,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Very Important,Very Important,Very Important +Male,Spain,44,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Podcasts,Other",,Very useful,,,Very useful,,Very useful,,,,Very useful,,Very useful,,,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,30,20,25,5,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Telecommunications,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,KNIME (commercial version),KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",Sometimes,Often,,Sometimes,Rarely,,Rarely,Sometimes,Often,,Sometimes,Sometimes,Often,Rarely,Most of the time,,,Most of the time,Most of the time,Rarely,Most of the time,,Sometimes,,,,Most of the time,Most of the time,Sometimes,Sometimes,Most of the time,Sometimes,Most of the time,Most of the time,Most of the time,,,,Sometimes,,Most of the time,Often,,,Often,Sometimes,Rarely,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and 
Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Most of the time,Sometimes,Sometimes,,Rarely,Most of the time,Often,Often,Sometimes,Most of the time,Often,,Sometimes,Sometimes,Often,Most of the time,Most of the time,,,,35,25,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,Often,,Most of the time,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint,Other",Removable devices,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,45000,EUR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Analyst,Predictive Modeler",University courses,30,30,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Mix of fields,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,Rarely,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Segmentation",,,,,,Often,Often,Most of the time,Most of the time,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,70,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,,,,,,,,,,,Often,Often,,,76-99% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Most of the time,115000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Female,Romania,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Poorly,Self-employed,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer",Self-taught,50,35,0,0,15,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Technology,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,1GB,"CNNs,Neural Networks","Java,Python,TensorFlow",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Logistic Regression,Neural Networks,Simulation,Time Series Analysis",Often,,,Often,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,Rarely,,,Sometimes,,,,40,40,0,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,,,,Git,Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's 
degree,Computer Science,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,,Primary/elementary school,Financial,20 to 99 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Other,Other,Text data,Never,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,46,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Kaggle,Personal Projects,Podcasts",,Very useful,,Somewhat useful,,,Very useful,,,,,Very useful,Very useful,,,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Data Miner,Predictive Modeler,Programmer",Self-taught,60,0,10,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Financial,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Always,1MB,"Decision Trees,Regression/Logistic Regression","SAS 
Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,Often,,Rarely,,,,,,,,Often,,,,,,Often,,,,Most of the time,,,,Often,,,,50,10,30,5,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Most of the time,,,,,,,,,,Most of the time,,Often,,,Sometimes,,,Less than 10% of projects,More internal than external,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,240000,SGD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,,No,Bachelor's degree,A social science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,,,,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,,,Necessary,,Necessary,Necessary,,,,,,,edX,,,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,,Neural Networks - GANs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,United States,76,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Matlab,Google Search,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Operations Research Practitioner,Researcher",Self-taught,20,40,30,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on 
learning",,,,,,TensorFlow,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,Very useful,Very useful,,< 1 year,,,,,Necessary,Necessary,,,,,,,,"Coursera,DataCamp","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Other,"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,DataCamp,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,Python,"Government website,University/Non-profit research group websites","Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat 
useful,,,,,,,,,,Somewhat useful,,Not Useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Other,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Biology,6 to 10 years,Researcher,Self-taught,30,10,50,5,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Not Useful,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Not Useful,,,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Scientist,Other",University courses,40,20,15,15,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,1GB,Gradient Boosted Machines,"Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,,,Gradient Boosted Machines,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,0,95,5,0,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Other",,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,51-75% of projects,Entirely internal,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Never,"100,000",,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Miner,Data Scientist,Engineer,Software Developer/Software Engineer",University courses,20,10,30,30,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",High school,Internet-based,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Cloudera,Impala,Java,MATLAB/Octave,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,Sometimes,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,Rarely,,,,"Cross-Validation,Logistic Regression,Random Forests",,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,10,50,0,20,20,0,Enough to explain the algorithm to someone non-technical,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,10-25% of projects,More internal than external,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",University courses,20,40,30,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,TensorFlow",Sometimes,Often,,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,Often,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,Often,Often,Often,Sometimes,Sometimes,,,,Often,,Often,,,,Sometimes,,,Often,,,,,Sometimes,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,Often,,Most of the time,Most of the time,,,,,Often,,Often,,,,Often,,,100% of projects,Entirely internal,Standalone Team,OSM,Cleaning and interpretation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,960000,ZAR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Friends network,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Work,30,10,40,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,Rarely,,,,,Often,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"Association Rules,Cross-Validation,Natural Language Processing,PCA and Dimensionality Reduction,Time Series Analysis",,Sometimes,,,,Sometimes,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,20,10,50,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Most of the time,,,,,,Often,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,experian; weather data; scraped data,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,75000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,,Nice to have,,Necessary,Nice to have,,,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Turkey,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,0,35,10,45,10,0,Computer Vision,Neural Networks - CNNs,A doctoral degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Sometimes,1GB,"CNNs,Neural Networks","C/C++,IBM Watson / Waton Analytics,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,kNN and Other Clustering,Neural Networks",,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,30,20,40,0,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,10-25% of projects,Entirely internal,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,"Bitbucket,Git",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,Business Analyst,Self-taught,20,10,50,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,Sometimes,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,28,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,3 to 5 years,Other,Self-taught,70,10,10,5,5,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Hidden Markov Models HMMs",A bachelor's degree,Academic,10 to 19 employees,Increased 
significantly,3-5 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Never,10GB,Other,"Perl,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,70,0,0,0,25,5,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Rarely,,,,,,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Never,85000,CAD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Text Mining,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,40+,Github Portfolio,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important +Male,United States,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Microsoft SQL Server Data Mining,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,Somewhat useful,,,"Data Stories Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",75,10,10,0,5,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,1GB,"Regression/Logistic Regression,RNNs","C/C++,MATLAB/Octave,Python,R,SAS Base,Spark / MLlib,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,Sometimes,Most of the time,,,,,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,,,15,45,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,Sometimes,,,,,,,,,,,,Often,,Most of the time,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,,110000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Work,40,30,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,45,35,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Often,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,,Business 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,,4,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,50,0,20,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Other,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,,>1EB,Regression/Logistic Regression,"C/C++,Microsoft Excel Data Mining,Python,R,SQL,TIBCO Spotfire",,,,Most of the time,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,Sometimes,,,,,"Data Visualization,kNN and Other Clustering,Time Series Analysis",,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,50,15,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Most of the time,Rarely,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,120000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Non-Kaggle online communities",,,Very useful,Very useful,,,,,Very useful,,,,,,,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Software Developer/Software Engineer",Work,80,0,0,20,0,0,"Reinforcement learning,Time Series","Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,SVMs","C/C++,Java,Python,R,Other",,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,Often,,,"Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,Often,,,,40,25,10,15,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",Sometimes,,,,,,,,Sometimes,,,Sometimes,Often,,Often,,,,,,,,26-50% of projects,More internal than external,Business Department,"market data like CME, EBS",Take time to download newly ,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Subversion,Never,6000000,JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"Arxiv,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,25,0,25,25,25,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A professional degree,Academic,I prefer not to answer,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Other",Text data,Sometimes,100GB,"Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression",C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation",,,,,,Sometimes,,Sometimes,,Sometimes,,Often,,Sometimes,,Often,,,,,Often,,,,,,Most of the time,,,,,,,10,20,30,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,Sometimes,,,,,,,,Sometimes,Often,,,100% of 
projects,Entirely internal,Other,n/a,"It's a search experiment, so we really can't allow a type 1 error","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,35000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Malaysia,45,Employed full-time,,,Yes,,Engineer,Poorly,Employed by government,TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping)","Blogs,Friends network,YouTube Videos",,Very useful,,,,Somewhat useful,,,,,,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I don't know/not sure,Government,500 to 999 employees,Decreased significantly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10MB,Random Forests,"Java,R",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,0,10,20,0,40,30,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,,,,,Often,,,10-25% of projects,More external than internal,IT Department,Open data,Insight of the data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,Rarely,160000,MYR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",High school,Other,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,,Sometimes,,,,,,,,,,Most of the time,Most of the time,Most of the time,,Most of the time,,,,Most of 
the time,,Most of the time,Often,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,Often,,,Sometimes,,,,,Sometimes,,,,,,,,,76-99% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Sometimes,"100,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Other,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not 
important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important +Male,Germany,28,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,"Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,20,10,25,40,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Other,Sometimes,100GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,Sometimes,Sometimes,,Most of the time,,,,,,Sometimes,Sometimes,Sometimes,Often,,,,40,20,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data 
science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,Sometimes,,,,,,Sometimes,Often,,76-99% of projects,More internal than external,Other,Public domain brain data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Most of the time,,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Colombia,35,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos,Other",,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Master's degree,No,Master's degree,Computer Science,Less than a year,"Engineer,Researcher",University courses,0,50,20,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Indonesia,35,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of 
(Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,Jupyter notebooks,Cluster Analysis,Python,Google Search,"College/University,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,Very useful,Very useful,,,,,,,Very useful,,,Very useful,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,10,0,10,0,Reinforcement learning,Other (please specify; separate by semi-colon),A doctoral degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1GB,Gradient Boosted Machines,"C/C++,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,80,10,0,5,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,Most of the time,,100% of projects,More internal than external,IT Department,,The lack of documentation,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Git,Never,45000,CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Friends network,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,Very useful,,Very useful,,,,,,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,,"Business Analyst,Data Analyst",Work,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Software Developer/Software 
Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,25,0,75,0,0,0,Time Series,,A master's degree,Retail,500 to 999 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,Decision Trees,"Amazon Web services,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Segmentation,Time Series Analysis",Sometimes,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,30,10,30,25,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,Often,,,Often,,,Often,Often,,,,Often,,,Often,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,115000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",Work,20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Often,,,,Rarely,,Rarely,,,,Often,,,,Most of the time,,Sometimes,,,,,,Often,,Most of the time,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Markov Logic 
Networks,Naive Bayes,SVMs,Text Analytics,Time Series Analysis",Often,,Often,,,Most of the time,Most of the time,Often,Sometimes,,,,,,,Most of the time,Sometimes,Often,,,,,,,,,,Often,Often,Most of the time,,,,20,10,5,50,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,Often,,Sometimes,Often,Most of the time,,,,Often,,Sometimes,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,80000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Unsupervised Learning",Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,Tableau,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,30,0,50,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Government,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Image data,Text data,Relational data",Most of the time,,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Data Visualization,Decision Trees,Time Series Analysis",,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,35,30,15,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Privacy issues,Unavailability of/difficult access to data",,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,160000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Mexico,67,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,R,Neural Nets,R,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A humanities discipline,More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",I don't know/not sure,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Female,Russia,32,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,28,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,Other,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Operations Research Practitioner,Other",University courses,30,10,10,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Financial,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Other",Relational data,Always,100MB,Regression/Logistic Regression,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,kNN 
and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,Rarely,,Rarely,,,,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,10,50,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,,,,,,,,,,,Often,Sometimes,,,,,,100% of projects,More internal than external,Business Department,Reuters,These are complex time series indexed on different time intervals.,Other,"Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,78000000,COP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Scientist,Other",University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks","Perl,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Most of the time,Sometimes,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation,Text Analytics,Time Series 
Analysis",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,50,5,0,25,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,Often,Often,Often,,,Most of the time,,,Most of the time,,,100% of projects,Approximately half internal and half external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Rarely,100000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,65,20,0,15,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Academic,"5,000 to 9,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Financial,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,,Relational data,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,SAS Enterprise Miner,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,Online courses,Textbook",Very useful,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,University courses,10,25,10,55,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,10MB,SVMs,"MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,50,25,0,25,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,Most of the time,,,,,,,Often,Most of the time,Often,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,500,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by professional services/consulting firm,Unix shell / awk,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Data Analyst,Researcher",University courses,20,30,10,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Insurance,"10,000 or more employees",Decreased significantly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,50,20,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,,,,,Often,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",Rarely,100000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,,,Primary/elementary school,Hospitality/Entertainment/Sports,"5,000 to 9,999 employees",,,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Programmer,Software Developer/Software Engineer",Other,35,20,30,15,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Often,,Often,Sometimes,Most of the time,,"A/B Testing,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",Sometimes,,,Often,,Often,,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,Often,Sometimes,,Sometimes,,Sometimes,,,,,,,,,10,20,20,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,,,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,76-99% of projects,Entirely external,Other,,Countless columns labelled with acronyms that nobody can decode.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Never,83000,CAD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Canada,30,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher,Statistician",Self-taught,80,20,0,0,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data,Other",Sometimes,1GB,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,kNN and Other Clustering,PCA and Dimensionality Reduction,Text Analytics",,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,Often,,,,,0,80,0,10,10,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Other",,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Fine arts or performing arts,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer,Statistician",Self-taught,40,20,20,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,I don't know,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service",Text data,Rarely,100GB,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,IBM Cognos,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,Rarely,,,,,Often,Rarely,,,,,Often,,Sometimes,,,,,,Often,,,,,,,,Often,,Sometimes,,,,,,,,Often,Often,,,,,,Often,,,,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,Sometimes,,,,,,,Sometimes,,Sometimes,,Often,,,Often,Often,Often,Sometimes,Often,,,,,,Often,,,,,,,10,30,10,30,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent 
in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,Sometimes,,Often,,Often,,Sometimes,Often,,Most of the time,,Often,,76-99% of projects,Approximately half internal and half external,Other,Web crawls; public records,Time and resources,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",Rarely,160000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,C/C++,Text Mining,Python,I collect my own data (e.g. 
web-scraping),Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,10,0,80,10,0,0,Recommendation Engines,Logistic Regression,High school,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,100MB,Regression/Logistic Regression,"C/C++,Java,Python,SQL",,,,Often,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,Segmentation",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,0,40,40,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Most of the time,"24,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,,University courses,10,10,10,70,0,0,Time Series,Logistic Regression,A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Very useful,,,,,,,,,Very useful,,Somewhat useful,Very useful,,Very useful,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,25,10,20,35,10,0,,Decision Trees - Random Forests,A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Random Forests,Regression/Logistic Regression,Other","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,R,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Rarely,,Sometimes,,,,Often,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,Often,Rarely,,,Rarely,,Most of the time,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,Random Forests,Time Series Analysis",,,,,,,Most of the time,,Often,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,30,10,5,35,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of 
tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,Often,,Often,,Sometimes,,,,,,,Often,,,51-75% of projects,More internal than external,IT Department,Kaggle IMDb,Engineers designed the tables left our company (no documentation),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,50000,USD,Other,4,,,,,,,,,,,,,,,,,, +Male,Ireland,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,R,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Miner,Predictive Modeler",Work,30,20,50,0,0,0,,,A master's degree,Telecommunications,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10TB,,"Hadoop/Hive/Pig,Java,SQL,Tableau",,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,Prescriptive Modeling,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,50,10,10,20,10,0,Enough to refine and innovate on the algorithm,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,IT Department,,Need more experience in working with data sets and need to develop ability to create insights from data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,45000,EUR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,"College/University,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,25,0,25,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,500 to 999 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Python,R,SAS Base,Stan",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Rarely,,,,,Rarely,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,60,20,0,0,0,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Often,,,,Often,Often,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Online courses",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Work,25,10,25,40,0,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Insurance,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,,,"R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,Sometimes,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression",,,,,,Often,Most of the time,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,50,15,15,15,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,,Most of the time,,,Often,,,,,,,Sometimes,,,,,,Most of the time,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,,Rarely,57000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,South Africa,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Stack Overflow Q&A",,,Very useful,,Very useful,,Very useful,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner",University courses,35,10,20,35,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Portugal,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Kaggle,Online courses,Podcasts,Textbook",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,Very useful,,Very useful,,,,"Linear Digressions Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Researcher",University courses,30,10,30,20,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,100GB,,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Recommender Systems",,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,35,5,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Sometimes,,,,,,,Often,,,,,,,Often,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Never,"25,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100GB,Neural Networks,"Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow",,,,,,,,Often,,,,,,,Most of the time,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Rarely,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,SVMs",Sometimes,,,,,Often,Often,,,,,,,,,Often,,,,Sometimes,,,,,,,,Rarely,,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Often,Often,Sometimes,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,Most of the time,,Often,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,"20,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,Kaggle competitions,30,30,0,10,30,0,Computer Vision,Hidden Markov Models HMMs,A bachelor's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Most of the time,1GB,Bayesian Techniques,"Amazon Web services,C/C++,IBM SPSS Statistics,MATLAB/Octave,RapidMiner (free version)",,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,Often,Sometimes,,,Most of the time,Often,Often,,,,Most of the time,Most of the time,,Often,,Often,,Often,Often,,Often,,,Often,,,,Often,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Graph (e.g. 
GraphBase/Neo4j),Email,,Git,Rarely,16000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities",,,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Never,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","C/C++,Python,R,SAP BusinessObjects Predictive Analytics",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,Often,,,,,,,,,,,,,,,"Association Rules,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics",,Sometimes,,,,,,Often,Often,,,,,,Sometimes,Often,,,,Often,,Sometimes,Often,,,,,,Often,,,,,25,25,20,20,10,0,Enough to tune the parameters properly,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,None,Approximately half internal and half external,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,,Always,95000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,31,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,NoSQL,Deep learning,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,,"Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Other,42,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,SAS Base,Monte Carlo Methods,R,"GitHub,Google Search","Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,"Data Miner,Engineer,Statistician","Online courses (coursera, udemy, edx, 
etc.)",20,50,15,15,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Random Forests,SVMs",Rarely,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,20,5,25,30,20,0,Enough to tune the parameters properly,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,51-75% of projects,More external than internal,IT Department,"TIMSS, PERLS",Fiability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,200000,MAD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Portugal,30,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Stories Podcast,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,Udacity,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Proprietary Algorithms,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Operations Research 
Practitioner,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,50,30,0,20,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Markov Logic Networks,Neural Networks - CNNs",High school,Military/Security,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Sometimes,10GB,Bayesian Techniques,"Amazon Web services,C/C++,Python,TensorFlow",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Neural Networks,Prescriptive Modeling",Sometimes,Often,Most of the time,Often,,Most of the time,,Often,,,,,Often,Sometimes,,,Most of the time,Most of the time,,Often,,Sometimes,,,,,,,,,,,,50,20,25,5,0,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,Sometimes,,Often,Most of the time,,Sometimes,Often,Sometimes,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,200000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,High school,Retail,Fewer than 10 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,10,0,0,20,70,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,,Most of the time,Most of the time,,76-99% of projects,Entirely internal,Other,None,None,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email",,Other,Never,60000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",France,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,Python,"GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Engineer,University courses,30,0,0,60,10,0,"Supervised Machine Learning (Tabular Data),Time Series",Ensemble Methods,,Internet-based,10 to 19 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,Ensemble Methods,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,RNNs,Segmentation",,,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools",Often,Often,Often,,Often,,,,,,,,Often,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Never,51000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,Very useful,Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Statistician",University courses,25,10,25,25,15,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Always,10MB,"Decision Trees,Regression/Logistic Regression,SVMs","C/C++,Python,Other",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,SVMs",,,Sometimes,,,,Most of the time,Most of the time,,,,,,Sometimes,,Most of the time,,,,,Most of the time,Often,,,,Most of the time,,Most of the time,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,48000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,Other,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Business Analyst,Data Analyst,Other",University courses,40,40,10,10,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Python,R,Other,Other",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,Most of the time,Sometimes,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,Rarely,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,Often,,,,,Often,,,Most of the time,,,,10,60,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,,,,100% of projects,Entirely external,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Bitbucket,Always,120000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,15,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Most of the time,Often,Sometimes,,,,Often,,,,Sometimes,,,,,,,Often,,,,,,,,,,,30,15,5,10,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,,,,Often,Sometimes,,,,,Less than 10% of projects,Entirely internal,IT Department,,Measure and manage impact of problems in source data on final models,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,2000000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Portugal,32,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Impala,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,YouTube Videos,Other",,,,,Very useful,Not Useful,Very useful,,Very useful,Very useful,,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,Less than a year,"Business Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",0,45,50,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,100GB,Other,"Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Other",,,,,Rarely,,,,Most of the time,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,,Rarely,,,"Bayesian Techniques,Data Visualization,Segmentation,Time Series Analysis",,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,5,0,7,28,30,30,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of 
tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Rarely,,,Most of the time,,Most of the time,,Often,,,,,,Often,Sometimes,,,10-25% of projects,More internal than external,Business Department,Wikipedia,Dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,45000,EUR,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,Canada,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Conferences,Online courses,Podcasts,YouTube Videos",,,,,Not Useful,,,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Other",University courses,20,40,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,,Sometimes,,Often,Rarely,Most of the time,,,,Most of the time,,,,,,Sometimes,Sometimes,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Rarely,Often,,Most of the time,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Often,Often,Often,,,,,Often,Rarely,Often,,,Most of the time,Often,Sometimes,,,,,Most of the time,Often,Often,Often,Often,,,,5,20,60,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,,,,,,,,Sometimes,,,Sometimes,Often,Most of the time,,,Most of the time,Sometimes,,,Most of the time,,10-25% of projects,Entirely external,Other,,Finding it and getting it into the proper form for analysis,"Column-oriented relational 
(e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,225000,CAD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Malaysia,20,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Online courses,Personal Projects,Textbook",,Very useful,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,Self-taught,30,10,50,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Pharmaceutical,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,NoSQL,Python,R,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,"Logistic Regression,Naive Bayes,Random Forests,Text Analytics",,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Sometimes,,,,,,Often,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,Often,Sometimes,Often,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Most of the time,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Data Analyst,Other",University courses,10,0,0,80,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Other",Self-taught,100,0,0,0,0,0,Machine Translation,,A professional degree,Mix of fields,"5,000 to 9,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Link 
Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle",,,,,Very useful,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,25,0,25,40,5,5,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Impala,Java,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,QlikView,R,RapidMiner (commercial version),SAS Base,Spark / MLlib,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,Sometimes,,Rarely,,Often,,Sometimes,Sometimes,,Often,Sometimes,,,,Sometimes,,,,,Often,,,Often,,,,,Sometimes,Most of the time,Rarely,,,,Rarely,,,Often,Most of the time,,Rarely,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural 
Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Rarely,Often,,Most of the time,Most of the time,Sometimes,,,,Often,Rarely,Often,,Sometimes,,Sometimes,,Often,Most of the time,,Most of the time,,,Most of the time,Sometimes,Most of the time,Rarely,Often,,,,35,15,25,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Often,Often,,,,,,,Sometimes,,Often,Sometimes,,,Most of the time,Sometimes,,,26-50% of projects,More internal than external,IT Department,Government data; Open source Data (like the weather),,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Never,34000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Other,Matlab,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Data Elixir Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Predictive Modeler,Programmer",University courses,0,0,0,100,0,0,Natural Language Processing,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,Don't know,Some other 
way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,27,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by government,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,40,0,10,0,0,"Speech Recognition,Survival Analysis","Bayesian Techniques,Logistic Regression",,Government,20 to 99 employees,Decreased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"DataTau News Aggregator,FastML Blog,KDnuggets Blog",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees 
- Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Company internal community,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,Work,25,40,35,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,"1,000 to 4,999 employees",Decreased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,"Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,,,Sometimes,,,,,Most of the time,Rarely,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Time Series Analysis",Sometimes,,,,,,Most of the time,,Sometimes,,,,,Often,,Often,,,,,,,,,,,,,,Sometimes,,,,35,20,10,15,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,Sometimes,,Often,Most of the time,,Most of the time,Often,,Sometimes,,,,Often,,Most of the time,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,555000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Brazil,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,41,"Not employed, but looking for work",,,,,,,,Minitab,,,,"Kaggle,Personal Projects,Podcasts,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Engineer,Operations Research Practitioner",University courses,10,20,5,50,15,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,36,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,Very useful,,Somewhat useful,Very useful,,Very useful,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,15,0,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,CRM/Marketing,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","Impala,Julia,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,Often,,Rarely,Sometimes,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Often,,,,Often,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,Sometimes,,,,,,,Often,,,,,,,,,,Often,,,,10-25% of projects,More internal than external,IT Department,Telco; locations; social media data,Inconsistent data formats,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,75000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Operations Research Practitioner,Researcher,Statistician",Self-taught,60,0,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Internet-based,500 to 999 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Orange,Python,R,RapidMiner (free version),SAS JMP",,,,,,,,,,,,,,,,,,,Sometimes,,Often,Sometimes,,,,,,,Sometimes,,Often,,Most of the time,,Sometimes,,,,,Rarely,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Time Series Analysis",Often,Often,Often,,Often,,Often,Often,,,,,,,Often,Often,,Often,,,Often,,,Often,,Often,,,,Often,,,,20,20,20,10,30,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,110000,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Hong Kong,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,30,60,10,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,Often,Often,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Segmentation,SVMs",,,Sometimes,,,,Often,Often,,,,,,,,Often,,,,Often,,,,,,Often,,Sometimes,,,,,,20,40,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,,,,Sometimes,,,Often,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,700000,HKD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,41,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Very useful,,,,,,,,,,,,,Somewhat useful,,,,Talking Machines Podcast,3-5 years,,,,,Necessary,Nice to have,,Nice to have,Nice to have,,,,,,Workstation + Cloud service,0 - 1 hour,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,18,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,Julia,"Ensemble Methods (e.g. 
boosting, bagging)",R,Google Search,"Blogs,Conferences,Friends network,Kaggle,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Predictive Modeler,Researcher,Statistician",University courses,40,0,20,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A master's degree,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,1GB,"Neural Networks,Regression/Logistic Regression","Julia,MATLAB/Octave,R,SQL",,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,Rarely,,,,Sometimes,Often,,,,,,Often,,,Most of the time,,,,25,50,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Often,,,,,,,Most of the time,,,,Often,,Most of the time,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,90000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Ireland,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Java,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,,Very useful,,Very useful,,,,,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Researcher,Statistician",University courses,15,5,65,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,10GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",Rarely,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,10,10,20,40,20,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,70000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Turkey,24,"Not employed, but looking for work",,,,,,,,R,I don't plan on learning a new ML/DS method,SAS,Google Search,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,30,20,10,30,0,10,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Official documentation,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Very useful,,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,5,5,20,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,,Pharmaceutical,"10,000 or more employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,KNIME (commercial version),Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"kNN and Other Clustering,Logistic Regression",,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,50,20,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Sometimes,,,,Sometimes,Sometimes,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,8,,,,,,,,,,,,,,,,,, +Female,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Personal Projects,Textbook,YouTube Videos,Other",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important +Male,United States,53,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,I never declared a major,More than 10 years,Researcher,Self-taught,80,15,5,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,3-5 years,Some other way,Very important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,1MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,SAS JMP,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,Often,,Sometimes,,,Rarely,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,Often,Often,,,Often,Most of the time,Often,Often,,,,,Often,Often,Most of the time,,Most of the time,Often,,Most of the time,Often,Often,,,,,,,,,,,50,40,0,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Trade book,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,Very useful,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,Australia,34,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,Somewhat useful,Somewhat useful,O'Reilly Data Newsletter,1-2 years,,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,DataCamp,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,NA,40,0,50,10,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Netherlands,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Perfectly,Self-employed,KNIME (free version),Proprietary Algorithms,Other,I collect my own data (e.g. 
web-scraping),"College/University,Company internal community,Conferences,Kaggle,Personal Projects,Textbook",,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,,Very useful,,,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Researcher,Software Developer/Software Engineer",University courses,0,25,45,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,,"Bayesian Techniques,Evolutionary Approaches,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Java,Mathematica,NoSQL,R,SQL,TensorFlow",,Sometimes,,,,,,,,,Sometimes,Often,,,Most of the time,,,,,Most of the time,,,,,,,Often,,,,,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Simulation",,,Often,,,Most of the time,Most of the time,,,Often,,,,,,Often,,Often,Sometimes,,,,Often,,,Most of the time,Most of the time,,,,,,,10,50,10,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,100% of projects,More internal than external,Standalone Team,GEO data,"affordability, performance","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,100000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,36,Employed part-time,,,Yes,,Predictive Modeler,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Online courses,Personal Projects",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,5,20,20,25,30,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Sometimes,1MB,Regression/Logistic Regression,"C/C++,Java,MATLAB/Octave",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Rarely,,,,Rarely,,Often,,Sometimes,,,,,Rarely,,Often,,,,,Sometimes,,Sometimes,Rarely,,,,,,,,,,30,25,5,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to 
data",Often,,,Sometimes,Often,,,,,,,Sometimes,,,,,,,,,Sometimes,,100% of projects,More external than internal,Other,various medical datasets,"Data is not clean, and sometimes requires expert domain knowledge (which I don't necessarily have) to detect problems with it. Additionally, it's not always clear how predictive the data is (ie. if I have a concordance of 0.6, is that bad, good, or the best that is possible?)","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,50000,CAD,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,United States,49,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,RapidMiner (commercial version),Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone 
at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,1TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,NoSQL,R,Spark / MLlib,Other",,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,Most of the time,,,"Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",,,,,,,Most of the time,,Rarely,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,Rarely,,,,,,Often,,,,50,0,0,40,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Privacy issues",Often,,,Sometimes,Sometimes,,,,,,,,,,,,Most of the time,,,,,,51-75% of projects,Entirely external,IT Department,weather;social media,finding and cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,cloud,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,200000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,35,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Conferences,Kaggle,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,,,,,GPU accelerated Workstation,,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Machine Learning Engineer,Programmer,Researcher",Work,10,0,20,0,70,0,"Machine Translation,Natural Language Processing,Speech Recognition","Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Russia,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Java,Deep learning,Python,GitHub,"Company internal community,Podcasts",,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Data Scientist,Work,30,20,40,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Retail,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,SAS Base",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,Often,Often,,Often,Sometimes,Often,Sometimes,Often,,,,,,,Sometimes,,Sometimes,,,Often,,Sometimes,Often,,,,Often,,,,,,10,30,10,10,40,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues",Often,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Less than 10% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. 
GraphBase/Neo4j)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,Researcher,University courses,15,0,50,35,0,0,,Logistic Regression,A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Most of the time,1GB,Regression/Logistic Regression,"SAS Base,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,Most of the time,,,"Cross-Validation,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,,,Often,,,,,,,,,,,Rarely,,,Often,,,,15,25,20,15,25,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,66,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Social Network Analysis,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Programmer,Statistician,Other",Work,60,0,40,0,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Mathematica,SAS Base,SAS JMP",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Often,Sometimes,,,Sometimes,,Most of the time,,Most of the time,,Often,,Often,,,Often,,,,Often,,Often,Often,,,,50,20,0,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy 
issues",Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,,,100% of projects,Entirely internal,Standalone Team,,Data cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,"100,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,56,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Other",,,,,,,,,,,Not Useful,Somewhat useful,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity","GPU accelerated Workstation,Traditional Workstation",11 - 39 hours,Github Portfolio,Yes,Doctoral degree,Biology,More than 10 years,"Data Analyst,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Work,0,25,50,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very 
Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,Programmer,University courses,25,25,0,50,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,I don't know,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data",Don't know,1GB,"CNNs,Neural Networks,SVMs","Java,MATLAB/Octave,Python",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks,Segmentation",,,,Most of the time,,Sometimes,Often,,Rarely,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,25,50,10,15,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,"Random Structured Forests for crack detection 
https://github.com/cuilimeng/CrackForest-dataset; CrackIT http://amalia.img.lx.it.pt/CrackIT/CrackITv1_5.zip ; Crack Tree/ Crack FoSA other Crack https://sites.google.com/site/qinzoucn/documents ; Sylvie Chambon Crack Detection papers https://www.irit.fr/~Sylvie.Chambon/CrackDataset.zip https://www.irit.fr/~Sylvie.Chambon/Crack_Detection_Database.html ;",It is large and not consistent.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,"dropbox , google drive",Bitbucket,Always,"12,000",USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Italy,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Association Rules,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Online courses",,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Statistician",University courses,30,10,30,25,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Other",Other,Most of the time,10MB,"Bayesian Techniques,SVMs","Amazon Web services,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Time Series Analysis",,,Often,,,Most of the time,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,15,15,15,10,45,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Need to coordinate with IT,Privacy issues",,,,,,Often,,,,,,,,,Often,,Sometimes,,,,,,76-99% of projects,Entirely internal,Standalone Team,Osint,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,"Git,Subversion",Rarely,40000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,50,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,"Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,40,10,10,0,40,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Retail,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Relational data",Most of the time,100MB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics,Time Series Analysis",,,,Sometimes,,Most of the time,Often,Often,Often,,,Often,,Often,,Often,,,Often,Sometimes,Often,,,,Sometimes,,,,Sometimes,Sometimes,,,,10,50,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine 
learning",,,,Often,Sometimes,,,,,,,Often,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,"Weather, real estate prices, finance, macroeconomics","Cleaning from noise, feature engineering","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,32000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Canada,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses",,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,University courses,35,20,0,40,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Financial,500 to 999 employees,Decreased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,Decision Trees,"IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,Rarely,,,,,,Sometimes,,,,Rarely,,,,Rarely,,Rarely,,,,Rarely,,,,,Sometimes,,,Sometimes,,,,,,,"Decision Trees,Time Series Analysis",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,75,5,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,Most of the time,,26-50% of projects,Do not know,Other,,"security and privacy concerns, there is barely enough t collaboration between departments","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Stack Overflow Q&A",,,,,,Very useful,Somewhat useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,30,40,30,0,0,"Adversarial Learning,Machine Translation,Survival Analysis",Logistic Regression,Primary/elementary school,Technology,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,100GB,Regression/Logistic Regression,"Google Cloud Compute,Microsoft Excel Data Mining,NoSQL,Python,QlikView,SQL,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Text Analytics",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,15,10,5,60,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Privacy issues",,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,76-99% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,"104,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,15,15,30,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,24,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,University/Non-profit research group websites,Newsletters,,,,,,,,Not Useful,,,,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Canada,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,College/University,Company internal community,Conferences,Friends network,Personal Projects,Textbook,YouTube Videos,Other",Very useful,,Very useful,Very useful,Very useful,Very useful,,,,,,Very useful,,,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",40+,Other,Yes,Bachelor's degree,Computer Science,,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Brazil,35,"Not employed, and not looking for work",No,"Yes, but data 
science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Online courses,Textbook",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,Yes,Master's degree,Computer Science,,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,42,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,,,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,15,0,25,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Gradient Boosted Machines,Regression/Logistic Regression","KNIME (free version),Microsoft Azure Machine Learning,NoSQL,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,,,Rarely,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs",,,,,,,Most of the time,Sometimes,,,,Often,,,,Sometimes,,,,,,,Often,,,,,Sometimes,,,,,,40,20,10,15,15,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,,,,,,,,,,,,,Often,Often,,10-25% of projects,More internal than external,IT Department,"data from us government sites, uc",Data cleansing,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,2600000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,20,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",33,33,34,0,0,0,"Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Argentina,33,Retired,,,Yes,,Data Analyst,Perfectly,Employed by college or university,Mathematica,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Somewhat useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,3 to 5 years,"Machine Learning Engineer,Predictive Modeler",University courses,20,10,2,60,8,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,,O'Reilly Data Newsletter,3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,Self-taught,40,20,10,30,0,0,"Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,Singapore,37,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Cloudera,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Canada,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist",University courses,10,20,30,20,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,Often,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics",Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Often,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,Often,Most of the time,Sometimes,,,,,Sometimes,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data 
science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,Sometimes,Most of the time,Often,,Sometimes,,,,,,Often,Sometimes,,,,,Sometimes,,,76-99% of projects,Entirely internal,Business Department,,Dirty,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,93000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,Python,Other,SQL,GitHub,"Company internal community,Official documentation,Online courses,Personal Projects",,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,0,80,10,10,0,0,"Machine Translation,Natural Language Processing,Unsupervised Learning","Gradient Boosting,Neural Networks - CNNs",,Government,,,,,Important,"Build and/or run the data infrastructure that your 
business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Video data,Never,1GB,"Evolutionary Approaches,Neural Networks","Amazon Machine Learning,C/C++,DataRobot,IBM Cognos,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL",,,,Rarely,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,Often,,Often,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,10,10,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Explaining data science to others,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,,Often,,,,,,,,,,,,Often,Often,,,,10-25% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,30000,BRL,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,,,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,Other","Amazon Machine Learning,Jupyter notebooks,Python",Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,Often,,,,Most of the time,Sometimes,,,Most of the time,,,,,,,,,,30,25,5,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Rarely,,,,,,,Rarely,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,None,A lot of noise coming from usage patterns (I use data originating from user actions within a mobile app),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,36000,BRL,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,France,37,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write code to analyze data,"Business Analyst,Engineer,Researcher,Statistician",Other,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",Self-taught,30,20,20,15,15,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data,Relational data",Don't know,10GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Web services,Microsoft Excel Data Mining,Python,R,SAS 
Base,SQL,Tableau,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,,Sometimes,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Often,,,,Most of the time,,,,Often,,,Often,Most of the time,Sometimes,Sometimes,Often,,Most of the time,,,Often,Often,Sometimes,,,,40,25,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,51-75% of projects,Do not know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Don't know,87500,CAD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United Kingdom,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,IBM SPSS Modeler,Survival Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher",University courses,80,20,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Regression/Logistic Regression","IBM Cognos,IBM Watson / Waton Analytics",,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Time Series Analysis",,,Often,,,Often,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,50,5,5,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the 
time,Most of the time,,Often,,,,,,,,,,,,Often,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,The quality of the data. Data input is made usually by people that either don't have the time or don't care about the impact their data has in the wider business or with our customers,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"33,000",GBP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Technology,10 to 19 employees,Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Stan,Neural Nets,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,University courses,0,0,40,60,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,10 to 19 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Random Forests",,,Sometimes,,,,Often,,Most of the time,,,Often,,,,,,Sometimes,Often,,,,Sometimes,,,,,,,,,,,55,20,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,,,,,,Sometimes,,,Often,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Standalone Team,none,labeling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,Very Important,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,,,Somewhat important, +Male,Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses",,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,PhD,No,Some college/university study without earning a bachelor's 
degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Adversarial Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Singapore,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter 
notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",,Sometimes,,,,Most of the time,Often,Sometimes,Often,,,Most of the time,,Sometimes,,Sometimes,,,Often,,Sometimes,Often,Most of the time,,,Sometimes,Sometimes,Often,Often,,,,,50,10,10,10,20,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,Sometimes,Often,,,,,Often,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,95000,SGD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Brazil,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Poorly,"Employed by college or university,Employed by a company that performs advanced analytics",Jupyter notebooks,Deep learning,Python,GitHub,Friends network,,,,,,Very useful,,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer",Self-taught,90,0,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches",A bachelor's degree,Internet-based,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,Random Forests,"Java,MATLAB/Octave,Python",,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods",,,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,80,10,0,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,40000,BRL,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,Researcher,University courses,50,10,20,20,0,0,"Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,48,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Other,3 to 5 years,"Business Analyst,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,India,19,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,India,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Jupyter notebooks,,Python,,"Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,97,3,0,0,0,0,,,"Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Other,"Basic laptop (Macbook),Other",Text data,Never,,,"Amazon Web 
services,C/C++,Cloudera,Hadoop/Hive/Pig,Impala,NoSQL,Perl,Python,SQL,Other",,Sometimes,,Rarely,Often,,,,Often,,,,,Rarely,,,,,,,,,,,,,Sometimes,,,Rarely,Often,,,,,,,,,,,Sometimes,,,,,,,,,Often,"A/B Testing,Data Visualization",Rarely,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,5,0,10,30,5,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Other",,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,,Most of the time,Less than 10% of projects,Entirely internal,Other,,collecting and analysis - munging is not a problem,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,Fewer than 10 employees,Stayed the same,Don't know,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Always,,,"Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Rarely,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,30,25,5,25,15,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Privacy 
issues",,,,,Often,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,R,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,50,20,5,0,0,Other (please specify; separate by semi-colon),Support Vector Machines (SVMs),A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,"Neural Networks,SVMs","MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,RNNs,SVMs",,,,,,Most of the time,Often,Sometimes,Sometimes,,,,,,,Sometimes,,,,Most of the time,,,,,Often,,,Most of the time,,,,,,10,40,20,5,25,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,"GWAS catalog; HPIDB, VirusMenth",Identifier consistency ,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,"Bitbucket,Git,Subversion",Sometimes,60000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Julia,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Company internal community,Friends network,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",Work,70,5,5,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,"5,000 to 9,999 employees",Decreased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Relational data,Other",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Julia,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk,Other",Rarely,Often,,Rarely,,,,Often,Often,,,Rarely,,,Sometimes,Sometimes,Often,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Most of the time,,Most of the time,,,,,,,,Most of the 
time,Most of the time,Often,,,Sometimes,,Most of the time,Often,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other",Sometimes,Sometimes,Sometimes,,Sometimes,Often,Often,Sometimes,Sometimes,,,Sometimes,,,Often,Often,,,,,Often,,Often,Often,,Most of the time,Often,Sometimes,Sometimes,Most of the time,Most of the time,,,35,15,10,10,10,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Sometimes,,Sometimes,,Often,Sometimes,,,,,,Often,,,Sometimes,Often,,,,10-25% of projects,More internal than external,Business Department,transactional data from partners,timely delivery,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,105000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Official documentation,Online courses,Personal Projects,Textbook",Very useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,,"Data Elixir Newsletter,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",Work,45,20,30,5,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,100MB,"Bayesian Techniques,Neural Networks,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,TensorFlow",,Rarely,,Sometimes,,,,Often,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,kNN and Other Clustering,Naive Bayes,Neural Networks,Simulation,Text Analytics",Rarely,,Sometimes,,,,,,,,,,,Sometimes,,,,Rarely,,Sometimes,,,,,,,Sometimes,,Often,,,,,20,20,30,10,20,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,More external than internal,Central Insights Team,crawling text data,Cleaning crawling text,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Mercurial",Sometimes,"3,800,000",JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,SQL,I collect my own data (e.g. web-scraping),"College/University,Conferences,Kaggle,Textbook,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,,,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Data Analyst,Statistician",Work,21,20,20,20,19,0,"Natural Language Processing,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Rarely,Rarely,,Often,Most of the time,,,Sometimes,Rarely,,Most of the time,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,Most of the time,,,Often,Most of the time,Most of the time,Most of the time,,,,Sometimes,,,,Most of the time,,,,,,,Sometimes,Often,,Most of the time,,,,Sometimes,,,,40,20,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision 
makers,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Sometimes,,,Most of the time,,,,,,,,,,Most of the time,Often,,,51-75% of projects,Entirely internal,Other,,At the moment Oracle databases are primary while ur Hadoop cluster is still in development and data is loaded per project,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,69500,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,,,,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,"Data Analyst,Engineer,Machine Learning Engineer,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed 
full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,The Data Skeptic Podcast",< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Other,Less than a year,"Engineer,Programmer,Researcher",University courses,30,0,0,40,0,30,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Personal Projects",,,,,Very useful,,,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,25,0,40,35,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Decreased slightly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Most of the time,100GB,"Decision Trees,Neural Networks,SVMs","Amazon Web services,Java,NoSQL,Python,Spark / MLlib",,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",Often,,Often,,,Often,,Often,,,,,,Often,,,,,Often,Often,,,Often,,,,,,Often,,,,,35,40,25,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Scaling data science solution up to full database",Sometimes,,,,,,,,Often,,,,,,,,,Often,,,,,Less than 10% of projects,,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,47,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Business Analyst,Data Scientist,Software Developer/Software Engineer",University courses,25,0,25,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R",,,,,,,,,,,Most of the time,Rarely,Rarely,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,,Often,Most of the time,,,,Often,Sometimes,,,,,,,,,Most of the time,,,,30,10,5,25,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy 
issues",,,,,,,,,Sometimes,,Often,,,,Often,,Often,,,,,,100% of projects,More internal than external,Standalone Team,"weather underground, governmental data (statistics)","privacy issues, clients are not sure if they can legally use their customers' data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,100000,EUR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,42,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,,,Somewhat useful,,,,,Very useful,,Data Machina Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,16-20,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very 
Important +Male,Taiwan,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,23,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,Other,Self-taught,10,60,0,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,DataRobot,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,Stan,TensorFlow,TIBCO Spotfire,Unix shell / awk",Sometimes,Often,,,,Rarely,,,,,,,,,,,Sometimes,Often,Often,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Rarely,Rarely,Rarely,,,Sometimes,,,,Most of the time,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Simulation,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,,,Often,Often,Sometimes,Often,,Sometimes,,Often,,,Often,,,,80,10,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,,Often,,,,,,Often,,Sometimes,,Sometimes,Often,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,360000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Colombia,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,10,NA,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Regression,SQL,Google Search,"Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,DBA/Database 
Engineer",Work,20,10,50,20,0,0,,,A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,,,"SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,20,40,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Often,,,,Sometimes,,,Often,Most of the time,,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Subversion,,105000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed part-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,C/C++/C#,University/Non-profit research group websites,"Arxiv,Online courses,Stack Overflow Q&A,Textbook",Not Useful,,,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",0 - 1 hour,Online Courses and Certifications,No,Doctoral degree,Mathematics or statistics,I don't write code to analyze data,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Recommendation Engines,"Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,23,Employed part-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Statistician,University courses,20,0,0,80,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Non-profit,Fewer 
than 10 employees,Decreased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Always,,Regression/Logistic Regression,"IBM SPSS Statistics,Minitab,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,40,50,0,0,10,0,Enough to run the code / standard library,Explaining data science to others,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Management information systems,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat 
important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Romania,49,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Enterprise Miner,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,United States,38,Employed full-time,,,Yes,,Data 
Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,,A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,10MB,"Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Random Forests,Segmentation,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,Most of the time,,,Sometimes,,,,,60,10,0,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Most of the time,,Sometimes,Often,,Most of the time,Most of the 
time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,Often,Most of the time,Most of the time,,10-25% of projects,More internal than external,Business Department,,Accessibility ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,83000,,Has stayed about the same (has not increased or decreased more than 5%),,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Software Developer/Software Engineer",University courses,20,30,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Never,100MB,"Decision Trees,Regression/Logistic Regression","SAS Base,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,"Decision Trees,Lift Analysis",,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,10,20,0,20,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Often,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,More external than internal,Business 
Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Microsoft Azure Machine Learning,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Miner,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Neural Networks","Amazon Machine Learning,IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Python,QlikView,Tableau",Rarely,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,Rarely,Rarely,,,,,,,,,,,,,Often,,,,,,,"Ensemble Methods,Neural Networks,Prescriptive Modeling,Random Forests",,,,,,,,,Rarely,,,,,,,,,,,Often,,Often,Sometimes,,,,,,,,,,,60,10,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data 
science team,Unavailability of/difficult access to data",Sometimes,,,,,,,,Sometimes,Rarely,Sometimes,,,,,Most of the time,,,,,Sometimes,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,160000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Switzerland,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Microsoft SQL Server Data Mining,Deep learning,Python,Google Search,"Arxiv,Kaggle,Online courses,Textbook",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,Less than a year,"Data Analyst,Researcher",Work,0,0,100,0,0,0,Unsupervised Learning,"Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Always,,Neural Networks,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,10,30,30,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,United Kingdom,72,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Link Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,Very useful,Very useful,,,,,,Very useful,Very useful,Very useful,,,Very useful,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,PhD,No,Bachelor's degree,Computer Science,,"Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Russia,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Trade book",,,,,,,Very useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,Less than a year,"Engineer,Programmer",Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A professional degree,Financial,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,,Decision Trees,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,Do not know,Standalone Team,Titanic,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Sometimes,45000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,Less than a year,"Data Miner,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,94,0,0,1,0,Outlier detection (e.g. 
Fraud detection),Decision Trees - Random Forests,I don't know/not sure,CRM/Marketing,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,100GB,Random Forests,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,Often,,Often,,,,,,Often,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Random Forests,Text Analytics",Sometimes,,,,,,Often,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,75,20,0,5,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,Often,,,,Often,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,IT Department,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Google Drive,Subversion,Sometimes,42000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,62,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by government,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SQL,Government website,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",University courses,60,0,0,40,0,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Government,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Most of the time,10GB,Decision Trees,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,"Data Visualization,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,20,10,10,50,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools,Need to coordinate with IT",Sometimes,Sometimes,,,Often,,,,,,,,Sometimes,,Often,,,,,,,,26-50% of projects,Entirely external,Business Department,none it's all proprietary,Getting the data in a timely fashion,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other,Web based software,Other,Rarely,"53,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Oracle Data Mining/ Oracle R Enterprise,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",30,50,10,5,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,Rarely,Most of the time,,Often,,,,Most of the time,,,Often,Sometimes,,Sometimes,,,Often,,Often,Rarely,,,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources",,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,30000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",0,30,40,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Retail,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,Often,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,Often,,Often,Most of the time,Most of the time,Sometimes,,Often,,,Often,,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Often,,,,,,,Often,,Most of the time,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,130000,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,70,5,20,5,0,0,Computer Vision,"Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation","Image data,Other",,100GB,,"Amazon Web services,Python",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,15,30,20,20,15,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,100% of projects,Approximately half internal and half external,Other,,,Other,"Commercial Data Platform,I don't typically share data",,Git,Rarely,50000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Other,42,Employed full-time,,,Yes,,Programmer,Fine,Self-employed,Python,,,,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,0,0,80,20,0,0,Time Series,Decision Trees - Random Forests,A professional degree,Financial,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1GB,Decision Trees,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,Natural Language Processing",,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,10,50,20,10,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Conferences,Kaggle,Personal Projects,YouTube Videos",Very useful,,Very useful,,Very useful,,Very useful,,,,,Very useful,,,,,,Very useful,"FastML Blog,No Free Hunch Blog",3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,61,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Operations Research Practitioner,Other",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Relational data",Never,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Perl,Python,R,SQL,Tableau,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,Most of the time,,,,,,,,,Often,,,Often,,Often,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests",,,,,,Sometimes,Often,Most of the time,Sometimes,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,20,20,0,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",,,,,Most of the time,,,,Sometimes,,,Often,Often,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,,165000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,44,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp",Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Yes,Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +A different identity,Brazil,41,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data 
Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,Other","Arxiv,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,10,10,40,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,Stan,Unix shell / 
awk,Other",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,,Sometimes,,,,,Most of the time,Often,,,"Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,Most of the time,,,,10,30,30,0,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Most of the time,Often,,,,,,,,,,,,,,,Often,Often,,Less than 10% of projects,More internal than external,IT Department,OpenStreetMap; google aps; here api; weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,AWS/DynamoDB,"Bitbucket,Git",Sometimes,"180,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,39,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Software Developer/Software Engineer",University courses,70,0,30,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,Decision Trees,"Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,Spark / MLlib",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Often,Often,,,,,,,,,Often,,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Prescriptive Modeling,Random Forests,RNNs,Segmentation",,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,,,,35,25,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,"Credit risk score ,",,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,80000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,34,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Statistician",University courses,10,10,20,60,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Bayesian Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Speech Recognition,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,Chile,31,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Data Scientist,Programmer,Researcher",University 
courses,20,20,30,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,10GB,Decision Trees,"Java,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,"Arxiv,Blogs,Personal Projects",Very useful,Somewhat useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Telecommunications,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10GB,"Regression/Logistic Regression,Other","Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Rarely,,Most of the 
time,,,,"Bayesian Techniques,Collaborative Filtering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems",,,Sometimes,,Often,,,,,,,,,,,Most of the time,,Most of the time,Sometimes,,Most of the time,,,Often,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,R,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Doctoral degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)",Bayesian Techniques,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,C/C++,"Ensemble Methods (e.g. 
boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos,Other",Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,75,0,10,10,0,5,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text 
Analytics",,,Sometimes,Sometimes,,Sometimes,,Sometimes,Often,Rarely,,,,Rarely,,Sometimes,,,Most of the time,Often,Sometimes,,Rarely,,Sometimes,,,,Most of the time,,,,,60,10,10,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,Rarely,Often,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Sometimes,Most of the time,,Less than 10% of projects,More internal than external,Other,"Location Datasets, Demographic Data, Census Data",Scalable and Democratic access to Data.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,Bitbucket,Sometimes,195000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Mexico,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,YouTube Videos,Other",,,,,Very useful,,Very useful,,,,Not Useful,,,,,,,Somewhat useful,KDnuggets Blog,1-2 years,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,25,25,0,0,50,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Canada,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Bayesian Methods,Python,,"Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,,University courses,50,0,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,Sometimes,Rarely,,Sometimes,Often,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Rarely,,,,,,Sometimes,Sometimes,,,,Most of the time,,Most of the time,,,,,,Most of the time,,100% of projects,More internal than external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,115000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,40,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,Other",Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Somewhat useful,,"Data Elixir Newsletter,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,Other,Self-taught,55,25,0,0,5,15,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Other",,Sometimes,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,Often,,,,Sometimes,,,Sometimes,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,Sometimes,Sometimes,Often,Often,Often,Often,,,Often,,Sometimes,Often,Often,,Rarely,,Sometimes,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,Often,Sometimes,,,,50,20,15,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with 
IT",Sometimes,Sometimes,,,Sometimes,Sometimes,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Engineer,Researcher",University courses,75,0,0,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Textbook,Trade book,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,Very useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,10,25,30,5,30,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Technology,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Often,,Most of the time,Most of the 
time,Often,Often,,,,,Sometimes,,Often,Sometimes,Sometimes,Sometimes,Often,Sometimes,,Often,Often,Often,,,Often,Often,Most of the time,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues",Sometimes,,,,Often,,,,Often,,,,,,,,Often,,,,,,26-50% of projects,Entirely internal,Business Department,,Cleaning and feature engineering,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,100000,BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,20,30,10,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,500 to 999 employees,Decreased slightly,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Gradient Boosted Machines","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Perl,Python,R",Often,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian 
Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction",,Often,Often,Often,,,,Often,Sometimes,,,Sometimes,,Sometimes,,Often,,Sometimes,,,Sometimes,,,,,,,,,,,,,40,25,20,10,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations of tools,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,Sometimes,,,,,,,Sometimes,,,,Often,Sometimes,,Rarely,Sometimes,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,160000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Vietnam,33,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Python,Deep learning,Python,GitHub,"Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Necessary,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's 
degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,,,,,,,,,,,,,, +Male,Egypt,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Other",University courses,80,0,0,20,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Rarely,100GB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,Sometimes,,,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,,Most of the time,Often,,Most of the time,,,,,Most of the time,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it 
may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Nigeria,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Fine,Self-employed,SAP BusinessObjects Predictive Analytics,Deep learning,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,Personal Projects,Tutoring/mentoring,Other",,,,,,,,,,,Very useful,Very useful,,,,,Very useful,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,,Bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",Self-taught,85,10,NA,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,,University courses,40,0,0,40,20,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A professional degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,"Markov Logic Networks,Neural Networks,SVMs","Amazon Web services,Java,NoSQL,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"HMMs,Neural Networks",,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,5,0,0,15,0,80,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,80000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,Very useful,,,Very useful,,Very useful,,,Very useful,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Researcher",Work,40,0,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"5,000 to 9,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,RNNs","Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Often,Sometimes,Often,Often,Often,Often,,,Often,,Often,,Often,,Sometimes,Sometimes,Often,Often,,Often,Sometimes,Often,Often,,Often,Sometimes,Often,,,,30,20,5,15,30,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Often,,,,,,,,,,,,,Often,Sometimes,,,Sometimes,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,60000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by college or university,Employed by non-profit or NGO",Amazon Web services,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Non-profit,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Other,,,Regression/Logistic Regression,"Jupyter notebooks,Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,Rarely,,,,Most of the time,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,20,0,40,15,20,5,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Sometimes,,,,,,Often,Sometimes,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"83,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Mexico,49,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,1 to 2 years,Engineer,Self-taught,70,30,0,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Telecommunications,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,Often,,,,,,,,,,Rarely,Most of the time,,Often,Often,,,,,,,Sometimes,,Rarely,,,,,,,Often,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Time Series Analysis",,Often,Often,Sometimes,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,Sometimes,Often,,,,,Most of the time,Sometimes,,,,Sometimes,,,,Sometimes,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Often,,,,,,Often,Often,,,,,,,Often,,,,,,,76-99% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,1285715,MXN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed part-time,,,Yes,,DBA/Database Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst",Kaggle competitions,40,20,10,20,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Image data,Sometimes,10GB,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,Yes,,Programmer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Researcher,Software Developer/Software Engineer",Work,10,10,80,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Relational data,Always,100GB,"Neural Networks,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Python,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,,,Most of the time,Often,,,,,,,,,Often,,,,Often,,,,,Most of the time,,,,,Often,,,,70,20,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Often,,,Often,Often,,Less than 10% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,"66,000,000",KRW,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Spain,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,,1-2 years,,,,,,,,,,,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Doctoral degree,Other,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences",Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,10,0,75,10,5,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Manufacturing,"5,000 to 9,999 employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Relational data",Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs,Other","Jupyter notebooks,NoSQL,Orange,Python,R,SAS Enterprise Miner,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Rarely,,Most of the time,,Rarely,,,,,,Rarely,,Sometimes,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,Often,Sometimes,,,,,,,Sometimes,,Often,,,,20,30,20,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data",Most of the time,,,Sometimes,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Standalone Team,,Data cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,30000,TWD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician",University courses,40,0,0,60,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R,SQL,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,Rarely,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Logistic Regression,Random Forests,Simulation,Time Series Analysis,Other",,,Sometimes,,,Often,,Sometimes,,,,,,,,Often,,,,,,,Often,,,,Often,,,Often,Most of the time,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,45,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,DataRobot,Genetic & Evolutionary Algorithms,Python,GitHub,"College/University,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat 
useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,10 to 19 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data,Text data",Never,100MB,SVMs,"Java,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,SVMs,Text Analytics",,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,0,100,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,,Most of the time,,,,,,,Most of the time,Often,,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,,Less than 10% of projects,Do not know,Standalone Team,kaggle,"Develop a recognize environment system, principally humans, the idea is use this system to interactive whit the user or another object to preserver the human live. ",Graph (e.g. 
GraphBase/Neo4j),I don't typically share data,,Other,Never,1000,ALL,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,,Kaggle competitions,40,10,30,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,500 to 999 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,,Basic laptop (Macbook),Relational data,Never,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks",,,,Often,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,,,,,,,,,40,10,0,10,40,0,Enough to refine and innovate on the algorithm,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,Most of the time,Often,,,Most of the time,,,,,Often,Most of the time,Sometimes,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Never,15,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Switzerland,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Tutoring/mentoring,Other",Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Programmer,Researcher",University courses,30,10,50,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Never,,"Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Java,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,Sometimes,,,Often,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Text Analytics",,,,,,Most of the time,Often,Most of the time,Often,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,,,,,,Rarely,,,,,20,10,0,10,10,50,Enough to explain the algorithm to someone non-technical,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Often,,,Sometimes,Most of the time,,76-99% of projects,More external than internal,Other,"open data platforms; webscrapped sites; Foursquare API; census data; Instagram API.",getting access to it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",database,Bitbucket,Sometimes,"72,000",CHF,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Engineer,Poorly,"Employed by non-profit or NGO,Employed by government",Tableau,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Conferences,Personal Projects",,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Increased slightly,Less than one year,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,,<1MB,"Decision Trees,Ensemble Methods,Random Forests","Java,Orange,Perl,R,RapidMiner (free version),SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,Rarely,,,Rarely,,Rarely,,,,,,,Rarely,,,Rarely,,,Rarely,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Random Forests,Segmentation",,,,,,Rarely,,Rarely,Rarely,,,,,,,,,,,,,,Rarely,,,Rarely,,,,,,,,40,20,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Often,,,,Most of the time,,,,,,Often,Most of the time,Often,,Most of the time,,,,Less than 10% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Germany,69,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Rule Induction,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Psychology,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician",University courses,20,0,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Web services,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Stack Overflow Q&A,Other",Somewhat useful,,,,,,Very useful,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Researcher",Work,0,0,70,20,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,100MB,,"Amazon Web services,Jupyter notebooks,NoSQL,Python",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Recommender Systems,Text Analytics",,,,,,,Often,,,,,,,,,,,,Often,,,,,Often,,,,,Often,,,,,0,30,60,5,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Scaling data science solution up to full database",,,,Often,,,,,,,,,,,,,,Sometimes,,,,,Less than 10% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Git,Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,R,"Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Textbook",,,Very useful,,,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Statistician,Other",Self-taught,30,15,30,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,Often,Most of the time,,,,Most of the time,,,,Most of the time,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult 
access to data",Often,,,,Most of the time,,,,Often,,,,,,Most of the time,,,Most of the time,,Sometimes,Often,,51-75% of projects,More internal than external,Standalone Team,Public socio-economic data,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,10,10,20,0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,31,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning",Logistic Regression,A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,Laptop or Workstation and local IT supported servers,Text data,Always,10MB,Regression/Logistic Regression,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Text Analytics",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,70,0,10,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues",,,Often,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,IT Department,none;,cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,45000,SGD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,43,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,42,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,DataCamp,edX,Udacity,Other","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,10,0,75,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,Germany,67,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Cluster Analysis,R,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,15+ years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Physics,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Data Machina Newsletter,FastML Blog,KDnuggets Blog",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,0,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Cluster Analysis,Java,I collect my own data (e.g. 
web-scraping),"Personal Projects,YouTube Videos",,,,,,,,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,0,5,5,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Non-profit,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,1GB,"Evolutionary Approaches,Neural Networks","Java,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Evolutionary Approaches,Neural Networks",,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,40,10,40,0,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,,Often,Often,,,,,,,,,,100% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other",Company Developed Platform,,Git,Most of the time,"80,000",GBP,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Self-taught,70,10,10,0,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",,Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Text data,Relational data",Most of the time,,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,R",,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,23,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hong Kong,30,Employed full-time,,,Yes,,Data 
Scientist,Fine,Employed by professional services/consulting firm,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Data Scientist,Work,30,0,50,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Technology,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,Most of the time,Most of the time,,Most 
of the time,Most of the time,,,,0,20,0,10,20,50,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Most of the time,,Most of the time,Sometimes,,Often,Most of the time,,Most of the time,Often,,,,Sometimes,,,Often,Most of the time,Often,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,60000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,,Self-taught,80,0,0,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Never,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Segmentation",,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,56000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,30,10,40,0,0,Natural Language Processing,Logistic Regression,Primary/elementary school,Academic,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",,,,"Amazon Web services,Python,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,45,25,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - GANs,Neural Networks - RNNs",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Canada,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by government,Spark / MLlib,Random Forests,Scala,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Scientist,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),Primary/elementary school,Government,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,<1MB,Regression/Logistic Regression,"R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Other",,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,30,30,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,100% of projects,Entirely external,Other,household survey data,not enough variables/information,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Brazil,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,20,20,10,30,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,<1MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Stan,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Predictive Modeler",University courses,30,10,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,100GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Time Series Analysis",,,Sometimes,,,Often,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,,,50,20,15,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,Sometimes,,,,,,,,Often,,Often,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,250000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,35,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,,1 to 2 years,"Business Analyst,Other",University courses,0,40,0,10,50,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Russia,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,Not Useful,Somewhat useful,,,,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,35,25,5,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs,Other","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Often,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Rarely,,,,Rarely,,Often,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Often,,Sometimes,,,Sometimes,,,,Often,,,Most of the time,,Sometimes,,Sometimes,,,,,Sometimes,Most of the time,,,,,50,15,30,5,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the 
organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Sometimes,,,,Often,Sometimes,,,Rarely,,,Most of the time,Often,,10-25% of projects,Do not know,IT Department,we data,"volume, data is not structured","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Git",Always,74000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Perl,Association Rules,Python,"Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,15,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter 
notebooks,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Often,Sometimes,,,,,,Rarely,,Often,,Sometimes,Sometimes,,,,,,,,Often,,Sometimes,Sometimes,,,,60,10,5,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,,,,,,,,,,,,,,,Most of the time,Often,,51-75% of projects,Entirely external,Business Department,"Bloomberg, Factset, Thomson/Reuters","Lack of centralized data repository, data licensing issues","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,172000,USD,Has decreased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Self-employed,Amazon Web services,Neural Nets,Python,Google Search,Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer,Other",University courses,0,0,90,10,0,0,,,A bachelor's degree,Financial,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,1TB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,SQL",,Often,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Most of the 
time,,,,,,,,,,Most of the time,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,Sometimes,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,,,,20,40,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Privacy issues",Sometimes,,,Most of the time,,,,,Sometimes,,,,,,,,Sometimes,,,,,,100% of projects,More external than internal,Business Department,SEC,Obtaining the data is difficult,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,239000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Miner,Software Developer/Software Engineer",University courses,20,40,20,20,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,"5,000 to 9,999 employees",Increased significantly,Less than one year,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1TB,"Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Spark / MLlib",,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender 
Systems",Sometimes,,,,Often,,Often,Often,,,,,,,,,,Often,Often,,Often,,,Often,,,,,,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Google Search,"Arxiv,College/University,Kaggle,Textbook,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,,,,,Very useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,"Data Elixir Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Financial,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Other,Basic laptop (Macbook),Relational data,,1GB,,"Jupyter notebooks,Python,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,Do not know,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Subversion,Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +A different identity,Russia,31,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Decision Trees,,I collect my own data (e.g. 
web-scraping),"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Very useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Bachelor's degree,,I don't write code to analyze data,"Business Analyst,Researcher,Other",Self-taught,30,60,0,10,0,0,,Logistic Regression,A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,,,,Very useful,"O'Reilly Data Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,25,0,0,50,25,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Always,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Rarely,Often,,,Often,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Often,Often,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the 
time,,,,40,25,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,,162000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Brazil,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Time Series Analysis,Python,"University/Non-profit research group websites,Other","College/University,Friends network,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Programmer",University courses,0,0,20,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Always,,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,Often,Often,,Most of the time,Most of the time,Sometimes,,,,,,,,Often,,Sometimes,Most of the time,Often,Often,,Sometimes,,,,,,Most of the time,,,,,10,30,10,40,10,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,100% of projects,Entirely internal,Central Insights Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Other,Network,"Bitbucket,Git",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Other,Other,Python,Other,"Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,50,10,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,100 to 499 employees,Increased significantly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data",,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Web services,Java,NoSQL,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",Rarely,,Sometimes,,,,,Rarely,,,,,,Most of the time,,Rarely,,Rarely,Most of the time,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,20,20,0,10,50,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,Approximately half internal and half 
external,Other,imagenet,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,,"30,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,66,Retired,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Kaggle competitions,20,30,0,0,50,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Programmer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",5,80,15,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Technology,I prefer not to answer,Increased significantly,3-5 years,Some other way,Important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Naive Bayes,Random Forests",,,,,,,,Sometimes,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Git,Always,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",80,15,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,Other,"Basic laptop (Macbook),Traditional Workstation",Other,Never,1MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Other,Fine,Employed by 
company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,50,25,25,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",I prefer not to answer,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",,1GB,"Decision Trees,Regression/Logistic Regression",RapidMiner (free version),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Government website,"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary 
school,Hospitality/Entertainment/Sports,,,,,Important,Other,Basic laptop (Macbook),Relational data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Java,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Recommender Systems",Often,,,Often,,Most of the time,Most of the time,Often,Often,,,,,,,,,,,,Often,,,Often,,,,,,,,,,80,5,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Business Department,UCI Machine Learning Repo,Data cleaning and wrangling.,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"90,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Chile,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,40,20,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,10GB,"Neural Networks,Random Forests,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python",,,,,,,,,,,,,,,,,Often,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Often,,,,,,,Sometimes,,,,,,Sometimes,Often,,Most of the time,,,,,Most of the time,,Sometimes,,,,40,35,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations of tools,Unavailability of/difficult access to data",,,,,,Often,,,,,,,Often,,,,,,,,Most of the time,,76-99% of projects,Approximately half internal and half external,Standalone Team,"IMS dataset, FEMTO data set ",get data from real facilities and desing a proper way to store and retrieve the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,30000000,CLP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by government,MATLAB/Octave,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),FastML Blog,Jack's Import AI Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,1 to 2 years,Engineer,Self-taught,30,50,0,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United States,27,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Decision Trees,Python,Government website,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to 
have,Necessary,,,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer",University courses,10,30,0,50,10,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Russia,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Random Forests,R,"Government website,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses",,Very useful,,,,,Somewhat useful,,Not Useful,,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to 
have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,DataCamp,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Management information systems,Less than a year,Programmer,Self-taught,50,20,10,0,10,10,"Computer Vision,Natural Language Processing,Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,45,Employed part-time,,,No,Yes,Business Analyst,Poorly,Employed by non-profit or NGO,SAP BusinessObjects Predictive Analytics,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),,Kaggle Competitions,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Statistician,Other",Self-taught,50,35,5,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,R,Text Mining,Python,Google Search,"Blogs,Online courses,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic 
Regression",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Social Network Analysis,Python,"Google Search,Government website","Blogs,Kaggle,Podcasts,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,Somewhat useful,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,10,5,80,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,1TB,"Regression/Logistic Regression,Other","C/C++,Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (free version),Spark / MLlib,SQL",,,,Rarely,,,,,Most of the time,,Rarely,,,,,,Often,,,,Rarely,Sometimes,Often,,Rarely,,Rarely,,,,Most of the time,,Rarely,,Rarely,,,,,,Rarely,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Segmentation",,,Rarely,,,,Most of the time,Sometimes,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,,,,,,50,10,5,15,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Often,,,,,,Often,Sometimes,,26-50% of projects,More internal than external,IT Department,cense; open street map,dirty identifiers,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,"DataTau News Aggregator,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,Less than a year,Programmer,Self-taught,75,10,5,5,5,0,Time Series,Bayesian Techniques,A master's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Always,10GB,Bayesian Techniques,"R,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Most of the time,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Sometimes,,,,0,35,0,35,30,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,,USD,,5,,,,,,,,,,,,,,,,,, +Male,United States,55,"Not employed, but looking for work",,,,,,,,Tableau,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Trade book",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,Other,University courses,10,15,0,65,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,,,,,,,,,,,,,,,, +Male,Canada,45,"Not employed, but looking for work",,,,,,,,R,Deep learning,Matlab,"GitHub,University/Non-profit research group websites","Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Very useful,"Data Machina Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"GPU accelerated Workstation,Workstation + Cloud service",11 - 39 hours,Master's degree,Yes,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,30,0,0,40,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Survival Analysis","Markov Logic Networks,Neural Networks - 
RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,Taiwan,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Text Analytics",,,,,,,Often,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,80,10,0,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others",Most of the time,Often,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,85000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Ireland,45,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Image data,Most of the time,100MB,CNNs,"C/C++,Jupyter notebooks,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Data Visualization,Evolutionary Approaches,Neural Networks",Often,,,Most of the time,,,Often,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,15,60,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to 
data",,,,,,,,,,,Sometimes,,Often,,Sometimes,Most of the time,Sometimes,,,,Sometimes,,51-75% of projects,More external than internal,Standalone Team,Yahoo,The YUV etc format of the image data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,EUR,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Taiwan,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Kaggle,Online courses,Personal Projects",,Very useful,,Very useful,,,Very useful,,,,Very useful,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,30,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,Tableau,Unix shell / awk",Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Personal Projects,Textbook,YouTube Videos",,Not Useful,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,,3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,10,80,0,10,0,0,Natural Language Processing,"Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,39,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,,,,Not Useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer",Self-taught,50,30,0,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,NoSQL,Python,QlikView,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,Rarely,,,,Rarely,,Most of the time,,,,Most of the time,Sometimes,Rarely,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Neural Networks,Recommender Systems,SVMs",,,,,,Often,,Often,Often,,,,,,,,,,,Most of the time,,,,Often,,,,Often,,,,,,10,40,10,0,40,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,"None, all datasets are internal",the 
amount of data to process by time unit,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Java,Time Series Analysis,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,,,,,,,,,,,,,,Coursera,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,University courses,5,40,0,50,5,0,"Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,No,Yes,Other,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Other,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,GPU accelerated Workstation,Image data,Most of the time,1GB,"Bayesian Techniques,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,TensorFlow",,,,Most of the time,,,,Rarely,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Naive Bayes,PCA and Dimensionality Reduction,Segmentation",Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,Often,,,,,Often,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,Often,,,Often,,,,100% of projects,More internal than external,Other,,Accurately filtering out noisy samples,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Mercurial",Sometimes,,,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer",University courses,30,50,10,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,,,Often,,,,Rarely,Sometimes,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,Often,,,,,"Association Rules,CNNs,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Time Series Analysis",,Sometimes,,Sometimes,Often,,Most of the time,Most of the time,Most of the time,,,Sometimes,,Most of the time,Often,Often,,Often,,Often,Most of the time,,Most of the time,Often,Sometimes,Often,Sometimes,Sometimes,,Often,,,,60,30,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Sometimes,Often,,Often,Most of the time,,,Sometimes,Often,,Often,,,Sometimes,Often,,,Often,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,Git,Rarely,5000000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,R,Support Vector Machines (SVM),R,"Government website,University/Non-profit research group websites","College/University,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,,Somewhat useful,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Fine arts or performing arts,6 to 10 years,"Data Analyst,Researcher",Self-taught,80,0,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter 
notebooks,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,75,5,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,Often,Most of the time,,Often,Most of the time,Sometimes,,,76-99% of projects,More internal than external,IT Department,,Lack of documentation for dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"94,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Time Series,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Amazon Web services,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Predictive Modeler,Researcher",University courses,35,10,35,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SAS Enterprise Miner,SQL",,Rarely,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,Often,,Often,,Sometimes,,,,Rarely,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Rarely,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,Sometimes,,,Often,Often,,,,65,20,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,Rarely,,Rarely,Rarely,Rarely,,,,Rarely,,100% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,45,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university",DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,Very useful,,,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Analyst",Work,20,10,70,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs",High school,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,DataRobot,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,Often,,Rarely,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,Sometimes,,Often,Often,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",Often,Often,Most of the time,Sometimes,Often,Most of the time,,Sometimes,Most of the time,,,Most of the time,Sometimes,Often,Most of the time,,,Rarely,,,Often,,Often,Often,,Most of the time,,,Sometimes,,,,,30,20,10,0,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Most of the time,,,,Often,,,,,Often,,Most of the time,Rarely,,10-25% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,10000000,JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,Mathematica,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,60,10,0,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Rarely,100MB,"Bayesian 
Techniques,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Simulation,SVMs",,,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,,Sometimes,,,,Often,Often,,,,,,50,30,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","I prefer not to say,Unavailability of/difficult access to data",,,,,,,Often,,,,,,,,,,,,,,Often,,51-75% of projects,More external than internal,Standalone Team,,noise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,"13,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Other,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,10,0,5,55,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,R,Deep learning,SAS,I collect my own data (e.g. web-scraping),"Arxiv,Online courses,Textbook,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher",Work,30,20,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"5,000 to 9,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation",,,,,,,,Often,Sometimes,,,Sometimes,,,Sometimes,Most of the time,,,,,,Most of the time,Often,,,Sometimes,,,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others",,,,Sometimes,Often,Often,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,Credit beuaureu data,Accuracy,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,69,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Engineer,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,3 to 5 years,"Engineer,Programmer",University courses,10,0,0,70,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed part-time,,,Yes,,Engineer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,A doctoral degree,Academic,20 to 99 employees,,,A general-purpose job board,Not very important,Other,Traditional Workstation,,,,"Neural Networks,Regression/Logistic Regression",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone 
non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Statistician",Work,30,0,50,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Random Forests,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,Rarely,,,,,30,20,10,10,10,20,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Need to coordinate with IT",Often,,Sometimes,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,53,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Personal Projects,Other",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Predictive Modeler,Statistician,Other",Work,40,10,40,0,0,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",High school,Insurance,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,10GB,"GANs,Gradient Boosted Machines,Other","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Simulation,Other",Often,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,Often,,,,,,,,,,,Often,,,,Most of the time,,,40,30,10,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,,,,,,,,,Most of the time,,,,Often,Often,Often,Sometimes,,Sometimes,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,,Sometimes,80000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,South Korea,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Time Series Analysis,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Very useful,,,,,,,,,,,,,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,30,0,0,70,0,0,"Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",High school,Government,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Always,100GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","NoSQL,R,SAS Base,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,Most of the time,,,,,,,,,,,Rarely,,,"Decision Trees,Logistic Regression,Neural Networks,Segmentation,Simulation,Time Series Analysis,Other",,,,,,,,Sometimes,,,,,,,,Often,,,,Sometimes,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,30,20,0,10,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others",,Often,,,,Often,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,public dataset at HIRA,public dataset at HIRA,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,Git,Never,"45,000,000",,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,3 to 5 years,Data Miner,University courses,0,0,0,100,0,0,Time Series,Markov Logic Networks,,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Video data,Sometimes,10TB,"Bayesian Techniques,RNNs",Angoss,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,0,0,0,100,0,0,,Lack of data science talent in the organization,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,,,,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,30,0,10,50,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Non-profit,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Relational data",Never,10MB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,Often,,,,,,Often,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",Often,,Sometimes,Sometimes,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Often,,Sometimes,,Sometimes,Often,,Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,,40,20,0,40,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,Often,,Often,Often,,,,,Often,,Often,,,Often,Often,,,76-99% of projects,Entirely internal,Other,,Lack of clear direction.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,"71,000",CAD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,Very useful,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Self-taught,30,15,45,10,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,Regression/Logistic Regression,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,QlikView,SQL,Tableau",,Sometimes,,,,,,Rarely,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",Rarely,,,,,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Often,Often,,,,20,40,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,Often,,,Rarely,,,,Sometimes,,Often,,,,,,Often,,100% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,130000,BRL,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Engineer,Researcher",Self-taught,20,20,40,0,0,20,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Retail,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SAS Base,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,Most of the time,,,,Sometimes,,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,70,20,5,2.5,2.5,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for 
work",,,,,,,,R,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,20,10,10,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Italy,57,Employed full-time,,,Yes,,Other,Perfectly,Employed by government,Microsoft Azure Machine Learning,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist",University courses,15,50,0,10,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Government,100 to 499 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Relational data,Other",Never,1MB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SAS Base,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,,,Rarely,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression",Sometimes,Most of the time,Sometimes,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,15,10,20,20,35,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Often,Often,,,Sometimes,,Often,Most of the time,,,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,public health from medical and veterinary surveillance ,"make risk managers 
understan potential of data but also limitations and challenges of data mining. risk managers are so unawere of data analysis that they use them just for management metrics, some believe data mining is somthing that answer their needs, others have no understanding that data can drive and advise their decision. I believe this is because I work in a public health system.",,"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,37,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,SQL,I collect my own data (e.g. web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,Necessary,Unnecessary,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Other,Yes,Professional degree,,1 to 2 years,"Engineer,Researcher",Other,5,0,0,20,0,75,"Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,,Very Important,,,Very Important,,Very Important,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,39,Employed full-time,,,Yes,,Programmer,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Amazon Machine Learning,Neural Nets,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,"Data Stories Podcast,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,80,0,10,10,0,0,Natural Language Processing,Bayesian Techniques,"Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,Bayesian Techniques,"Amazon Web services,NoSQL,Python",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Other",,,,,Often,,,,,,,,,,,,Often,,,,,Most of the time,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Company Developed Platform,,"Bitbucket,Git",Sometimes,25000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Not Useful,Very useful,Very useful,Not Useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,,Kaggle competitions,45,35,5,5,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,,Don't know,Some other way,Somewhat important,Other,Basic laptop (Macbook),Relational data,Rarely,<1MB,,Mathematica,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,5,5,0,5,5,80,"Enough to code it again from scratch, albeit it may run slowly","The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,51-75% of projects,Entirely internal,Other,,Generating it in the lab.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,"20,000",,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist",University courses,25,40,10,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Conferences,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,Somewhat useful,,Very useful,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",University courses,30,5,10,50,5,0,"Machine Translation,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,India,33,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,5,10,5,70,0,10,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,France,35,Employed full-time,,,No,Yes,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Google Cloud Compute,Deep learning,Python,,"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to 
have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Physics,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",90,0,0,0,10,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6 to 10 years,"Data Scientist,Operations Research Practitioner,Researcher",Self-taught,40,20,30,5,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Retail,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Other",,,,Sometimes,Most of the time,,,,Most of the time,,,,,,Sometimes,,Most of the 
time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,Sometimes,,,Often,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,,,Sometimes,Sometimes,Often,Most of the time,Often,Often,Sometimes,,Often,,Often,,Often,,Sometimes,Often,Often,Often,,Often,Sometimes,Often,,Sometimes,Often,Often,Sometimes,,,,25,25,40,5,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,,Not Useful,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Computer Vision,,A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Somewhat important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Video data,Never,,Other,"C/C++,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,Less than 10% of projects,Do not know,Other,,,Other,,,Git,Never,"8,500,000",JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, 
but looking for work",,,,,,,,Python,Random Forests,R,Google Search,"College/University,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,Programmer,,0,20,0,80,0,0,Time Series,Logistic Regression,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Very Important +Male,Portugal,78,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Employed by college or university,Other,Other,C/C++/C#,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences",Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Researcher,Other",University courses,0,0,50,50,0,0,"Time Series,Other (please specify; separate by semi-colon)",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Academic,,,,,Very important,Other,Laptop or Workstation and local IT supported servers,Other,Sometimes,10GB,"Evolutionary Approaches,Neural Networks,Other","C/C++,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis,Other",,,,,,,Often,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,Most of the time,Most of the time,,,0,80,0,0,20,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,,Often,,,Often,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,Phoenix Mars Lander Database; UCI Machine Learning Repository; KONECT; ICES data,"Memory; Processing speed","Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Email,Other",Dropbox; Google Drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,63000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Engineer,Programmer,Researcher,Other",Self-taught,50,10,30,0,10,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A doctoral degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Random Forests","Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Stan,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Often,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,Rarely,,Rarely,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Lift Analysis,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,Sometimes,Most of the time,,,,,,,,Rarely,,,,,,,Sometimes,Often,,,,Often,,Sometimes,Often,,,,40,10,5,30,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,Most of the time,,,,Most of the 
time,,,,Most of the time,Sometimes,Sometimes,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,None,Understanding ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Business Analyst,Fine,Self-employed,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),,A master's degree,Financial,Fewer than 10 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,100MB,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,50,20,0,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT",Sometimes,,,,Sometimes,,,Often,,,,,,,Sometimes,,,,,,,,51-75% of projects,More internal than external,IT Department,,Matching data across sources,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,50,30,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Financial,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,,"Bayesian Techniques,Decision Trees,Neural Networks","Amazon Web services,Microsoft Azure Machine Learning,SQL",,Often,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,Text Analytics",,,Often,,,,Sometimes,Often,,,,,,,,,,,Often,Often,,,,,,,,,Often,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A social science,,"Business Analyst,Data Analyst,Data Scientist,Researcher",University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,37,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,"Data Stories Podcast,Linear Digressions Podcast,No Free Hunch Blog",10-15 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"DataCamp,Udacity",Traditional Workstation,2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Kaggle competitions,50,5,0,20,25,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important +Male,Canada,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database 
Engineer,Researcher,Statistician",Self-taught,100,0,0,0,0,0,,,A master's degree,Mix of fields,10 to 19 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,,"Microsoft SQL Server Data Mining,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Often,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Engineer,University courses,20,40,15,25,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,NA,Retired,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,60,20,15,0,5,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Rarely,10MB,Regression/Logistic Regression,"C/C++,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / 
awk",,,,Rarely,,,Sometimes,,Often,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Often,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,Sometimes,Often,Often,Most of the time,,Sometimes,Most of the time,,,,,,Often,,,,60,20,10,8,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",Often,,,,Most of the time,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Do not know,IT Department,,evaluatin performance,Key-value store (e.g. Redis/Riak),Company Developed Platform,,"Git,Other",Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Data Analyst,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,"Data Analyst,Engineer,Software Developer/Software Engineer",University courses,40,20,10,0,20,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,"College/University,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",< 1 year,Nice to 
have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,15,30,15,40,0,0,"Computer Vision,Unsupervised Learning",Logistic Regression,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,1 to 2 years,"Business Analyst,Data Analyst",University courses,50,10,10,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Video data,Rarely,1MB,Bayesian Techniques,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Naive Bayes,Random Forests",Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,50,10,5,15,20,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),,,,,75000,,,7,,,,,,,,,,,,,,,,,, +Male,United States,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,Poorly,Self-employed,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,Somewhat useful,,Very useful,,,,,"Data Elixir Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Predictive Modeler",Kaggle competitions,40,10,0,0,50,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,,Often,Often,,,Often,,Often,,Often,,,,,Often,,Often,,,,,Often,,,,,,10,40,40,10,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,"75,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Malaysia,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,60,30,5,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,500 to 999 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,SAS Enterprise Miner",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Most of the time,Often,Often,,,,,,,Most of the time,,,,Most of the time,Sometimes,,Most of the time,,,,Most of the time,Most of the time,Often,,,,,40,30,0,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Unavailability of/difficult access to data",Often,Sometimes,,,,,,Often,,,,,,,Often,,,,,,Most of the time,,76-99% of projects,More internal than external,Other,prefer not to say,proxy server,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,160000,MYR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +A different identity,People 's Republic of China,28,Employed part-time,,,No,Yes,Scientist/Researcher,,,Python,Deep learning,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,,Nice to have,,,Nice to have,Unnecessary,,,,,Coursera,,11 - 39 hours,Kaggle Competitions,Yes,Doctoral degree,Computer Science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",,,,"Amazon Web services,Java,NoSQL,Perl,Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Neural Networks,Segmentation,Simulation",,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,Often,Often,,,,,,,10,50,30,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,20,0,3,37,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics",,,,,,,Often,,,,,,,,,Often,,,Rarely,,,,,,,,,,Rarely,,,,,25,10,0,10,55,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,,,,,,,Sometimes,,,,26-50% of projects,Entirely internal,Other,US Census Data; US Bureau of Labor Statistics,getting the third party data and proprietary data in the same format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,90000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Scientist,Work,30,0,0,70,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Other,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Mathematica,Python,R,SAS JMP,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,Very useful,,,,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,20,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Brazil,38,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,Other","Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,"Data Scientist,DBA/Database Engineer,Programmer,Other",University courses,50,30,0,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +A different identity,,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Tableau,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,30,20,40,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data",,10GB,"Bayesian Techniques,CNNs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Perl,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other,Other",Often,Often,,Often,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,Most of the time,Rarely,,,,,Rarely,,,Often,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Often,,Most of the time,Most of the time,Most of the time,,"Bayesian Techniques,CNNs,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,RNNs,Simulation,SVMs,Time Series Analysis",,,Most of the time,Most of the time,,,,,Most of the time,,,,Often,Often,,Often,Most of the time,Most of the time,Sometimes,,,,,,Most of the time,,Sometimes,Most of the time,,Most of the time,,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Company Developed Platform,Share Drive/SharePoint",,Git,Most of the time,107000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Somewhat useful,,,,,Very useful,,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,40,20,10,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,Regression/Logistic Regression,"Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,RapidMiner (commercial version),RapidMiner (free version),SAS Base,SQL,Tableau",,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,,,Often,,,,,,,,Often,,,Rarely,Rarely,,,Rarely,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests",,,,,,,Most of the time,Rarely,,,,,,Rarely,,Sometimes,,Rarely,,,,,Sometimes,,,,,,,,,,,40,10,10,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,Unavailability of/difficult access to 
data",,,,,,,,,Sometimes,Often,,,,,,,,Often,,,Most of the time,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Colombia,62,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Textbook",,,,,Very useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,University courses,50,0,0,50,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Image data,Video data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,RNNs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,20,60,0,0,20,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state 
of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,,Often,,,Often,Often,,Often,,,,,,Often,,Often,Often,,100% of projects,More internal than external,IT Department,Kaggle,Relevance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,,Sometimes,150000000,COP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed part-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,R,Regression,R,"Google Search,I collect my own data (e.g. web-scraping)","Online courses,Podcasts,YouTube Videos",,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Other,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,Other,Self-taught,90,4,6,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important +Male,Canada,25,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,Very useful,,Very useful,,,Very useful,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Other,Sort of (Explain more),Master's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Not important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Brazil,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,Less than a year,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",45,25,2,23,5,0,"Survival Analysis,Unsupervised Learning",,High school,Pharmaceutical,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Time Series Analysis",Often,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,,,70,10,5,10,5,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",,,Often,,,,,Most of the time,Often,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,"Bitbucket,Git",Never,18000,BRL,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,35,20,10,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Never,100GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,Naive Bayes,PCA and Dimensionality Reduction,SVMs",,,,Often,,Sometimes,Most of the time,Most of the time,,,,,Most of the 
time,,,,,Often,,,Most of the time,,,,,,,Most of the time,,,,,,30,40,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)",,,,,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,Python,Google Search,"Friends network,Personal Projects,Textbook",,,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Operations Research Practitioner,Researcher,Other",Self-taught,30,0,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",High school,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,Sometimes,,Sometimes,,,,,,,,Often,Most of the time,,,Sometimes,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics,Time Series 
Analysis",,,,,,Sometimes,Sometimes,Rarely,,Rarely,,,,,,Often,,,Often,Sometimes,,,,,,,,,Most of the time,Sometimes,,,,50,20,10,5,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Often,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,220000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,70,15,0,5,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by company that makes advanced analytic software,,Genetic & Evolutionary Algorithms,Python,,"Blogs,Company internal community,Textbook,Other",,Somewhat useful,,Somewhat useful,,,,,,,,,,,Very useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Engineer,Researcher",University courses,20,0,40,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,20 to 99 
employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Regression/Logistic Regression","C/C++,MATLAB/Octave",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,Time Series Analysis",,,Sometimes,,,,Often,Often,,Often,,,,,,,,,,,,,,,,,,,,Often,,,,10,30,15,25,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Never,"110,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Portugal,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Friends network,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,25,20,30,25,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Nigeria,34,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,33,Employed 
full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,Julia,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses",,Very useful,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,20,28,50,0,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Internet-based,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Rarely,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Simulation",Sometimes,,,,,Most of the time,Most of the time,,Sometimes,,,,,,Often,Often,,,Sometimes,,,,Often,,,Sometimes,Sometimes,,,,,,,30,25,15,10,20,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input",Most of the time,Most of the time,,Sometimes,,,,Most of the time,,,Sometimes,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,distribution of data changes significantly over short periods of time,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",Other,AWS S3,Git,Sometimes,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,Other,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Newsletters,Personal Projects,Textbook,Trade book,YouTube Videos",Somewhat useful,Very useful,,,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",15+ years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Computer Scientist,DBA/Database Engineer,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very 
Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,50,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,"Kaggle,Other",,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Adversarial Learning,"Bayesian Techniques,Logistic Regression",A professional degree,CRM/Marketing,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,"Bayesian Techniques,Regression/Logistic Regression",SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Prescriptive Modeling,Segmentation",,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,,,,,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,Often,,,Often,,,,,,,,,,,Often,,,Less than 10% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,75000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Other,Anomaly Detection,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Data Analyst,Data Scientist,Researcher,Other",Work,60,10,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,Rarely,Sometimes,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the 
time,Most of the time,,Rarely,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,10,10,20,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,Most of the time,Often,,,,,Often,Often,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,,Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Government website,"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,,"Data Elixir Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,70,0,0,15,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Telecommunications,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,Tableau",,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,Often,,Often,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Simulation,SVMs",Sometimes,,Often,,,Often,Often,Often,Often,,,Sometimes,,,,Often,,Often,,,,Sometimes,Often,,,,Sometimes,Often,,,,,,25,25,20,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Sometimes,,,,,,Sometimes,,,,,,Often,,,,,,Most of the time,Often,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,71,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,Other,Self-taught,20,10,60,10,0,0,,,A doctoral degree,Mix of fields,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",,,,"Amazon Web services,C/C++,NoSQL,R",,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Segmentation,Simulation,Text Analytics",Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,Often,,,,,80,0,15,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access 
to data",,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,102759,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,14,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Physics,1 to 2 years,I haven't started working yet,Self-taught,99,1,0,0,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Amazon Machine Learning,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Management information systems,6 to 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Other","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,,Often,Often,,,,Sometimes,Most of the time,,Often,,,Sometimes,,Most of the time,Sometimes,,Often,Rarely,,,,25,25,25,0,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Privacy issues",Often,,,,,,,,,,,,,,,,Often,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,Annotations ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,22000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Text Mining,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,,,Very useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Software Developer/Software Engineer",University courses,0,80,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Stayed the same,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Never,,,"Amazon Web services,Java,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Rarely,Rarely,,,Rarely,,,,,,,,,,,,,,10-25% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Bitbucket,Never,1200000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,46,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,40,0,40,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)",I don't know/not sure,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data,Other",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Neural Networks,Random Forests,RNNs,SVMs,Other","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Java,KNIME (free version),NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Python,R,RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,kNN and Other Clustering,Lift Analysis,Naive Bayes,Neural Networks,Random Forests,RNNs,Segmentation,SVMs,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,10,0,50,Enough to refine and innovate on the 
algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Other,,,,,,,,,,,9,,,,,,,,,,,,,,,,,, +Female,Canada,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,,"Blogs,Non-Kaggle online communities",,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Programmer",Other,20,10,20,0,0,50,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",,100MB,"Decision Trees,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,10,10,0,10,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database",,Most of the time,,,,,,,Often,,,,,,,,Most of the time,Most of the time,,,,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,"Data Elixir Newsletter,DataTau News Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,70,25,5,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,Often,,,,,Often,,Often,,,Often,,,,Often,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack 
of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,Most of the time,,76-99% of projects,Entirely internal,Business Department,,Cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,25515,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",Python,Text Mining,Python,GitHub,"Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"FlowingData Blog,The Analytics Dispatch Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer",Work,30,10,40,10,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data,Other",Most of the time,100TB,"Bayesian Techniques,Ensemble Methods,Random 
Forests,RNNs,SVMs","Hadoop/Hive/Pig,IBM SPSS Statistics,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Most of the time,,,Sometimes,,,Sometimes,,,,,,,,,,,,Most of the time,Sometimes,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,Most of the time,,Sometimes,,,Most of the time,Most of the time,Sometimes,,,,30,30,20,8,12,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,41,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook",Very useful,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Computer Scientist,University courses,30,30,30,0,10,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,HMMs,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,Rarely,,,,,,,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Segmentation,SVMs",,Often,Often,,,Most of the time,Most of the time,Often,,,,,Often,Most of the time,,,,Often,Most of the time,,,,,,,Most of the time,,Most of the time,,,,,,30,10,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,26-50% of projects,More internal than external,IT 
Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Most of the time,"120,000",CNY,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Predictive Modeler",University courses,20,0,30,50,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,Yes,,Programmer,Perfectly,Self-employed,Python,Deep learning,Python,I collect my own data (e.g. web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,Primary/elementary school,Internet-based,20 to 99 employees,Decreased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),"N/A, I did not receive any formal education",,,,,,,"Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Naive Bayes,Neural Networks,Random Forests,Recommender Systems",,,,,,Often,Most of the time,,,,,,,,,,,Sometimes,,Rarely,,,Often,Most of the time,,,,,,,,,,60,10,5,20,5,0,Enough to tune the parameters properly,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,1680000,RUB,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,0,0,70,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Java,Cluster Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website",Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,25,50,25,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Other",,,,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,SQL",,Most of the time,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to 
others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,,,,,Often,,Sometimes,,,,,,Often,,Most of the time,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,"Git,Other",Sometimes,72000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,27,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",Very useful,,,,,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,15+ years,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,,"Researcher,Software Developer/Software Engineer",Other,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by college or university,Employed by non-profit or NGO",IBM Watson / Waton Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Fine arts or performing arts,3 to 5 years,Statistician,University courses,40,20,10,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,SVMs",,,Often,,,Most of the time,Most of the time,Sometimes,,,,,Sometimes,Often,,Most of the time,,Sometimes,,,,,Sometimes,,,,,Most of the time,,,,,,40,20,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,Entirely internal,Standalone Team,TCGA; CPTAC; 1000 genomes; HPP,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"50,000",USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Stan,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Predictive Modeler,Software Developer/Software Engineer,Other",Self-taught,70,5,20,0,5,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Stan,TensorFlow",,Often,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Most of the time,Rarely,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,,,,Most of the time,,Most of the time,,,,,,,Most of the time,,,,40,20,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data 
science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Sometimes,Sometimes,,Often,Most of the time,,,,,Often,,Often,Sometimes,Sometimes,,Often,,,26-50% of projects,More internal than external,Business Department,FRED; BLS; EUROSTAT;,Size; dirty data.,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,300000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Other,Self-taught,20,50,0,0,30,0,,,,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,Technology,500 to 999 employees,Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,35,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,,Python,,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,Necessary,,,,Necessary,,,,,,,,,,,0 - 1 hour,,No,Professional degree,,I don't write code to analyze 
data,Programmer,Self-taught,50,50,0,0,0,0,,,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Engineer",University courses,10,10,40,30,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,25,"Not employed, but looking for work",,,,,,,,R,Regression,SQL,"GitHub,Google Search","College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,,,,,,,,,,,,,,,Laptop or Workstation and local IT supported servers,,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,50,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Other,Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,46,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Random Forests,R,I collect my own data (e.g. 
web-scraping),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Yes,Master's degree,Management information systems,More than 10 years,"Business Analyst,Computer Scientist,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Recommendation Engines,Decision Trees - Random Forests,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,United States,38,"Not employed, but looking for work",,,,,,,,SAS Base,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,,,,edX,,,Kaggle Competitions,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,50,0,0,40,10,0,Time Series,Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,Python,,"College/University,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Not Useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,"No Free Hunch Blog,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,80,10,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks",A bachelor's degree,Military/Security,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Random Forests","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,Tableau,Unix shell / awk",,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,Sometimes,,Most of the time,,,,Rarely,,,,Often,,Sometimes,,,,,Most of the time,,,,,,,Often,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Most of the time,Often,Sometimes,,,,,,Rarely,,,,,,,,,Often,,,Often,,,,Often,,,,30,10,NA,20,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,Sometimes,Often,Often,,,Often,,,,Often,,Often,,,,Often,Often,,,10-25% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",,36000,GBP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Kaggle,Newsletters,Stack Overflow Q&A,Trade book,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,,,,Very useful,,Somewhat useful,,Somewhat useful,"Data Machina Newsletter,Jack's Import AI Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Relational data",Most of the time,10GB,"CNNs,Evolutionary Approaches,HMMs,Markov Logic Networks,SVMs","Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow",,,,,,,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,"CNNs,Evolutionary Approaches,Markov Logic Networks,Neural Networks,SVMs",,,,Most of the time,,,,,,Often,,,,,,,Rarely,,,Most of the time,,,,,,,,Sometimes,,,,,,40,30,20,10,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the 
organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Most of the time,,,,,Most of the time,,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,google,debug,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Always,84,TWD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,Udacity","GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat 
important,Not important,Not important,Not important,Not important +Female,United States,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,3 to 5 years,Statistician,University courses,15,0,50,35,0,0,,,A professional degree,Government,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,,"Amazon Web services,R,SAS Base,SQL",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,95,0,0,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,"Data Analyst,Data Scientist,Other",Self-taught,10,35,15,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,Sometimes,Often,Often,Often,,,Often,,,Often,Often,,,Sometimes,,,Often,Rarely,,,Sometimes,Often,,Sometimes,,,,,35,15,25,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine 
learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Often,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,76-99% of projects,More internal than external,Central Insights Team,Financial bureaus,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,90000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,DataCamp,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,A social science,Less than a year,,Self-taught,100,0,0,0,0,0,,,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important +Female,United States,32,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,30,10,20,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Other",University courses,35,30,15,15,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Julia,Mathematica,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,Rarely,,,,Rarely,Sometimes,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,,,,,,,Often,Sometimes,,,Often,Most of the time,,,Most of the time,,,,30,20,5,25,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,Often,,,Often,,,Sometimes,Often,,,,,,,,,,,,Most of the time,,51-75% of projects,More internal than external,Other,FRED; Quandl;,Availability of (high-quality) source data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,,Sometimes,120000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Anomaly Detection,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,DBA/Database Engineer",Work,25,25,25,25,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,Amazon Machine Learning,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,60,0,25,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,Financial,,,,,Important,Other,Workstation + Cloud service,Other,Sometimes,1GB,"Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Minitab,Python,Spark / MLlib,TensorFlow,Other,Other",,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,Sometimes,,,Most of the time,Most of the time,,"Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Simulation,Time Series Analysis",,,,,,Most of the 
time,,,,Often,,,,Often,,Sometimes,,,,Sometimes,,,Most of the time,Sometimes,,,Most of the time,,,Most of the time,,,,30,40,30,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,"Stock, currency, commodity quotes from Interactive Brokers",Limitations on download speed,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,,USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,57,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,Very useful,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,,University courses,89,5,0,1,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer,Operations Research Practitioner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,Primary/elementary school,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Never,100MB,,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,90,0,0,10,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,Often,,,,Often,,,51-75% of projects,More internal than external,Business Department,,,,,,,Rarely,60,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,,Very useful,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",University courses,40,20,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,Often,,Sometimes,,Often,,,,,Often,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA 
and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,Often,Often,Most of the time,Often,Most of the time,,,Most of the time,,Often,,Often,,Sometimes,Sometimes,Sometimes,Often,,Often,,,,,Most of the time,Most of the time,Often,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,Sometimes,,Most of the time,,,,,,,Often,,,,,,,,Often,Most of the time,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,Python,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,,,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,27,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,30,0,40,10,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,Academic,"10,000 or more employees",Increased significantly,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Programmer","Online courses (coursera, udemy, edx, 
etc.)",80,10,10,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - GANs",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Text data,Rarely,,,"Hadoop/Hive/Pig,Python,R,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,"Data Visualization,Segmentation",,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,0,0,0,0,0,0,,"Data Science results not used by business decision makers,Privacy issues",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Other",University courses,10,15,35,35,5,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,Sometimes,Rarely,,,,,,,,,,,50,10,15,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate 
with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Most of the time,Often,,,Sometimes,,Sometimes,,,Often,Often,,Most of the time,Sometimes,,,Often,,10-25% of projects,Approximately half internal and half external,Other,,Cleaning and preparing the modeling dataset. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,BRL,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,Engineer,University courses,30,10,40,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Image data,Text data",Rarely,100GB,"CNNs,Neural Networks,Random Forests","Jupyter notebooks,NoSQL,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"CNNs,Natural Language Processing,Neural Networks,Random Forests",,,,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,,,,,60,30,NA,0,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,50,20,0,0,30,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,10 to 19 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Other,Never,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,RNNs,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,Sometimes,,,,,,,,,,,Often,,,,,Often,,,,,Often,,,,30,10,0,15,45,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,Sometimes,Sometimes,Sometimes,,Sometimes,Often,Sometimes,,Rarely,,,,Sometimes,Most of the time,,,Most of the time,,,100% of projects,More internal than external,Standalone Team,,absence of diversity in data and difficulty to collect real world data due to privacy 
constraints,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,53000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Official documentation,Online courses,YouTube Videos",,,Very useful,,Somewhat useful,,,,,Somewhat useful,Very useful,,,,,,,Very useful,,3-5 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,40,20,0,40,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Female,Other,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 
years,Researcher,University courses,0,33,34,33,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Pharmaceutical,"5,000 to 9,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp,Other",,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Genetic & 
Evolutionary Algorithms,Python,Google Search,"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Very Important +Male,United States,57,"Not employed, but looking for work",,,,,,,,DataRobot,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,"Coursera,DataCamp",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",9,90,0,0,1,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,Netherlands,66,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,University/Non-profit research group websites,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,Computer Scientist,Self-taught,90,0,0,10,0,0,Computer Vision,Ensemble Methods,,Military/Security,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private 
datacenters,Text data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,Unix shell / awk,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,10,40,20,30,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,,Never,6000,EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Colombia,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Matlab,,"Arxiv,College/University,Conferences,Online courses,Stack Overflow Q&A,Textbook",Very useful,,Very useful,,Very useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Academic,500 to 999 employees,,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Other",Most of the time,1GB,"Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,Tableau",,,,,,,,,Often,,,,,,,,Most of the time,,,,Often,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,Rarely,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,Often,Often,Often,,,,,,,Often,,Often,,,,,Often,,,,,,,,,Often,,,,45,15,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,Sometimes,Sometimes,,,,,,,Often,Most of the time,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,,dirty data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,,Somewhat useful,,,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,Nice to have,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,University courses,15,10,20,40,15,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,France,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,DataRobot,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,30,0,70,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data,Other",Most of the time,1MB,"Bayesian Techniques,Ensemble Methods,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Jupyter notebooks,MATLAB/Octave,Minitab,Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,Sometimes,,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,Rarely,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,Sometimes,,,Most of the time,Most of the time,Sometimes,Often,,,,,Often,,Often,Sometimes,Often,,Often,Most of the time,,Most of the time,Often,,,,Most of the time,,,,,,40,10,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Often,Often,,Often,,,,Often,Often,Often,,,,,,,,,,,,,100% of projects,Do not know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,,Most of the time,90000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,Other,Other,Python,GitHub,"Arxiv,Kaggle,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Computer Scientist,Self-taught,0,0,0,0,100,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A doctoral degree,Internet-based,,,,,Not very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Rarely,1GB,Other,"C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs",,,Most of the time,,,,,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,,Sometimes,,,Most of the time,Sometimes,,,,Sometimes,,,,,,80,10,10,0,0,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,,100000,GBP,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,Very useful,,Very useful,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Speech Recognition","Decision Trees - Random Forests,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Support Vector Machines 
(SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,,,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Very useful,"Linear Digressions Podcast,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,Software Developer/Software Engineer,Self-taught,70,20,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,Often,,,,,,,,,Sometimes,,Sometimes,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,RNNs,Simulation,Time Series Analysis",,,Most of the time,Often,,Often,Most of the time,Most of the time,Most of the time,,,,,Often,,,,Sometimes,,Most of the time,,,Most of the time,,Often,,Most of the time,,,Most of the 
time,,,,65,10,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Sometimes,,,,Often,Often,,,,,,,,,Often,Most of the time,,,100% of projects,More internal than external,IT Department,stocks;twitter;kaggle;quantopian,Cleaning the data efficiently without throwing out too much of it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,75000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Norway,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,15,40,20,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Internet-based,"5,000 to 9,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web 
services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,R,Google Search,"Company internal community,Personal Projects",,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,"Data Elixir Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Data Analyst,Programmer,Other",University courses,10,0,0,90,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A bachelor's degree,Technology,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Don't know,10MB,,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Often,,,Most of the time,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,20,60,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data 
science talent in the organization,Limitations of tools,Privacy issues",Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,None,More internal than external,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,,5,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,6 to 10 years,,University courses,40,20,20,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Other,10 to 19 employees,Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Conferences,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,,Very useful,Somewhat useful,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,3 to 5 years,,Self-taught,85,5,0,0,0,10,,,A master's degree,Academic,500 to 999 employees,Increased slightly,Don't know,Some other way,Somewhat important,Other,,Relational data,Never,,,"IBM Watson / Waton Analytics,Python,R,SQL",,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series 
Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Often,Sometimes,,,,60,10,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,Sometimes,,,,,,Sometimes,,76-99% of projects,Entirely external,,"Peri, pew, gss",Transfer to R clean data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"63,000",,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Russia,32,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,57,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,C/C++,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Machina Newsletter,No Free Hunch Blog,The Analytics Dispatch Newsletter",< 1 year,,,Nice to have,,,Necessary,Nice to have,,,,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Professional degree,,I don't write code to analyze data,"DBA/Database Engineer,Programmer,Other",Work,40,0,45,0,0,15,Other (please specify; separate by semi-colon),Bayesian Techniques,"Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Female,Australia,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,DataRobot,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,Other,University courses,40,20,10,15,15,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or 
former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","C/C++,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R",,,,Rarely,,,,,,,,,,,,Sometimes,Sometimes,,,Most of the time,Most of the time,Rarely,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Simulation",,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,15,35,0,15,35,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Jack's Import AI Newsletter,Linear Digressions Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,A humanities discipline,Less than a year,"Programmer,Software Developer/Software Engineer,Other",Self-taught,40,10,10,0,40,0,Natural Language Processing,Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Taiwan,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,No,Yes,,Perfectly,Employed by company that makes advanced analytic software,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,No education,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,1 to 2 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Government,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told 
me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Sometimes,1GB,"CNNs,Neural Networks,Random Forests,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,Sometimes,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Often,,Most of the time,Most of the time,,,,,Often,,,,Rarely,,Rarely,Most of the time,Most of the time,Often,,Often,,Most of the time,,,Rarely,Often,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,Often,Often,,,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,"Privacy, dealing with human subject data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,Other,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Stack Overflow Q&A",,,Very useful,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Other",University courses,10,5,0,85,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis,Other",Sometimes,,,,,Often,Most of the time,Sometimes,,,,Sometimes,,Often,,Sometimes,,,,,,,Sometimes,,,Most of the time,,Sometimes,,Often,,,Often,50,5,5,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability 
of/difficult access to data",Most of the time,Most of the time,Sometimes,,Often,Most of the time,,,Most of the time,,Often,,Often,,,,,,Most of the time,Most of the time,Sometimes,,100% of projects,More internal than external,Other,"Fdic; casino city, DNB",Data difficult to access and not designed for analytics in mind,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Other,Rarely,90000,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Argentina,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",University courses,10,15,0,75,NA,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,Decision Trees,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,35,0,0,15,50,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,The lack of a clear question to be answering or a clear direction to 
go in with the available data",Most of the time,,Most of the time,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Sometimes,45000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Nice to have,,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important +Male,India,42,Employed full-time,,,Yes,,Other,Perfectly,Employed by a 
company that performs advanced analytics,IBM Watson / Waton Analytics,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,,Very useful,Very useful,"Data Stories Podcast,FlowingData Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,20,20,20,30,10,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,100MB,"Decision Trees,Regression/Logistic Regression","C/C++,IBM Watson / Waton Analytics,Java,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Python,SQL,Statistica (Quest/Dell-formerly Statsoft),TensorFlow",,,,Often,,,,,,,,,Sometimes,,Often,,,,,,Sometimes,,Sometimes,,,Often,,,,,Most of the time,,,,,,,,,,,Sometimes,,Sometimes,,Often,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,RNNs,Simulation",,,,,,,Often,Sometimes,,,,,,,,Often,,,,Sometimes,,,,,Sometimes,,Often,,,,,,,30,15,10,15,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Sometimes,,,,,,,,,,,,Often,,,,,,,,,,100% of projects,Do not know,Other,,tools,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Other",Sometimes,100000,INR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,76,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,"Not employed, but looking for work",,,,,,,,SQL,Monte Carlo Methods,R,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",3-5 years,,,,,,,,,,,,,,,Traditional Workstation,,Kaggle Competitions,No,Master's degree,,3 to 5 years,Business Analyst,University courses,20,20,10,50,0,0,"Survival Analysis,Time Series",Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Scientist,Other",Self-taught,90,0,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,,,,,Sometimes,,,,,Sometimes,,,Sometimes,,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,Often,Rarely,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Sometimes,Sometimes,Sometimes,,,,,,Often,,Sometimes,,Sometimes,Often,,Often,,,,,Sometimes,Sometimes,Sometimes,,,,50,20,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a 
clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,Sometimes,,,,,,Often,Sometimes,,76-99% of projects,More internal than external,IT Department,census; NOAA; ,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,166000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Russia,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,SAP BusinessObjects Predictive Analytics,Genetic & Evolutionary Algorithms,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Textbook",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,,"R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,1 to 2 years,Researcher,University courses,0,0,50,50,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks","IBM SPSS Modeler,IBM SPSS Statistics,QlikView,R",,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Time Series Analysis",,Often,Often,,,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,Often,,Most of the time,,,,,,,,Most of the 
time,,,,40,10,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,Most of the time,,,,,,Often,,,Most of the time,,,,Sometimes,Often,,,51-75% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,350000,PAB,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters",,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Data Scientist,University courses,10,0,40,40,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",,CRM/Marketing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Sometimes,,,,,Sometimes,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,,,40000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Machine Learning Engineer,,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,28,Employed full-time,,,Yes,,Researcher,Fine,"Employed by a company that performs advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Programmer",University courses,0,10,0,90,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,41,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by college or university,Amazon Machine Learning,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Conferences,Kaggle,Online courses,Textbook",,,Somewhat useful,,Not Useful,,Somewhat useful,,,,Very useful,,,,Very useful,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,,Nice to have,Necessary,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer",University courses,20,20,0,60,0,0,,,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important +Female,Canada,28,Employed 
full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Deep learning,Python,Other,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Markov Logic Networks,High school,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Important,Other,Basic laptop (Macbook),"Image data,Relational data",Always,10GB,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Simulation",,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,10,30,30,20,10,0,Enough to run the code / standard library,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,Often,,Sometimes,,100% of projects,Do not know,Other,"SDSS, Planck collaboration, WMAP collaboration",lacking proper training ,Other,"Share Drive/SharePoint,Other",,Git,Never,20000,CAD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",Work,30,15,50,5,0,0,"Computer Vision,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Decreased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,C/C++,Java,Julia,Jupyter notebooks,KNIME (free version),Mathematica,NoSQL,Python,R,Unix shell / awk,Other",,Rarely,,Often,,,,,,,,,,,Often,Often,Most of the time,,Sometimes,Sometimes,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,Sometimes,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",Sometimes,,Sometimes,Often,,Most of the time,Most of the 
time,Often,Often,,,,Often,Often,,Often,,Sometimes,,Often,Often,,Most of the time,,,Most of the time,Most of the time,Often,,,,,,10,20,5,30,20,15,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,Most of the time,Most of the time,Often,,Most of the time,Most of the time,Often,Often,Most of the time,Most of the time,Sometimes,Often,,Most of the time,Most of the time,Most of the time,Often,Most of the time,,100% of projects,More external than internal,Standalone Team,various medical image datasets,Access tp ground truth.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Mercurial,Subversion",Most of the time,65000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Denmark,23,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,SQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,University courses,20,15,0,65,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,Fewer than 10 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Often,76-99% of projects,Do not know,Other,,,Other,Other,Course webpage,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by college or university,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Analyst,Other",Self-taught,30,10,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Always,1GB,Regression/Logistic Regression,"IBM Cognos,NoSQL,Perl,R,SQL,Unix shell / awk",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Simulation,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,,Often,,Sometimes,Often,,,,75,5,5,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Sometimes,Sometimes,Most of the 
time,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,Often,Most of the time,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,,76-99% of projects,More internal than external,Other,US Census Datasets; Department of Education Datasets,"Disconnect between IT and functional units results in data architecture that imperfectly corresponds to business practices. Middle managers in both IT and functional units each believe the ""other side"" is ""doing it wrong,"" and so refuse to coordinate or compromise. Constant friction and minimal efficiency result.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,41574,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Random Forests,R,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,60,20,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Other,10 to 19 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,RapidMiner (commercial version),SAS Base,SAS JMP,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,,,,Sometimes,,Often,,,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other",Often,Sometimes,,,,Most of the time,Most of the time,Often,Sometimes,,,,,,Often,Often,,,,,Sometimes,,Sometimes,,,Often,Sometimes,,Rarely,Sometimes,Often,,,40,20,0,20,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,CMS for Medicare 
information; Zillow; Epsilon,Connectivity to individuals in our marketing database,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,SFTP,Other,Most of the time,130000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,Less than a year,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,10,10,0,0,Survival Analysis,Evolutionary Approaches,High school,Academic,100 to 499 employees,Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Never,<1MB,"Evolutionary Approaches,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,I don't plan on learning a new ML/DS method,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,,Somewhat important,Very Important,Somewhat important,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Engineer,Software Developer/Software Engineer",University courses,0,20,20,60,0,0,Natural Language Processing,"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,44,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by company that makes advanced analytic software,,,,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,0,40,50,10,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Logistic Regression",I don't know/not sure,Technology,500 to 999 employees,Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Other",Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,NoSQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Random Forests,Recommender Systems",,,,,,,,,,,,,,,,,,,Often,,,,Often,Often,,,,,,,,,,0,0,0,0,0,0,,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,,,,5,,,,,,,,,,,,,,,,,, +Male,Turkey,50,Employed full-time,,,Yes,,Operations Research Practitioner,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Textbook,Trade book",Somewhat useful,,,,,,,,,,,,,,Very useful,Very useful,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Operations Research Practitioner,Researcher",Self-taught,100,0,0,0,0,0,Reinforcement learning,,A bachelor's 
degree,Academic,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation",,,,,,,,Often,,,,,,,,Often,,,,Often,Sometimes,Most of the time,,,,,Most of the time,,,,,,,0,100,0,0,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",,,,Sometimes,,,,Sometimes,,,,,,,,Often,,,,,,,51-75% of projects,Approximately half internal and half external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Spain,53,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,3-5 years,Unnecessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Computer Scientist,Programmer,Other",Kaggle competitions,0,70,0,0,30,0,"Computer Vision,Natural Language Processing",Neural Networks - CNNs,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,South Korea,25,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Company internal community,Conferences,Friends network,YouTube Videos",,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,80,5,10,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,100 to 499 employees,Stayed the same,,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Rarely,<1MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,60,10,0,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",Often,Most of the time,Often,,,,,,Most of the time,,,Most of the time,Most of the time,,,,,,,,Often,,51-75% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,,Never,270000,ZAR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,United States,56,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Software Developer/Software Engineer,Other",University courses,0,2,18,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Government,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Other,Most of the time,<1MB,Random Forests,"Google Cloud Compute,Java,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,Often,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Often,,,,"kNN and Other Clustering,Random Forests,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,Often,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,,,Often,Sometimes,,,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,IT Department,none,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,159000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,Python,Google Search,"Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,Somewhat useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,,Nice to have,,Nice to have,,,,Necessary,,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Republic of China,29,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,,,,"Arxiv,Conferences,Kaggle,Online courses,YouTube Videos",Very useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,Malaysia,39,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Factor Analysis,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,Not Useful,,Somewhat useful,,,,Somewhat useful,Emergent/Future Newsletter (Algorithmia),< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Doctoral degree,Computer Science,I don't write code to analyze data,"Computer Scientist,DBA/Database Engineer,Programmer",Self-taught,30,10,30,30,0,0,Computer Vision,Neural Networks - RNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Turkey,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Microsoft Azure Machine Learning,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Very useful,,Very useful,,,Not Useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Computer Vision,Natural Language Processing,Time Series",Logistic Regression,A doctoral degree,Technology,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","R,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,Sometimes,Often,,,"Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,20,20,5,10,20,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,Often,,,Often,Most of the time,,Often,,,Often,Often,,,Often,,,,,Often,,,10-25% of projects,Entirely external,Standalone Team,,"performance of hardware, time response",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,62000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,,"Kaggle,Official documentation,Online courses,Textbook",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler",Kaggle competitions,0,5,15,0,80,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Insurance,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,,10GB,"Gradient Boosted Machines,Other","DataRobot,R,SAS Base,SQL",,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Lift Analysis,Logistic Regression",,,,,,Often,Sometimes,,,,,Rarely,,,Most of the time,Often,,,,,,,,,,,,,,,,,,50,40,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible 
expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,,,,,,Often,Sometimes,Sometimes,Often,,,,,Most of the time,Most of the time,Most of the time,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"180,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,,R,Other,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,11 - 39 hours,PhD,No,Bachelor's degree,Mathematics or statistics,,,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Reinforcement learning,Time Series,Unsupervised Learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not 
important +Male,Nigeria,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,SAS,I collect my own data (e.g. 
web-scraping),"College/University,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Pakistan,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,20,5,20,40,15,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,Other,,,,,,,,,,,,,,,,,,,"Data Elixir Newsletter,DataTau News Aggregator,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,,Traditional Workstation,40+,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,0,50,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Not important,Very Important,Very Important,Not important,Somewhat important,Not 
important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Denmark,23,"Independent contractor, freelancer, or self-employed",,,No,Yes,Statistician,Poorly,Self-employed,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Podcasts,Textbook,YouTube Videos",,,,,,,,,,,,,Somewhat useful,,Not Useful,,,Somewhat useful,"Data Elixir Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",,Github Portfolio,No,I did not complete any formal education past high school,,Less than a year,"Data Miner,Researcher,Statistician",Self-taught,70,20,10,0,0,0,"Reinforcement learning,Unsupervised Learning",Evolutionary Approaches,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important +Male,United States,22,Employed part-time,,,Yes,,Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,0,15,80,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic 
Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,YouTube Videos",,Very useful,,,Somewhat useful,,Not Useful,,,,Very useful,,,,,,,Somewhat useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,10,70,0,20,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,NoSQL,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Most of the time,,Often,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Most of the time,,,,Sometimes,,,,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of 
significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",,Sometimes,,,Often,,,,Most of the time,,Most of the time,,,,Sometimes,,Sometimes,Sometimes,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,45000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Operations Research Practitioner,Statistician",Self-taught,75,10,0,15,0,0,,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,3-5 years,Some other way,Very important,Other,Traditional Workstation,"Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,Sometimes,Sometimes,,Most of the time,,,,,Often,Sometimes,Most of the time,,,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive 
Modeling,Random Forests,Simulation",,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,Often,,Often,Often,,,,Most of the time,,,,,,,35,20,20,10,15,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,Often,Most of the time,,Most of the time,,,,,,,,,,,,51-75% of projects,More internal than external,Other,WRDS,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Most of the time,"150,000",USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Other",Self-taught,40,25,10,25,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,C/C++,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Sometimes,,,Sometimes,,,,,Rarely,,,,,Sometimes,,,Often,,,Rarely,Often,,,Sometimes,,,Sometimes,,,,Often,,Often,,,,,,,,Sometimes,Often,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,Time Series 
Analysis",Sometimes,,,Often,,Often,,,,,,,,Often,,Often,,Sometimes,Sometimes,Often,Sometimes,,,Sometimes,Often,Often,,,,Often,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,,,,,Sometimes,Sometimes,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,200000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Malaysia,21,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,,Very useful,Not Useful,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Self-employed",IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Speech Recognition","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Internet-based,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Google Cloud Compute,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,Tableau",,,,,,,,Sometimes,,,,,,,Rarely,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Often,,,,,Often,Often,,,,,,,,,Often,,,Sometimes,,,,,,,,,,Sometimes,Rarely,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Need to coordinate with IT,Unavailability of/difficult access to data",,Sometimes,,,,,,,,,,,,,Often,,,,,,Sometimes,,10-25% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Always,200000,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +A different identity,Other,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Self-employed,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,20,0,70,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, 
+Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by college or university",Python,Neural Nets,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,Very useful,Somewhat useful,Not Useful,,,,Somewhat useful,Somewhat useful,Very useful,Not Useful,Very useful,Somewhat useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Other,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Sometimes,100MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,20,5,10,15,30,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy 
useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Sometimes,Often,,,Most of the time,Often,,,,,,Most of the time,Sometimes,,Often,Most of the time,Often,,76-99% of projects,More internal than external,Other,cms: https://data.cms.gov; HCUP: https://www.hcup-us.ahrq.gov/; NPI: https://npiregistry.cms.hhs.gov/,Data critical to models is not collected,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Always,110000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,31,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,I don't plan on learning a new ML/DS method,Python,"Google Search,Government website","College/University,Kaggle,Online courses",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,Reinforcement learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,19,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,I prefer not to answer,Electrical Engineering,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Vietnam,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",University courses,10,10,10,70,0,0,"Computer Vision,Recommendation Engines","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Stayed the same,1-2 years,Some other 
way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,DBA/Database Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,25,25,25,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,44,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Government,500 to 999 employees,,Less than one year,I visited the company's Web site and found a job listing there,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Image data,Rarely,,,"Amazon Machine Learning,Amazon Web services,NoSQL,Python,R",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,10,10,20,20,20,20,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful 
for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",Often,Often,Often,,,,,,Often,Often,Often,,,,,Often,Often,Often,,,,,Less than 10% of projects,More internal than external,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,"Git,Subversion",Never,20000,EUR,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,,,,,"Data Machina Newsletter,FastML Blog,R Bloggers Blog Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,40,30,0,15,15,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Somewhat important +Male,United States,41,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Data Analyst,Statistician",University courses,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Laptop or Workstation and private datacenters,Other,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Minitab,R,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,Rarely,Rarely,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Simulation,SVMs",,Rarely,,,,Often,Often,Often,Often,,,Often,,,,Often,,,,,,,Often,,,,Often,Often,,,,,,20,50,0,20,0,10,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,26-50% of projects,,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Most of the time,79000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,0,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Other,Python,Other,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Deep learning,Python,Google Search,"Arxiv,College/University,Conferences,Friends network,Personal Projects,Textbook,YouTube Videos",Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,Very useful,,,Very useful,,,Not Useful,,1-2 years,Nice to have,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,"Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,80,0,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,"Random Forests,Regression/Logistic Regression","Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",,Often,,,Often,,Often,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,,,Often,,,,60,0,0,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Sometimes,,,Sometimes,,Often,,,,,,Often,,,,,,Often,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,,Sometimes,115000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Norway,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",15,50,15,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression",SAS Enterprise Miner,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Most of the time,Often,Most of the 
time,Sometimes,,,,,,,Most of the time,,,,,Often,,,,,,,,,,,,,15,45,20,20,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Other,I don't typically share data,,Other,Rarely,675000,NOK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Researcher,Other",Work,35,5,0,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Never,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Julia,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Rarely,Most of the time,,Most of the time,Most of the time,Most of the 
time,Often,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,Sometimes,Most of the time,,Most of the time,,,,,Sometimes,,Often,,,,30,20,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team",Rarely,,Rarely,,Most of the time,,,,Most of the time,Often,,,,,Sometimes,Most of the time,,,,,,,100% of projects,Entirely external,IT Department,Gene Expression Omnibus from NCBI for Biology data. ,Preprocessing to prepare raw data to input file for machine learning; Data interpretation. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"20,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,10,20,15,54,1,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Most of the time,100GB,,"Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Association Rules,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,40,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,Do not know,,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Subversion",Rarely,28000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,Researcher,University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",,,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,,,,"Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,"Data Visualization,SVMs",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,70,0,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to 
others,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by non-profit or NGO,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,,,,,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,Researcher,University courses,20,15,25,30,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Non-profit,"5,000 to 9,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Rarely,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,R,SQL,Tableau,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,Often,,,,"Data Visualization,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Rarely,,,,75,2,5,10,5,3,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,Often,,Most of the time,Sometimes,,Often,Most of the time,,Often,,,,,,Sometimes,,Often,Most of the time,,,51-75% of projects,More internal than external,Other,ClinVar; dbSNP; ExAC; gnomAD,The format is not conducive to easily extracting fine grained information.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"64,000",USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",1,98,0,0,1,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,Often,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Sometimes,Sometimes,,Often,,Most of the time,,,,,Often,Most of the time,,,,,40,20,10,20,20,0,Enough to tune the parameters 
properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Sometimes,102000,SGD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Italy,52,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Flume,Deep learning,Python,Google Search,"College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Textbook",,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,"Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",10,30,50,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","C/C++,Google Cloud Compute,Java,Jupyter notebooks,KNIME (free version),Mathematica,MATLAB/Octave,Python,R,SQL,TensorFlow,TIBCO Spotfire",,,,Rarely,,,,Often,,,,,,,Sometimes,,Most of the time,,Sometimes,Sometimes,Often,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Sometimes,Most of the time,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Neural 
Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Often,,,,Sometimes,,Most of the time,Often,,,,,,,,Often,Sometimes,,,Sometimes,Often,,Sometimes,Sometimes,,,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,Most of the time,,,,,Often,Sometimes,,Sometimes,,Most of the time,Most of the time,,,51-75% of projects,More internal than external,Other,Various,Cleaning them,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,130000,CHF,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Germany,21,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,College/University,Online courses,YouTube Videos",Very useful,,Very useful,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,,Less than a year,I haven't started working yet,Self-taught,50,40,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,,Often,,,,,,Sometimes,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Most of the time,,,,,Sometimes,Most of the time,,Most of the time,,,,Most of the time,,76-99% of 
projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Most of the time,100000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,37,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Podcasts,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,Very useful,,Very useful,,Very useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,No Free Hunch Blog",5-10 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,University courses,20,25,25,25,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +A different identity,United States,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,,"Computer Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Other",University courses,15,5,50,30,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,36,"Not employed, but looking for work",,,,,,,,SQL,Link Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Friends network,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Somewhat useful,,,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Other,40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,20,10,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,Brazil,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Jupyter notebooks,Deep learning,R,Other,"Arxiv,Blogs,College/University,Conferences,Official documentation",Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,Very useful,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Other,40,0,40,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - 
Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data",Rarely,10MB,"Decision Trees,Neural Networks,Random Forests,SVMs","C/C++,Java,Python,R,RapidMiner (free version),TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,Most of the time,,Rarely,,,,,,,,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Often,,,,,Often,,,,,,20,10,20,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,Often,,,,Often,,51-75% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Git,Sometimes,3600,BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Finland,40,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,Textbook,Other",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Other,No,Master's degree,A humanities discipline,3 to 5 years,"Programmer,Researcher,Other",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,R,Neural Nets,R,"Government website,I collect my own data (e.g. 
web-scraping),Other","College/University,Company internal community,Textbook,Tutoring/mentoring",,,Very useful,Very useful,,,,,,,,,,,Somewhat useful,,Very useful,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Other",Work,10,10,50,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,"5,000 to 9,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Azure Machine Learning,R,SAS Enterprise Miner",,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",Often,Sometimes,Sometimes,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,Most of the time,Most of the time,,,,Sometimes,Often,,Sometimes,,,Sometimes,,,Sometimes,,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT",,,,Often,Most of the time,,,,,,,,,,Often,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Sometimes,195000,CAD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Cluster Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping)","College/University,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,Work,45,0,45,10,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Laptop or Workstation and private datacenters,Other,Sometimes,1GB,"Decision Trees,Neural Networks","C/C++,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,65,5,5,20,5,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,100% of projects,More internal than external,Other,HMIS (Homeless Management Information System); SDSS (Sloan Digital Sky Data),Understanding it. ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Git,Most of the time,72000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,,"Data Machina Newsletter,Data Stories Podcast,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer,Other",University courses,20,30,20,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important +Female,India,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,FastML Blog,< 1 
year,,,,,,,,Necessary,Necessary,Necessary,,,,"Coursera,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Other,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,,Nice to have,Necessary,Nice to have,Necessary,,Necessary,Nice to have,,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,0,20,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,,,,,,,,,,,,,,,, +Male,Other,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,R,Anomaly Detection,SQL,"Government website,Other","College/University,Kaggle,Online courses,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Programmer,Statistician",University courses,0,0,90,10,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100GB,"Decision Trees,Markov Logic Networks,Regression/Logistic Regression","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Minitab,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,SAS JMP,SQL",,,,Rarely,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,,,Rarely,,Most of the time,Rarely,Rarely,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Neural Networks,Segmentation,Time Series Analysis",,,,,,Often,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,,,,Often,,,,Sometimes,,,,30,50,5,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty 
data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,Most of the time,Most of the time,,,,Most of the time,,Often,,,,,Often,,,76-99% of projects,More internal than external,Other,Census,Cleaning it,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Ireland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Not Useful,,Somewhat useful,,,,,Very useful,,Very useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Natural Language Processing,Recommendation Engines",Decision Trees - Random Forests,A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,Regression/Logistic Regression,"Amazon Machine Learning,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,QlikView,R,Spark / MLlib,SQL",Rarely,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization",Rarely,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,55,20,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Rarely,"45,000",EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Company internal community,Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,Somewhat useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Other,University courses,0,30,50,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,Often,Sometimes,,,,20,40,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,The size makes it difficult to use traditional tools and slows down processing. I've had too learn new tools.,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,prefer not to say,Other,Most of the time,88500,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,TensorFlow,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Engineer,Operations Research Practitioner,Predictive Modeler,Researcher",Other,50,10,30,10,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Evolutionary Approaches,Neural Networks - GANs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Rarely,1MB,"Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,Sometimes,,,Often,Often,,Sometimes,Often,,,,Sometimes,,,,,,Often,Sometimes,Often,,,,,,,,Sometimes,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from 
external sources,Lack of significant domain expert input,Unavailability of/difficult access to data",,Sometimes,,,,,,,Sometimes,Most of the time,Often,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,50000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Hong Kong,31,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Conferences,Kaggle,Tutoring/mentoring",,,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Researcher,University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Don't know,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Often,Often,,,,,Often,,Often,,,,,Often,,Often,,,,,Often,,,,,,20,20,0,10,50,0,Enough to tune the parameters properly,"Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments 
such as Python/R/Java/etc.",,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,Less than 10% of projects,Approximately half internal and half external,Central Insights Team,Kaggle,data structure and large scale (big data),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,15700,HKD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A health science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,10,40,0,40,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,100GB,"CNNs,Neural Networks","Amazon Web services,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data 
Visualization,Decision Trees,Neural Networks,SVMs",,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,40,20,30,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Singapore,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Weka,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Russia,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional 
services/consulting firm,IBM Watson / Waton Analytics,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,Not Useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",5,10,65,0,20,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,QlikView,SQL,Tableau,TensorFlow",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",Often,,,Sometimes,,Often,Most of the time,Often,,,,,,Often,Sometimes,Sometimes,,,,Sometimes,Sometimes,Rarely,Often,,,Sometimes,,Often,,Often,,,,50,30,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others",Sometimes,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,,1800000,RUB,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Czech Republic,19,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,25,25,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Mix of fields,I don't know,Stayed the same,Don't know,A career fair or on-campus recruiting event,"N/A, I did not receive any formal education",Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,SVMs","C/C++,Java,Jupyter notebooks,Mathematica,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"Natural Language Processing,Recommender Systems",,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,60,25,5,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Often,,,26-50% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher",Self-taught,30,40,10,10,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Denmark,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,15,5,5,75,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian 
Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A bachelor's degree,Financial,I don't know,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Rarely,1TB,"CNNs,Neural Networks","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,Rarely,Often,,,Rarely,Sometimes,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Naive Bayes,Neural Networks,Recommender Systems",Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Other",,,,,,,,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Data Scientist,University courses,10,10,0,80,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Evolutionary Approaches,RNNs,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,TIBCO Spotfire,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,Rarely,,Rarely,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,Often,Most of the time,,,,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Prescriptive Modeling,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,,,,Most of the time,Often,,,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,Often,Most of the time,,Often,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,Most of the time,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Other",Never,26000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,,,,"Blogs,Conferences,Friends network,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,10,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL",Often,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Rarely,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,,,Often,,Most of the time,,,,,,,,,,,40,30,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,Sometimes,,,,,Often,,,,,,,,,,,,Often,,26-50% of projects,More internal than external,IT 
Department,bindb;maxmind,"The biggest challenge working with our source data is to have them everything calculated and ready for use in real time, because we perform real time analysis.",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Bitbucket,Git",Rarely,96000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Engineer,Predictive Modeler",Self-taught,45,5,45,5,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Python,Tableau,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Often,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Lift Analysis,Logistic Regression,Neural Networks,Recommender Systems,Segmentation,Text Analytics",Often,Sometimes,,,Sometimes,,,,,,,,,,Often,Sometimes,,,,Often,,,,Sometimes,,Often,,,Sometimes,,,,,10,30,0,0,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to 
be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,Sometimes,,,,,Often,,,,,Often,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,France,61,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,,University courses,30,10,0,60,0,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Other,Basic laptop (Macbook),"Text data,Relational data",Rarely,1MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,"Decision Trees,SVMs",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Google Search,"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 
years,"Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,70,2,23,0,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,CRM/Marketing,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,Python,R,Tableau,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Rarely,Sometimes,,Often,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,Sometimes,,Often,Sometimes,Sometimes,,,Sometimes,Sometimes,,,Often,,Often,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Often,Sometimes,Rarely,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,100% of projects,Approximately half internal and half external,IT Department,"Google search from time to time to find new data, AWS s3 datasets",,"Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,"dropbox,git","Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,"150,000",CAD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Friends network,Personal Projects",,Somewhat useful,Very useful,,,Very useful,,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,10,35,0,50,5,0,Unsupervised Learning,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,27,Employed part-time,,,Yes,,Machine Learning Engineer,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer",University courses,45,0,0,55,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Microsoft Azure Machine Learning,R,Spark / MLlib,SQL,TensorFlow",Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,R,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,"DataTau News Aggregator,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",,Academic,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data,Other",Most of the time,10GB,"CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks","C/C++,Python,R,SQL,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Bayesian Techniques,CNNs,Cross-Validation,HMMs,Neural Networks,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,50,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear 
direction to go in with the available data",,,Sometimes,,Often,,,,,,Sometimes,,,,,,,,,Sometimes,,,51-75% of projects,More external than internal,Other,none,data cleanup and anonymization ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,48000000,COP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Switzerland,40,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Genetic & Evolutionary Algorithms,Python,Google Search,"Blogs,College/University,Friends network,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,,,,,Very useful,,,Very 
useful,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,"Computer Vision,Machine Translation,Time Series","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst,Other",Work,50,35,15,0,0,0,"Recommendation Engines,Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Monte Carlo Methods,SQL,GitHub,"College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",1,80,15,1,3,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,100MB,"Decision Trees,HMMs,Neural Networks,Random Forests,SVMs","Cloudera,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,SQL,Tableau",,,,,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"Decision Trees,HMMs,Logistic Regression,Naive Bayes,Neural Networks,SVMs,Time Series Analysis",,,,,,,,Often,,,,,Often,,,Often,,Often,,Often,,,,,,,,Often,,Often,,,,65,20,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input",Often,,,,,,,Often,,,Often,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,720000,MXN,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Social Network Analysis,R,Google Search,"Blogs,YouTube Videos",,Very useful,,,,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Necessary,Necessary,,,,Necessary,,,,,,,,,Workstation + Cloud service,0 - 1 hour,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Programmer",Self-taught,30,30,20,10,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, 
etc.)",50,50,0,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Sweden,24,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,0,40,10,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,100MB,Random Forests,"Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Natural Language Processing,Random Forests",,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,,70,0,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,Most of the time,,,,,,,,,,,Often,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,3,,,,,,,,,,,,,,,,,, +Male,Chile,54,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,Other",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,Somewhat useful,,FastML Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Kaggle competitions,10,20,60,0,10,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,500 to 999 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,1TB,"Regression/Logistic Regression,Other","Amazon Web services,Java,Python,R,SQL,Other,Other",,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,Most of the time,Often,,"A/B 
Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Recommender Systems",Most of the time,,,,,Often,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,20,20,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others",,,Sometimes,Often,,Often,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,"Bitbucket,Git",Rarely,135000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Argentina,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Flume,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses",Somewhat useful,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician",Kaggle competitions,25,0,40,10,25,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Military/Security,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,C/C++,DataRobot,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,Rarely,,Sometimes,,Often,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Often,Sometimes,Often,,,Most of the time,,,,,,,,,,,Most of the time,Rarely,,Sometimes,Often,,,Sometimes,,,,50,5,10,20,15,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Most of the time,Rarely,,,Rarely,,Sometimes,,,,,,Often,,,10-25% of projects,Do not know,Business Department,google api,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,80000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Researcher,University courses,10,0,20,70,0,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A professional degree,Financial,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Recommender Systems,Time Series Analysis",,,,,,Most of the time,Sometimes,,Often,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,Often,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,Most of the time,,Often,,,Often,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Financial datasets from FRED.,The data is messy and 
the people who give it to us don't know much about it.,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Git,Rarely,120000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Data Scientist,Other",University courses,50,0,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Python,Other,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Online courses,Personal Projects,Podcasts,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,60,20,10,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Telecommunications,100 to 499 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and local IT supported servers,Other,Don't know,,"Evolutionary Approaches,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Simulation",,,,,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,30,20,0,40,10,0,"Enough to code it again from scratch, albeit it may run slowly",The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,11500,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Retired,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,,,,,Very useful,,Very useful,,,,,"FastML Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,10,0,30,40,20,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Mix of fields,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Rarely,10GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,Python,Stan",,Sometimes,,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,"CNNs,Data Visualization,GANs,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,Sometimes,,Sometimes,,Often,,,,,Often,,,,20,30,5,10,15,20,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,51-75% of projects,Approximately half internal and half external,Standalone Team,image datasets,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,90000,EUR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,Java,Text Mining,Python,Google Search,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Data Stories Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Traditional Workstation,,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Machine Learning 
Engineer,Programmer",University courses,20,30,10,30,5,5,"Computer Vision,Machine Translation,Natural Language Processing","Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests",High school,Technology,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,Tableau,TensorFlow,Unix shell / awk",Sometimes,Often,,,,,,Sometimes,,,Sometimes,Rarely,,,,,,,,,,,Sometimes,,,,,,,,Often,,Rarely,,,,,,,,,,,,Often,Sometimes,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks",Sometimes,Often,Often,,,Often,Often,Often,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,60,10,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,Sometimes,,,Often,Often,,,,,Most of the time,Often,,Often,,Sometimes,Most of the time,,100% of projects,Entirely external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,200000,USD,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Personal Projects",Very useful,,Very useful,,,,,,,,,Very useful,,,,,,,"O'Reilly Data Newsletter,Talking Machines Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Researcher,University courses,33,33,0,34,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Monte Carlo Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,,,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,Researcher",Work,25,0,0,75,0,0,Recommendation Engines,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,Somewhat useful,Very useful,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Data Visualization,Naive Bayes,Random Forests",Most of the time,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,100,0,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Privacy issues",Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,10-25% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Other,"Cdap, sql server",Bitbucket,Sometimes,140000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Friends network,Kaggle,Stack Overflow Q&A",,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,Very useful,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Workstation + Cloud service,2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,40,20,0,30,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",80,15,5,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,Often,,,,,Sometimes,Often,,,,,,,,Most of the time,,Rarely,,,,Rarely,Rarely,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Segmentation",Most of the time,,,,,Sometimes,Most of the time,,,,,,,Rarely,Often,Sometimes,,,,,,,,,,Often,,,,,,,,10,5,10,15,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the 
potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,,Sometimes,,Sometimes,Often,,,,,Sometimes,,Most of the time,,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,Acxiom; mintel; census,Automating intake and loading of new sources into relational environment. ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,183000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Business Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,0,80,20,0,0,0,"Machine Translation,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"CNNs,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Very useful,,,,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,0,30,,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Other,21,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing 
there,Somewhat important,Other,Laptop or Workstation and private datacenters,Other,Rarely,10MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Trade book",Very useful,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,45,25,10,15,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Other",Image data,Sometimes,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Markov Logic Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Often,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",Rarely,,Often,Often,,Often,,,,,,,,Often,,Most of the time,Sometimes,Often,,Often,Most of the time,,,,,Often,Often,Often,,,,,,30,50,5,5,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,Our data is near random in nature.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,server,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,98000,CHF,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer",Self-taught,65,10,20,5,0,0,,,A master's degree,Technology,I don't know,,Don't know,An external recruiter or headhunter,Very important,Other,,Text data,,,,"C/C++,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,35,15,20,5,5,Recommendation Engines,Bayesian Techniques,I prefer not to answer,Retail,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,Bayesian Techniques,Spark / MLlib,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Bayesian 
Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Deep learning,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle",Very useful,,,,Very useful,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Data Miner,University courses,40,30,10,10,10,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Often,,,,,,,,Often,,Sometimes,,,,,,,,Often,,,,Often,,,,,,,,,,Often,Sometimes,,,Sometimes,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs",,,,Often,,Often,Often,,Often,,,Often,,Often,,Often,,Often,Often,Often,,,Often,,Often,,,Often,,,,,,10,80,0,10,0,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,10000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed 
full-time,,,Yes,,Statistician,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,University courses,0,0,50,50,0,0,Time Series,Other (please specify; separate by semi-colon),A master's degree,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Other,Never,1GB,,"Amazon Machine Learning,Amazon Web services,C/C++,NoSQL,Python,R,SQL,Unix shell / awk",Often,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,,Often,,,,,,Often,,,,Cross-Validation,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,30,30,0,10,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,0,5,25,0,"Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Very useful,,Somewhat useful,,"Data Stories Podcast,FlowingData Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Self-taught,50,10,30,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Pharmaceutical,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation",,,,,,,Most of the time,Rarely,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,55,5,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,Sometimes,Most of the time,Often,,Sometimes,Sometimes,,,,,Sometimes,Often,,,Often,Most of the time,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,118000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,,0 - 1 hour,Master's degree,No,Master's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,University courses,0,10,0,90,0,0,,,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,23,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),FastML Blog,KDnuggets Blog",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,60,10,0,20,10,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,India,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Google Search,"Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Not Useful,Somewhat 
useful,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Programmer,Software Developer/Software Engineer",Other,0,50,0,0,0,50,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,Microsoft Excel Data Mining,Monte Carlo Methods,SQL,"GitHub,University/Non-profit research group websites","Kaggle,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,,,Somewhat useful,,,,,Very useful,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,"Basic laptop (Macbook),Workstation + Cloud service",0 - 1 hour,Github Portfolio,No,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,50,0,10,0,10,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Engineer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",30,20,20,10,20,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,100 to 499 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Image data,Relational data",Sometimes,100MB,,"Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Simulation",,,Often,,,,Most of the time,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,20,10,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Organization is small and cannot afford a data science team",,,,Often,Most of the time,,,,,,,,,,,Often,,,,,,,76-99% of projects,Entirely internal,Other,None,Non-standard formats for dates and text,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,,Rarely,,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Mexico,45,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Other,Text Mining,R,Government website,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,70,10,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series",Neural Networks - CNNs,A bachelor's degree,Financial,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Always,100GB,"CNNs,Regression/Logistic Regression","Python,R,SAP BusinessObjects Predictive Analytics,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,Often,,,,,,,,Rarely,,,Sometimes,,,,"CNNs,Time Series Analysis",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,40,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,Most of the time,624000,MXN,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,Very useful,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,20,20,30,20,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Very 
important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Video data,Other",Most of the time,1GB,Bayesian Techniques,"Amazon Machine Learning,C/C++,Jupyter notebooks,Python,TensorFlow",Rarely,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,15,30,30,20,5,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,YouTube Videos",,,,,,,Very useful,,Very useful,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Software Developer/Software Engineer",Other,20,10,0,0,20,50,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Sometimes,,Bayesian Techniques,"SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Data 
Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,10,10,10,10,20,40,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,Often,,,,,,,Often,,Often,Often,,,,Often,,Often,,,,,10-25% of projects,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1300000,INR,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Self-taught,80,10,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,DataRobot,Python,R,Other",,Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing",,Often,,,,,,Often,,,,,,,,Often,,Sometimes,Sometimes,,,,,,,,,,,,,,,20,50,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,Sometimes,,,Often,Most of the time,,,Most of the time,,Most of the time,,,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,hipaa laws,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,50,0,0,50,0,0,"Machine Translation,Natural Language Processing,Reinforcement learning,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,I prefer not to answer,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,25,25,0,25,25,0,Natural Language Processing,"Bayesian Techniques,Support Vector Machines (SVMs)","Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,30,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Manufacturing,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SAS Base,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Sometimes,,Rarely,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,Sometimes,Sometimes,,,Often,,Sometimes,,Often,,,,50,20,0,20,10,0,Enough to explain 
the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Often,,,Sometimes,,Often,,Sometimes,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Central Insights Team,,,,,,,,,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Engineer,Poorly,Employed by non-profit or NGO,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,"KDnuggets Blog,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Engineer,Other",University courses,50,0,10,40,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,"5,000 to 9,999 employees",,More than 10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",,,,,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,Sometimes,,,Sometimes,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Sometimes,Rarely,Rarely,Rarely,,,Rarely,Rarely,,Rarely,,Rarely,Rarely,Rarely,Sometimes,,Rarely,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,30,30,5,10,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain 
expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Other",Sometimes,"110,000",,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Web services,Deep learning,Python,GitHub,"Blogs,Friends network,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,20,30,0,10,0,,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,,Basic laptop (Macbook),Image data,Never,10GB,Neural Networks,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Neural Networks",,,,,,Sometimes,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,50,20,0,30,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Engineer,Other",Self-taught,30,5,25,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Ensemble Methods,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,100GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Unix shell / awk",,Sometimes,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Lift Analysis,Logistic Regression,Natural Language Processing,Simulation,Text Analytics",,,,,,,,,,,,,,,Often,Often,,,Often,,,,,,,,Often,,Often,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"GitHub,Google Search","Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Other",Other,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,60,40,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Neural Networks - RNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Chile,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,GitHub,"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's 
degree,Physics,3 to 5 years,Software Developer/Software Engineer,Self-taught,80,0,0,0,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression","IBM Watson / Waton Analytics,Julia,Jupyter notebooks,Mathematica,Python",,,,,,,,,,,,,Rarely,,,Sometimes,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Natural Language Processing,Time Series Analysis",,,Sometimes,,,,,,,,,,,,,Rarely,,Rarely,Sometimes,,,,,,,,,,,Most of the time,,,,40,15,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,Private,Private,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,12000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook",,,Very useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,,"Decision Trees - Random Forests,Logistic Regression",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Physics,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",60,30,0,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,Other,University courses,80,10,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,I prefer not to answer,Stayed the same,More than 10 years,A career fair or on-campus recruiting 
event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Jack's Import AI Newsletter,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),,"Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important +Male,Germany,30,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Kaggle,Official documentation,YouTube Videos",Very useful,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,Not Useful,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,0 - 1 hour,PhD,No,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Programmer,Researcher,Statistician",University courses,60,0,20,20,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Text Mining,Python,"Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Not Useful,,,,Very useful,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"DataTau News Aggregator,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Other",Self-taught,80,20,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Most of the time,,Often,,,,,Rarely,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Sometimes,,,,Sometimes,,Often,Most of the time,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation",Sometimes,Sometimes,Sometimes,,Sometimes,Often,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,Often,,Often,Rarely,Rarely,Sometimes,Often,Often,Sometimes,,Sometimes,Rarely,,,,,,,10,10,10,10,10,50,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to 
data",Often,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,51-75% of projects,Entirely internal,Central Insights Team,,Data storage / access systems are not optimized for efficient use,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Other,Never,217000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Italy,51,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Other,Self-taught,50,10,40,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic 
Regression,SVMs","Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Minitab,NoSQL,Orange,Python,R,RapidMiner (free version)",,,,,,,,,,,,,,,Sometimes,,Often,,,Sometimes,Often,,,,,Sometimes,Often,,Most of the time,,Most of the time,,Most of the time,,Often,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,Often,,,,Most of the time,Most of the time,Often,,,,,Sometimes,Most of the time,,Most of the time,,Often,,,Most of the time,,Most of the time,,,Often,Often,Most of the time,,Often,,,,0,40,20,20,20,0,Enough to refine and innovate on the algorithm,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,,,,,,,Often,,,,,,,,,Most of the time,,Most of the time,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Singapore,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,20,30,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),,A professional degree,Government,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,100MB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,40,20,0,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,Often,Often,,Often,,,Often,Sometimes,,,,Sometimes,Sometimes,Sometimes,,100% of projects,Do not know,Business Department,BLS;Census;National Center for Education Statistics,Tidying,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,90000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by non-profit or NGO,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,,,,Somewhat useful,Very useful,,,,,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,Researcher,University courses,15,15,10,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Non-profit,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Most of the time,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation,SVMs",,,,Most of the time,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Sometimes,,,,Sometimes,,,Often,,,Often,,Most of the time,,,,,,25,30,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty 
data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Sometimes,,Often,,Most of the time,,Most of the time,,,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Sometimes,60000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Other,"College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,18,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,,"Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,Very useful,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",,Technology,"10,000 or more employees",Decreased significantly,1-2 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,NoSQL,Python,R",Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,,,,,,,,,Often,,,,,,Often,Often,,,,,,Sometimes,Often,,,,20,40,10,30,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database,The lack of a clear question to 
be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,,,,,,Sometimes,,Often,,,26-50% of projects,More internal than external,Standalone Team,None,dirty data,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Subversion,Sometimes,230000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +A different identity,United States,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Mathematica,Bayesian Methods,C/C++/C#,I collect my own data (e.g. web-scraping),"Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Engineer,Operations Research Practitioner,Statistician",Other,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,Government,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Other,Sometimes,10TB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Regression/Logistic Regression","C/C++,MATLAB/Octave,Stan,Other,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,"Association Rules,Ensemble Methods,Logistic Regression,Markov Logic Networks,Time Series Analysis",,Sometimes,,,,,,,Rarely,,,,,,,Sometimes,Often,,,,,,,,,,,,,Often,,,,20,25,5,25,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,10-25% of 
projects,Approximately half internal and half external,Standalone Team,Government data ,organization,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,158000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +A different identity,Japan,48,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Very useful,,5-10 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,Udacity,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Other,60,10,10,0,2,18,"Computer Vision,Natural Language Processing,Recommendation Engines,Time Series","Logistic Regression,Neural Networks - CNNs",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Philippines,21,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Self-taught,50,0,0,50,0,0,,,A doctoral degree,Insurance,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Researcher,Other",Self-taught,65,0,0,20,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Very important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced 
analytics,Google Cloud Compute,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Textbook",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist",Kaggle competitions,30,10,30,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",High school,Internet-based,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests","Jupyter notebooks,NoSQL,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,Often,,,,,,,,,Often,,,,Rarely,,Often,,,,"A/B Testing,Data Visualization,Ensemble Methods,Natural Language Processing",Often,,,,,,Most of the time,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,70,10,10,5,5,NA,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,Sometimes,,,,,,,,,Most of the time,,Often,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,Git,Always,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United States,59,Employed full-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Researcher,Other",University courses,5,0,70,25,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Sometimes,Often,Often,Often,,,Sometimes,,Sometimes,,Often,,,,,Often,,Sometimes,,,Often,,,,Most of the time,,,,10,25,0,25,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Often,,,,,,10-25% of projects,More internal than external,Other,,,Other,Share 
Drive/SharePoint,,Git,Rarely,190000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,,"No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",55,25,15,5,0,0,Recommendation Engines,"Decision Trees - Random Forests,Ensemble Methods",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,30,0,18,50,2,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Amazon Web services,DataRobot,Microsoft R Server (Formerly Revolution Analytics),NoSQL,R,Tableau",,Most of the time,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Textbook,YouTube Videos",,,,,,Somewhat useful,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,25,50,5,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,Spark / MLlib",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"Bayesian Techniques,Decision Trees,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics",,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,Sometimes,,Sometimes,Most of the time,,,,Sometimes,,,,,,Often,,,,,30,40,10,15,5,0,Enough to explain the algorithm to someone non-technical,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,130000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,36,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,Very useful,FastML Blog,< 1 year,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Other,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Not Useful,,Somewhat useful,Very useful,,Very useful,Very useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,75,15,5,5,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Always,1TB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Java,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Often,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Prescriptive Modeling",Often,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,15,15,10,20,40,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Rarely,500000,NPR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Singapore,NA,Employed full-time,,,Yes,,Data Analyst,,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,,Python,"GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Other",Kaggle competitions,25,25,10,0,40,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Stayed the same,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",,,,"Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Most of the time,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,Often,Often,Often,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods",,,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,,,,,,5,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,0,0,50,50,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Increased slightly,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Traditional Workstation",Other,Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,SVMs",,,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Regression,R,Google Search,"Blogs,Conferences,Friends network,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog 
Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"DBA/Database Engineer,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,Reinforcement learning,Logistic Regression,A doctoral degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,R",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Rarely,Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,80,15,0,0,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Sometimes,,Often,,,,Often,,,,,,,Often,,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,95000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Finland,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,"Data Elixir Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,30,10,45,10,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Perl,Python,QlikView,R,SQL,Stan,Tableau,Other",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,Rarely,Sometimes,Rarely,Most of the time,,,,,,,,,Sometimes,Rarely,,Most of the time,,,,Often,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other",,,,,Rarely,Rarely,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,Rarely,Sometimes,Rarely,Sometimes,,Sometimes,Rarely,,Sometimes,Rarely,,Sometimes,Sometimes,Sometimes,,,70,5,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by 
business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,,Rarely,Most of the time,Rarely,,,Rarely,,,,Sometimes,Rarely,Rarely,,Sometimes,Sometimes,Often,Often,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,"54,000",EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,Bayesian Techniques,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,23,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,21,"Not employed, but looking for work",,,,,,,,Python,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Ensemble Methods",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important +Male,Other,24,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,SAS Base,Time Series Analysis,R,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,FlowingData Blog",1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,,1 to 
2 years,Business Analyst,Self-taught,50,50,0,0,0,0,Adversarial Learning,Logistic Regression,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,India,47,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Very useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity,Other","Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,40,30,25,0,5,0,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very 
Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,45,Employed full-time,,,Yes,,Data Analyst,Poorly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,20,30,10,30,10,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data,Relational data",Never,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Orange,Python,R,SQL",,,,Rarely,,,,,,,,,,,Most of the time,,Often,,,,Sometimes,,,,,,Most of the time,,Rarely,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Sometimes,Often,Sometimes,,,,,,Often,,,,,,,Often,,Sometimes,,,Often,,,,,,,,10,60,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,,Sometimes,,Most of the time,Sometimes,,Sometimes,,,,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,200000,BRL,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,51,Retired,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,C/C++/C#,University/Non-profit research group websites,Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,,Work,0,0,100,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher",University 
courses,40,0,10,50,0,0,"Reinforcement learning,Speech Recognition,Time Series,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Academic,"1,000 to 4,999 employees",,,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service",,Most of the time,1PB,"CNNs,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Not Useful,Somewhat useful,,,,Very useful,"Data Elixir Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Machine Learning Engineer,Other",Work,40,10,50,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine 
learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Often,,,,Often,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics",Often,,,Often,,,,Often,,,,,,Sometimes,,Rarely,,,Often,Often,Sometimes,,Often,Sometimes,,,Rarely,,Often,,,,,50,15,20,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,Sometimes,,,Sometimes,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Cloud,"Bitbucket,Git",Sometimes,"160,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Mexico,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Financial,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,Bayesian Techniques,"Hadoop/Hive/Pig,Impala,Python,R,Spark / MLlib,SQL",,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Segmentation",,,Often,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,40,10,0,40,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Sometimes,,,,,Most of the time,,,Most of the time,Most of the time,,,,,,Rarely,Sometimes,,,100% of projects,More internal than external,IT Department,"INEGI (Mexico), Datos Abiertos (Mexico)","Fitting in limit tools, cleaning data.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Rarely,18000,MXN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,Less than a year,Researcher,Self-taught,100,0,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,A general-purpose job board,Somewhat important,Other,Laptop or Workstation and local IT supported servers,Image data,Never,<1MB,Neural Networks,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,90,10,0,0,0,0,Enough to run the code / standard library,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Rarely,,,,,None,Entirely internal,Standalone Team,na,I am new,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,90000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Cluster Analysis,R,Other,"College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,,"Coursera,DataCamp,edX,Udacity,Other",Traditional Workstation,2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Personal Projects,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,,Very useful,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,Workstation + Cloud service,11 - 39 hours,PhD,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important +Male,Malaysia,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Flume,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,5,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,100 to 499 employees,Decreased slightly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,DataRobot,Hadoop/Hive/Pig,Impala,MATLAB/Octave,Python,QlikView,R,Spark / MLlib,SQL",,,,,Most of the time,Rarely,,,Most of the time,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,Often,Most of the time,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs",,Most of the time,,,Often,,Most of the time,Most of the time,,,,,,Most of the time,Sometimes,,,,,Sometimes,,,Most of the time,Often,,Most of the time,,Often,,,,,,40,30,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,,Often,Sometimes,,Most of the time,,26-50% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Other,Sometimes,160,MYR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Self-taught,25,10,55,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"10,000 or more employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Podcasts,YouTube Videos",,,Very useful,,Somewhat useful,,Very useful,,,,,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,,"Data Analyst,Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,56,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,51,Employed 
full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hungary,32,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Jack's Import AI Newsletter,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,Yes,Doctoral degree,Other,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,5,5,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Italy,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by 
company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,5,65,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,High school,Technology,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Relational data,Other",Sometimes,10GB,"Ensemble Methods,Neural Networks,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"Decision Trees,Neural Networks,Random Forests,RNNs,Time Series Analysis",,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,Sometimes,,Often,,,,,Often,,,,55,20,10,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Privacy issues",Sometimes,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Julia,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,,,,"FastML Blog,Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",1-2 years,,Nice to have,Nice to have,,Necessary,,,,Nice to have,,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Doctoral degree,Electrical Engineering,More than 10 years,"Researcher,I haven't started working yet",Self-taught,60,10,0,30,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,,,Very Important,,Somewhat important,Somewhat important,,Somewhat important,,,Somewhat important +Female,Romania,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Other,Other,40,0,40,0,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A professional degree,Technology,100 to 499 employees,Decreased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,Often,,Often,,,Rarely,Sometimes,Most of the time,,Sometimes,,,Often,,Sometimes,Rarely,Sometimes,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Need to coordinate with IT,Privacy issues",,,,,,,,,,,,,Often,,Most of the time,,Often,,,,,,100% of projects,More external than internal,Standalone Team,-,Code automation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Other",Server,Other,Rarely,-,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Not employed, but looking for work",,,,,,,,Cloudera,Bayesian Methods,Haskell,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,Somewhat useful,Not Useful,,,Very useful,Somewhat useful,,Very useful,,Very useful,,,Very useful,,,Somewhat useful,,"FastML Blog,KDnuggets Blog",1-2 years,Nice to have,Necessary,Necessary,,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),GPU accelerated Workstation",40+,Github Portfolio,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,45,5,10,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,,,,,,,,,,,,, +Male,Norway,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,3 to 5 years,"Business Analyst,DBA/Database Engineer",Self-taught,50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SAS Base,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,Most of the time,,,,,,Sometimes,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,Often,,,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Often,,Most of the time,,Often,Often,,,,,Most of the time,,,,,,,,80,5,3,10,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack 
of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,Sometimes,Often,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Often,Often,Often,Most of the time,Most of the time,,,Sometimes,,Most of the time,Often,,76-99% of projects,More internal than external,Central Insights Team,Demographics data from gov open data,"Accessing, cleaning and transforming","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,"36,400",,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,Random Forests,Other,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,6 to 10 years,Data Analyst,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,25,0,0,50,25,0,,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"56,400",,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Tableau,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,"Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",10,40,40,0,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,A master's degree,Financial,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Relational data,Other",Most of the time,10GB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,Rarely,,,,,Often,Sometimes,,,Often,,,Rarely,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,Rarely,,,Often,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,Industry Leading Investment Real Estate Data; Trepp Data; CoStar 
Data; Moody's CRD,"The EDA Process. But, that's also the part where we learn the most from the data.","Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,106000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Other,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Amazon Web services,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Online courses,Textbook",,,,,Somewhat useful,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Recommendation Engines,Reinforcement learning",Ensemble Methods,A doctoral degree,Academic,"1,000 to 4,999 employees",Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,R,SQL,Unix shell / awk",,Often,,Sometimes,,,,Sometimes,Sometimes,,,,,,Often,,,,,,Often,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Recommender Systems,Simulation,Time Series Analysis",,Sometimes,,,,Often,Most of the time,Often,,,,,,Often,,Sometimes,,,,,,,,Often,,,Most of the time,,,Sometimes,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Sometimes,,,,,,Often,,,,,Sometimes,,,,Often,Most of the time,,26-50% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,20000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,37,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Very useful,,,Very useful,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,I never declared a major,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,30,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Neural Nets,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),I prefer not to answer,Computer Science,,"Engineer,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,Time Series,"Logistic Regression,Neural Networks - GANs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,Other,21,"Not employed, but looking for work",,,,,,,,Python,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Textbook",,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,,,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"College/University,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat 
important,Somewhat important,Somewhat important,Very Important +Female,United Kingdom,39,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Other,GPU accelerated Workstation,"Image data,Relational data",Rarely,,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,,Most of the time,,Most of the time,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Recommender Systems",,,,Sometimes,Sometimes,Most of the time,Often,Often,Often,,,Often,,,,,,,,Often,,,Often,Often,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,33,Employed 
full-time,,,Yes,,Other,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,40,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,"A friend, family member, or former colleague told me",Very important,Other,Workstation + Cloud service,Image data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,16,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,45,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,Not Useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Computer Scientist,Engineer,Researcher",Self-taught,20,40,0,20,20,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,Neural Networks,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,TIBCO Spotfire",,Rarely,,,,,,Rarely,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,Rarely,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Segmentation",,,,,,,Most of the time,,,,,,,Sometimes,,Rarely,,Rarely,,Often,,,,,,Sometimes,,,,,,,,30,30,10,20,10,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,S3,Bitbucket,Sometimes,,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Canada,52,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Survival Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines","Microsoft Excel Data Mining,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Text Analytics",,,,,,Often,Often,Often,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,15,15,10,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,26-50% of projects,Entirely internal,IT Department,Census data,dirty data,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,108000,CAD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Researcher,University courses,30,40,20,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SAS Base,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,Rarely,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,Often,Sometimes,,Sometimes,,,,Most of the time,,Most of the 
time,,,,,Sometimes,,,,Most of the time,,,Most of the time,Often,,Most of the time,Sometimes,,,"CNNs,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis,Other",,,,Often,,,Most of the time,,,,,,,Sometimes,,,,,Most of the time,Often,Often,,Sometimes,,Sometimes,,,Sometimes,Most of the time,Often,,Most of the time,,50,15,10,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,Sometimes,Most of the time,Often,,,Often,Sometimes,Often,Often,,,Sometimes,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,ACS; NLS; GitHub,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Other,Sometimes,95000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,Business Analyst,Self-taught,30,65,0,5,0,0,Computer Vision,Neural Networks - CNNs,A professional degree,Other,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,,1GB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,"CNNs,Data Visualization,Logistic Regression,Neural Networks",,,,Often,,,Most of the time,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,30,40,0,5,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Often,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,100% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Stack Overflow Q&A,Textbook",Very useful,,,,Very useful,,Very useful,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,10,20,60,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",,,,Sometimes,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,,Sometimes,,,,Most of the time,,,,,,,Often,,,,0,0,0,0,0,100,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Often,,,,,,,Most of the time,,Often,,,,Most of the time,Often,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,145000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Singapore,39,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by government,R,Deep learning,R,Government website,"Blogs,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,Very useful,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Traditional Workstation,0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Logistic Regression,Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,SAS Base,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer,Other",Kaggle competitions,NA,0,0,0,100,0,Survival Analysis,Logistic Regression,,Technology,,,,,Very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100MB,"Decision Trees,Neural Networks,SVMs","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,,,Often,,,,,,,,Often,,,,Most of the time,,,Often,,,,,Most of the time,,,,,,10,40,0,0,50,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Standalone Team,Client provided data,Clear documentation of the data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,3000000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,,NA,I prefer not to say,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Self-employed",KNIME (commercial version),,C/C++/C#,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,More than 10 years,"Data Analyst,Data Miner,Machine Learning Engineer,Researcher",Work,20,0,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),A doctoral degree,Academic,,,,,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Other,Rarely,100MB,Other,"C/C++,Jupyter notebooks,Python,R,Statistica (Quest/Dell-formerly Statsoft)",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,"Cross-Validation,Decision Trees,Neural Networks,Other",,,,,,Most of the time,,Often,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,30,50,10,10,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Limitations of tools",,Often,,,,,,,,,,,Often,,,,,,,,,,100% of projects,More internal than external,Other,,,,,,,Always,24000,RUB,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,60,10,0,30,0,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Technology,I prefer not to answer,Increased significantly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + 
Cloud service","Image data,Video data",,,,"C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,SVMs,Time Series Analysis",,,,Often,,Often,Often,Often,,,Often,,Often,Often,,Often,,,,Often,Often,Often,Often,,Often,,,Often,,Often,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Proprietary Algorithms,C/C++/C#,Google Search,"Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Self-taught,80,0,0,20,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,Fewer than 10 employees,Increased slightly,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Most of the time,,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Random Forests,Regression/Logistic Regression","C/C++,Python,R,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,Often,,Often,Often,Often,Often,,,,,,Often,,Often,,,,Often,Often,,Often,,,,Often,,,,,,,35,25,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Often,,,Most of the time,,,,Sometimes,,,Sometimes,,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Africa,32,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by company that makes advanced analytic software,Cloudera,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",25,40,25,10,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",High school,Other,20 to 99 employees,Increased slightly,More than 10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Rarely,,Sometimes,,,,,Sometimes,Often,Often,Most of the time,,Often,Often,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary 
Approaches,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,RNNs,Simulation,Time Series Analysis",,,Often,,,,Most of the time,Often,,Sometimes,,,,,,Often,Sometimes,Sometimes,,,,Often,,,Sometimes,,Most of the time,,,Often,,,,40,10,10,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Often,,Often,Often,,,Sometimes,Sometimes,,Often,,,Often,,,Sometimes,Most of the time,,,Often,,76-99% of projects,More external than internal,Standalone Team,Open Street Map,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Julia,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",80,5,NA,13,2,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Other",,10GB,"CNNs,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,Rarely,,,,"CNNs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Simulation,SVMs",,,,Sometimes,,,,,,,,,,Often,,Often,,Rarely,,Sometimes,,,,,,,Often,Often,,,,,,50,15,0,15,20,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Male,Poland,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,CRM/Marketing,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10MB,"Decision Trees,Random Forests","Jupyter notebooks,NoSQL,Python,R,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Often,,Often,,,,,,Sometimes,,,,,Sometimes,,Often,,Often,,,Often,,,Sometimes,,,,,50,10,10,10,10,10,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,10-25% of projects,Approximately half internal and half external,Standalone Team,social media; Polish Central Statistical Office; geodata,"Social media APIs changing overtime, social media data access policies","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint,Other",GIT,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle,Personal Projects,Textbook",,,Very useful,Very useful,,,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",4,5,0,90,1,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,1GB,"Ensemble Methods,Evolutionary Approaches,HMMs,Random Forests","Amazon Web services,C/C++,Google Cloud Compute,Java,Mathematica,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Often,,Often,,,,Sometimes,,,,,,,Often,,,,,Sometimes,Sometimes,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Random Forests,Segmentation,Text Analytics",Often,,,,,Most of the time,Most of the time,Often,,Often,,,,Most of the time,,,,,,,,,Most of the 
time,,,Often,,,Most of the time,,,,,65,15,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,Often,Often,,,,,,,,,Most of the time,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,None,Availability,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,135000,USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Argentina,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,30,60,0,10,0,0,Computer Vision,Other (please specify; separate by semi-colon),A doctoral degree,Technology,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Researcher,Self-taught,30,30,15,0,15,10,Time Series,"Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Most of the time,10GB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,Sometimes,,,Most of the time,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,quandl;datastream,formatting,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Seafile,Other,Never,40000,EUR,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Colombia,53,Employed full-time,,,No,Yes,Other,Fine,Employed by government,DataRobot,Association Rules,R,"Google Search,Government website","Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,Not Useful,Somewhat useful,,Somewhat useful,Not Useful,Not Useful,,,Somewhat useful,Not Useful,Not Useful,Not Useful,,,,,Somewhat useful,"FastML Blog,Jack's Import AI Newsletter,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Other,Self-taught,33,0,34,33,0,0,Other (please specify; separate by semi-colon),,Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Amazon Web services,Deep learning,Python,Government website,"Conferences,Online courses,Personal Projects",,,,,Very useful,,,,,,Very useful,Somewhat useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,,University courses,10,60,10,20,0,0,"Natural Language 
Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)",Support Vector Machines (SVMs),A master's degree,Academic,I don't know,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Text data,Other",Sometimes,,SVMs,"Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,Text Analytics",,,,,,,Often,,,,,,,Often,,,,,Often,,,,,,,,,,Often,,,,,20,20,0,20,40,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,Rarely,,,Most of the time,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,100% of projects,More external than internal,Standalone Team,open data; governmental data; social network data,"Some data is dirty, and not well verified.","Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,20800,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,56,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Text Mining,Python,I collect my own data (e.g. 
web-scraping),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,Researcher,University courses,30,0,0,70,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Pharmaceutical,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,,,Never,,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,0,10,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Business Department,pharmaceutical companies,Not know now,Graph (e.g. GraphBase/Neo4j),"Email,Share Drive/SharePoint",,Other,Sometimes,1000,USD,Has decreased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Matlab,Google Search,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,Necessary,Unnecessary,Nice to have,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,0,20,50,0,0,Time Series,Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",More than 10 years,Researcher,University courses,10,5,0,80,0,5,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,I don't know,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,38,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,Julia,Proprietary Algorithms,C/C++/C#,University/Non-profit research group websites,"Blogs,Conferences,Kaggle,Official documentation,Textbook",,Very useful,,,Very useful,,Very useful,,,Somewhat useful,,,,,Very useful,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Predictive Modeler,Programmer,Researcher",University courses,70,0,0,10,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,Microsoft Azure Machine Learning,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Never,10MB,Bayesian Techniques,"Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,Most 
of the time,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Most of the time,Most of the time,Often,,Often,,,Often,,Sometimes,Often,,,,Often,,Often,Often,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,49000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data",Sometimes,100GB,"CNNs,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Minitab,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,Rarely,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Most of the time,,Most of the time,,,Most of the time,Sometimes,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,Sometimes,,Most of the time,Most of the time,,Most of the time,,Sometimes,,,,20,30,30,20,0,0,Enough to run the code / standard library,"Company politics / Lack of 
management/financial support for a data science team,Organization is small and cannot afford a data science team",Often,,,,,,,,,,,,,,,Often,,,,,,,26-50% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,,400000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer",University courses,50,10,10,20,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,"Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,Spark / MLlib,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,,,Often,,,,,,,"Collaborative Filtering,Logistic Regression,Random Forests",,,,,Sometimes,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,40,20,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,Unavailability of/difficult access to data",,Sometimes,Often,,,,,,Most of the time,,Most of the time,,,,,,,Sometimes,,,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,60000,USD,Other,6,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Other",Self-taught,30,10,50,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,"Decision Trees,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,IBM SPSS Modeler,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Often,Sometimes,,,Often,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,Often,,,,,,Sometimes,,Often,,,,,Sometimes,Often,,,,Sometimes,,,,Sometimes,,,,50,15,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,Often,Often,,,,,,,,51-75% of projects,Entirely internal,Other,"Acxiom, Census, Weather data",Dirty Data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Access to client database,Other,Most of the time,132000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",15,60,0,0,25,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,<1MB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,Often,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Markov Logic Networks",Sometimes,,,,,,Often,,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,,Python,,"College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 
5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,60,20,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,,,"Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Sometimes,,,Often,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,Competitor Data ,Data handling ; data Storage ; Tools,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,I don't typically share data,Share Drive/SharePoint",,"Subversion,Other",Rarely,"110,000",USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,30,15,15,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Financial,100 to 499 employees,Stayed the same,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks","Microsoft SQL Server Data Mining,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Naive Bayes,Neural Networks,SVMs",Sometimes,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Poorly,"Employed by a company that performs advanced analytics,Self-employed",IBM Watson / Waton Analytics,Rule Induction,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Friends network,Kaggle,Stack Overflow Q&A,Textbook",Very useful,,,,,Very useful,Very useful,,,,,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,Linear Digressions Podcast,Partially Derivative Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",27,40,20,0,13,0,"Computer Vision,Natural Language Processing",Evolutionary Approaches,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Female,Egypt,40,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Udacity,"Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Other,I don't write code to analyze data,"Business Analyst,Programmer,Software Developer/Software Engineer",Other,10,80,0,0,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Canada,47,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Online courses,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,30,20,0,0,"Recommendation Engines,Unsupervised Learning,Other (please specify; separate by semi-colon)","Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Other","Image data,Video data,Text data",Most of the time,10GB,"CNNs,Ensemble Methods,RNNs","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,RNNs",,,,Most of the time,Often,Often,,,Often,,,,,Often,,,,,Sometimes,,Sometimes,,,Often,Sometimes,,,,,,,,,10,30,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,Sometimes,Often,Often,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,Often,,76-99% of projects,Approximately half 
internal and half external,Standalone Team,project dependent can't list,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Rarely,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,50,"Not employed, but looking for work",,,,,,,,Amazon Web services,I don't plan on learning a new ML/DS method,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,Not Useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,O'Reilly Data Newsletter,15+ years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer",University courses,10,0,50,40,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important 
+Female,Philippines,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,,3-5 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop or Workstation and local IT supported servers,Other",2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series",Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,44,"Not employed, but looking for work",,,,,,,,,Time Series Analysis,R,Google Search,"Blogs,College/University,Friends network,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,Yes,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Engineer,Operations Research 
Practitioner,Other",Work,50,5,30,15,0,0,"Outlier detection (e.g. Fraud detection),Time Series",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Textbook",,Very useful,,,,,Somewhat useful,,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,Engineer,University courses,20,20,40,20,0,0,,,A bachelor's degree,Manufacturing,500 to 999 employees,Stayed the same,6-10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,,,"Python,QlikView,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,Rarely,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,40,0,20,20,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Most of the time,Often,,100% of projects,Entirely internal,Other,,Multiple names and locations of data fields,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,,Never,250000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Pakistan,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Data Miner,Researcher",University courses,20,25,35,20,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,Yes,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by government",SAS Enterprise Miner,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"The Data Skeptic Podcast,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,Other",University courses,10,10,20,50,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Stan,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic 
Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Unix shell / awk",Sometimes,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,Most of the time,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",Rarely,Rarely,Sometimes,,,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,50,25,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,Often,,,,,Often,Often,,,,,Often,,,,,,Sometimes,,,100% of projects,More internal than external,Other,Publically available genome sequencing data,Internal data is fragmented and difficult to extract,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Netherlands,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Company internal community,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,Very useful,,,,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,"Data Elixir Newsletter,DataTau News Aggregator,FastML Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Programmer",University courses,20,30,20,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow",Sometimes,Sometimes,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Often,,Sometimes,Often,,,Often,,,,,,Sometimes,,Sometimes,Often,,Often,,,,,Sometimes,,Most of the time,,,,85,5,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Most of the time,Most of the time,,,,,,Often,,,,,,Often,,,Most of the time,Often,,76-99% of projects,More internal than external,IT Department,Government Data,Understanding and Finding data from the DW,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,55000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Other,Proprietary Algorithms,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Other",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,No Free Hunch Blog,5-10 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer",Other,50,0,0,0,50,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Other (please specify; separate by semi-colon)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Newsletters,Official documentation,Personal Projects,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Self-taught,50,20,0,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A bachelor's degree,Financial,10 to 19 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",Sometimes,,,,,Sometimes,Often,,,,,,,Often,,Most of the time,,,,,Often,,,,,Most of the time,,,,,,,,60,20,NA,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,Often,,,,,Often,,,,,,Most of the time,Most of the time,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,540000,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Colombia,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,IBM Watson / Waton Analytics,Bayesian Methods,R,Government website,"Blogs,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,Very useful,,,,,,,Very useful,,Very useful,Very useful,,,Very useful,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,80,5,5,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",High school,Academic,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Not very important,Other,Basic laptop (Macbook),Relational data,Never,1GB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,5,15,0,5,75,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team",,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,10-25% of projects,Entirely external,Other,DANE; GEM,"There is no challenge, all the data is structured.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,3000000,COP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer",Work,60,20,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Retail,20 to 99 employees,Stayed the same,6-10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Rarely,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Simulation",Often,,,,,Most of the time,Most of the time,,,,,,,,Often,Most of the time,,,,,,,,,,,Often,,,,,,,80,15,1,1,3,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Sometimes,,Often,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Never,130000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos,Other",,Somewhat useful,Very useful,,,,Not Useful,,,Very useful,Not Useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,edX,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher",University courses,10,10,0,50,0,30,Computer Vision,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Not important,Not important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Very Important,Not important,Not important,Not important +Male,United States,32,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Other",Self-taught,30,0,30,0,0,40,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10GB,Regression/Logistic Regression,"Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing",Often,,,,,,,,,,,,,Often,Most of the time,Often,,,Rarely,,,,,,,,,,,,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,,,Most of the 
time,Often,,Often,Most of the time,,,,,Often,,,,,,Often,Often,,76-99% of projects,Approximately half internal and half external,Business Department,Client datasets,Privacy policies related to data ownership,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"300,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Portugal,33,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,,"Programmer,Other",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,Self-taught,60,10,20,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series 
Analysis",Sometimes,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,Often,Rarely,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,Rarely,,,,50,10,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,1 to 2 years,"Business Analyst,Operations Research Practitioner,Predictive Modeler",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belarus,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,20,15,15,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,500 to 999 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Relational data,Other",Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Microsoft Excel Data 
Mining,Python,R,SQL",,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Most of the time,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,20,10,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Often,,,,Most of the time,Most of the time,Sometimes,,,,Often,,Most of the time,Often,,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Other,Rarely,46000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",,,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United Kingdom,47,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,5,10,5,80,0,0,Recommendation Engines,"Ensemble Methods,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Mexico,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,45,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",Logistic Regression,A master's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,100MB,Regression/Logistic Regression,"Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Orange,Python,QlikView,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Sometimes,,,Most of the time,Sometimes,,Often,,Most of the time,,,,Often,,,,,,Rarely,,,,Often,,Often,,Most of the time,Sometimes,,,,,,,,,Sometimes,Most of the time,,,Sometimes,Sometimes,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Prescriptive Modeling,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,Often,,Often,,,,Often,,,,30,30,5,15,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science 
team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Most of the time,Often,,,Often,,,,Often,,,,,,Most of the time,,,,100% of projects,Do not know,IT Department,"google, inegi, kaggle",to understand at first sight the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,360000,MXN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,NA,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,,Necessary,Nice to have,,Nice to have,,Necessary,,,,,,,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Management information systems,,"Data Analyst,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,,,,Somewhat important,,,,,,,,,,, +Female,Poland,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Factor Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,Self-taught,50,30,0,20,0,0,"Computer Vision,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,50,25,10,5,0,Speech Recognition,Markov Logic Networks,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,,Decision Trees,"Amazon Machine Learning,Amazon Web services,C/C++,Java,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,SQL",Rarely,Sometimes,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,30,20,30,10,10,0,Enough to run the code / standard library,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Python,,R,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Personal Projects,Stack Overflow Q&A,Other",,,,,,,,,,,,Very useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,,Self-taught,90,0,0,0,0,10,"Supervised Machine Learning (Tabular Data),Survival Analysis,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Other,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,Most of the time,,,,,,,,,Often,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,Often,Sometimes,,,,,,,Rarely,,,,65,15,5,15,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear 
direction to go in with the available data",Sometimes,,,,Often,Often,,,,,,,,,,,Often,,,Sometimes,,,76-99% of projects,More internal than external,Other,National Weather Service; Internal Revenue Service; Bureau of Labor Statistics; Bureau of Economic Analysis; US Census Bureau,Understanding well the underlying data generating process,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),,175000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,NoSQL,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,Very useful,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,"Data Scientist,Other",Self-taught,NA,NA,NA,NA,NA,NA,Survival Analysis,Bayesian Techniques,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,57,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Self-employed,TensorFlow,Bayesian Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,Friends network,Official documentation,Textbook",Very useful,Somewhat useful,,,,Very useful,,,,Very useful,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,"Computer 
Vision,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Rarely,1TB,"CNNs,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Often,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,33,33,10,10,14,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,Most of the time,,,Most of the time,,,,,,100% of projects,More internal than external,Standalone Team,TCIA,,Other,Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,41,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Google Cloud Compute,Factor 
Analysis,Python,I collect my own data (e.g. web-scraping),"Company internal community,Friends network,Kaggle,Online courses,Podcasts",,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,,"Data Machina Newsletter,FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,"Business Analyst,Data Analyst,Engineer",Work,5,0,0,0,0,95,Time Series,Hidden Markov Models HMMs,A bachelor's degree,Internet-based,500 to 999 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,100GB,Decision Trees,"Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Rarely,Often,Often,Often,Often,,,,,,Often,,Sometimes,,,,,,,,Often,,,,Often,Rarely,,,,,,"A/B Testing,Data Visualization,Lift Analysis,Markov Logic Networks,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,100000,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,SQL,Bayesian Methods,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Professional degree,,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Portugal,50,Employed full-time,,,No,Yes,Statistician,Fine,Employed by college or university,R,Factor Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Newsletters,Trade book",,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,Very useful,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Workstation + Cloud service,0 - 1 hour,Online Courses and Certifications,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Miner,Statistician",Self-taught,70,10,0,20,0,0,Time Series,"Bayesian Techniques,Logistic Regression",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Self-employed",Amazon Machine Learning,,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Textbook",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,"Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Non-profit,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Decision Trees,Regression/Logistic Regression","Mathematica,MATLAB/Octave,R,Tableau",,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,"Decision Trees,Logistic Regression,Prescriptive Modeling",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,10,0,0,30,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,Most of the time,Often,,,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Most of the time,,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,47,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Anomaly Detection,Python,,"College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,,,,,Very useful,Not Useful,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),,"Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Programmer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,United States,38,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,No Free Hunch Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +A different identity,Other,69,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Other,Deep learning,Julia,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Online courses,Personal Projects,Textbook,YouTube Videos,Other",Somewhat useful,Very useful,,,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Programmer,Researcher,Statistician",University courses,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Other,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Impala,Julia,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Other",,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,Time Series Analysis",Sometimes,,Sometimes,,,Often,Often,Sometimes,,,,Sometimes,Sometimes,,Sometimes,Often,,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,,,Sometimes,,,,60,20,10,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of 
management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,Often,,Often,Most of the time,,,,,,,Often,,,26-50% of projects,Entirely internal,Standalone Team,government data,"Combining and transforming it to the right shape to carry out the wide variety of analysis that has to be done, and spotting errors that occur in the data before comes into the analysis team.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,,,,6,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,11-15,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Not important +Male,United States,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,0,0,70,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important 
+Male,Colombia,34,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Perfectly,Self-employed,Python,Neural Nets,Python,"Google Search,University/Non-profit research group websites","College/University,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,Very useful,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Other,No,Professional degree,,I don't write code to analyze data,Software Developer/Software Engineer,University courses,10,10,0,80,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"Blogs,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,,,,,,Very useful,,Very useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100TB,Other,"C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Text Analytics",Sometimes,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,30,20,20,0,30,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Sometimes,,Often,,,,,,,,,Often,,,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Rarely,"325,000",,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed part-time,,,No,Yes,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,,,,,,< 1 year,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Male,Finland,29,Employed 
full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,Not Useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Researcher,University courses,35,5,35,20,5,0,Time Series,,High school,Academic,I prefer not to answer,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,Other,Most of the time,,Regression/Logistic Regression,"C/C++,Mathematica,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,,,70,5,5,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Rarely,,,,,,Often,Sometimes,,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by non-profit or NGO",Other,Other,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos,Other",Very useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,"KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Other,Work,30,0,70,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Military/Security,"5,000 to 9,999 employees",Increased slightly,More than 10 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service,Other","Image data,Other",Most of the time,100GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,SVMs,Other","C/C++,Java,Mathematica,MATLAB/Octave,Unix shell / awk,Other",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,"Association Rules,Bayesian Techniques,CNNs,Data 
Visualization,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis,Other,Other",,Sometimes,Most of the time,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,,,,,Most of the time,Often,,Most of the time,Most of the time,Most of the time,,5,45,20,15,15,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data,Other",,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Rarely,Often,76-99% of projects,More internal than external,Other,many,changing sensors and formats; approvals to use data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Other",,"Bitbucket,Git,Mercurial,Subversion",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Kaggle",,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,5,0,40,10,45,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Academic,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Never,1TB,"CNNs,Ensemble Methods,Random Forests,RNNs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,Often,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Segmentation",,,,,,Most of the time,Most of the time,Often,Often,,,,,,,Most of the time,,,,Most of the time,,Often,Most of the time,,Sometimes,Most of the time,,,,,,,,20,40,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,Sometimes,,,Sometimes,,,100% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,90000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,Researcher,Work,40,0,15,15,30,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Pharmaceutical,"10,000 or more employees",Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Gradient Boosted Machines","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Rarely,,,,Most of the time,Rarely,,,,,,Rarely,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,Sometimes,Sometimes,,,Sometimes,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Natural Language Processing,Neural Networks,Time Series 
Analysis",,,,Often,,Often,Often,,,,,Often,,,,,,,Often,Often,,,,,,,,,,Sometimes,,,,30,20,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,Sometimes,,,Often,,,,,Often,,,Often,,Most of the time,,Often,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,15340000,JPY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Other,Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,,,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Engineer,Work,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Manufacturing,500 to 999 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Minitab,R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Other",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,Most of the time,,,50,10,20,10,10,NA,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input",,,,,Most of the time,,,,,,Often,,,,,,,,,,,,100% of projects,Entirely internal,Business Department,No,messy data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Not Useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,Other,Self-taught,95,0,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,500 to 999 employees,Stayed the same,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,Regression/Logistic Regression,"R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,Rarely,,,"A/B Testing,Data Visualization,Logistic Regression,Time Series Analysis",Rarely,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,75,5,0,15,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Rarely,Sometimes,,,,,,,,,,,,,,,Often,Often,,,Most of the time,,100% of projects,Entirely internal,Other,WDI;BLS;IPEDS,lack of access,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"150,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Researcher",University courses,20,20,15,25,20,0,"Natural Language Processing,Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Academic,500 to 999 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Sometimes,1GB,"CNNs,Random Forests,RNNs","Cloudera,NoSQL,Python,R,SQL,TensorFlow",,,,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Lift Analysis,Natural Language Processing,Neural Networks,Random Forests",,,,,,Often,Most of the time,,,,,,,,Often,,,,Often,Sometimes,,,Often,,,,,,,,,,,40,15,10,10,25,0,"Enough to code it again from scratch, albeit it 
may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",,,,Sometimes,Most of the time,,,,,Sometimes,,,,,,,,Rarely,,,,,10-25% of projects,Entirely external,IT Department,Propietary data,Improve energy saving ,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,15000,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,35,15,0,25,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Decreased significantly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Never,100MB,Decision Trees,"Java,MATLAB/Octave,Python",,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,50,10,0,20,20,0,Enough to run 
the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,Often,,,,,Often,,Often,,Often,Often,,,,,,Often,Often,,51-75% of projects,Entirely internal,Central Insights Team,No 3rd party datasets,Dont have clear insights of what problem to be looked for solution;Lack of proper hands-on experience in data analytics and corresponding python libraries;Lack of practical experience of incorporating ML techniques and data modelling.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Subversion,Never,1296000,INR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",University courses,50,30,0,10,10,0,"Computer Vision,Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",,,,,,Most of the time,Most of the time,Often,,,,,,,,Sometimes,,,,Often,Sometimes,,Often,Sometimes,,,,,,,,,,20,20,50,10,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,10000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Greece,24,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,25,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Markov Logic Networks,Neural Networks,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Other,Other",,Most of the time,,,,,,,Most of the time,,,,,,Often,,Rarely,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,Most of the time,Most of the time,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural 
Networks,Random Forests,Text Analytics,Time Series Analysis",Often,,Sometimes,,,,Often,Sometimes,,,,,Most of the time,Often,,Sometimes,,Sometimes,Often,Most of the time,,,Sometimes,,,,,,Often,Most of the time,,,,35,25,45,5,0,0,"Enough to code it again from scratch, albeit it may run slowly",Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Rarely,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,25000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Jack's Import AI Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",75,0,0,25,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL,TensorFlow",,,,,,,,,,,,Sometimes,,,,,Often,,,,,,Often,,,,Sometimes,,,,Most of the time,,Sometimes,,,,Often,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,Often,,Most of the time,,,,Sometimes,Often,,Sometimes,,,,,Most of the time,Often,,,,,50,10,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Sometimes,Sometimes,,Often,Most of the time,,,,Often,,,Most of the time,Sometimes,,,Often,Often,,51-75% of projects,More internal than external,Other,,"Privacy issues, large datasets, computing resources.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,"72,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,20,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,1GB,"CNNs,Decision Trees","Amazon Web services,Python,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Random Forests",,,,Most of the time,,Often,Often,Often,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,75,25,0,0,0,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,None,Labelling,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Git",Most of the time,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,43,"Independent contractor, freelancer, or self-employed",,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,Other,"Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",10-15 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),Less than a year,Other,University courses,10,10,0,80,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Pakistan,21,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests",Often,,,,,Often,Often,Often,Sometimes,,,Often,,Sometimes,,Often,,Often,Most of the time,,,,Often,,,,,,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Arxiv,Blogs,Company internal 
community,Conferences,Official documentation,Stack Overflow Q&A",Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,5,90,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","DataRobot,Hadoop/Hive/Pig,Python,R",,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,Simulation",Most of the time,,,,,Sometimes,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,Rarely,,,,,,,30,10,0,10,50,0,Enough to tune the parameters properly,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Most of the time,,,51-75% of projects,Approximately half internal and half external,Other,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,150000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Other,37,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Poorly,Self-employed,NoSQL,Link Analysis,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,5-10 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,,,"Coursera,DataCamp,edX,Udacity,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,Sort of (Explain more),Master's degree,Other,More than 10 years,"Programmer,Other",University courses,5,10,5,80,0,0,,,A professional degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Greece,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. web-scraping),Other","Blogs,Friends network,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,40,30,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Most of the time,,Most of the time,,,,,Most of the time,Often,Sometimes,,,,30,25,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,Sometimes,,,Rarely,,,,,,,Sometimes,,,,Often,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,Very few samples of labeled and trusty examples.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,30000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,33,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,6 to 10 years,Researcher,Other,0,50,0,0,0,50,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Neural Networks - CNNs,A doctoral degree,Pharmaceutical,100 to 499 employees,Increased slightly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Other,"Image data,Other",Never,,Other,"Java,Oracle Data Mining/ Oracle R Enterprise,Python,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,Rarely,,,Often,,,,,,,,,,,Rarely,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,5,0,0,5,10,80,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",60,30,5,5,0,0,,,,Other,100 to 499 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,<1MB,,"C/C++,MATLAB/Octave,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input",Often,,,,Often,,,,,,Often,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Never,105000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Textbook",,Very useful,Somewhat useful,,Very useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,0,0,50,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,100MB,"Decision Trees,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Perl,Python,R,RapidMiner (free version),SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,Rarely,Often,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Text Analytics",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,65,10,0,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the 
time,,,,Often,Often,Often,,,,,Sometimes,,,Sometimes,Most of the time,Most of the time,,76-99% of projects,Do not know,Other,stock/financial datasets; kaggle datasets; government open data sources,obtaining it and cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"130,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Bayesian Methods,Python,GitHub,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,80,0,15,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs",High school,Military/Security,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Video data,Relational data",Sometimes,1GB,"CNNs,Evolutionary Approaches,Neural Networks,Random Forests","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"Data Visualization,Ensemble Methods,Evolutionary Approaches,kNN 
and Other Clustering,Neural Networks,Random Forests,Segmentation,Simulation",,,,,,,Most of the time,,Rarely,Most of the time,,,,Sometimes,,,,,,Most of the time,,,Rarely,,,Sometimes,Most of the time,,,,,,,5,60,0,20,15,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Sometimes,,,,,,,,,,,,,,,,,100% of projects,More external than internal,Other,"COCO, VOC",Enough positive and negative samples.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,Git,,,USD,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,51,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Pharmaceutical,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the 
time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,,,Most of the time,Often,,,,,,,,Most of the time,,Often,,,Sometimes,,Often,,,,,,,Often,,,,20,20,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,Often,,Often,,,76-99% of projects,More internal than external,Standalone Team,public weather data,Keep me up to date to meet the expectation,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,61,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Bayesian Methods,R,"GitHub,Other","Blogs,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,Somewhat useful,,,Very useful,"FlowingData Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Statistician",Self-taught,70,0,10,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Other",Sometimes,10GB,"Bayesian Techniques,HMMs,Random Forests,Regression/Logistic Regression","C/C++,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,Often,,,Often,Most of the time,Often,Often,,Sometimes,,Sometimes,Most of the time,,Sometimes,,Often,,,Most of the time,,Sometimes,,,Often,Most of the time,Sometimes,,Often,,,,30,20,10,20,20,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,,Sometimes,,Often,,,Often,Sometimes,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Other,Bioconductor packages,Raw data needs denoising.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Other,box/dropbox.,"Bitbucket,Git",Sometimes,"190,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,No,Yes,Business Analyst,,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Udacity,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Logistic Regression,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United Kingdom,57,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,Yes,Master's degree,A health science,1 to 2 years,Other,University courses,20,10,0,70,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",I prefer not to answer,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,,,,,,,Very useful,,Very useful,,,,Somewhat useful,,1-2 years,Unnecessary,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation",Neural Networks - RNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Python,Monte Carlo Methods,R,Google Search,"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Work,25,0,25,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Not important,Not important +Male,United Kingdom,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,,"Bayesian Techniques,Logistic Regression",High school,Technology,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Researcher,Software Developer/Software Engineer",University courses,0,0,0,70,0,30,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Stayed the same,,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,48,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,40,50,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Not important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,India,NA,Employed part-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,Amazon Web services,Link Analysis,R,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner",Work,50,0,40,0,0,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Other (please specify; separate by semi-colon),A bachelor's degree,Other,20 to 99 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Always,10GB,"Regression/Logistic Regression,Other",IBM SPSS Statistics,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,70,10,10,10,0,0,Enough to tune the parameters properly,"Explaining data science to others,I prefer not to say,Limitations of tools",,,,,,Sometimes,Most of the time,,,,,,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Egypt,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Ensemble Methods,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,6 to 10 years,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,Amazon 
Machine Learning,Deep learning,Python,Google Search,"Kaggle,Official documentation,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,Somewhat useful,,Very useful,,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Programmer,Statistician",Work,20,70,0,0,10,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A professional degree,Government,500 to 999 employees,Decreased slightly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,Neural Networks,"MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",Most of the time,Often,,Often,Most of the time,,,,Most of the time,,,,,,,,,,,,Most of the time,,100% of projects,Do not know,Standalone Team,None,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,18000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Italy,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Natural Language Processing,Recommendation Engines","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Manufacturing,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","Java,Jupyter notebooks,NoSQL,Python,SQL,Other",,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,Often,,,,,,,Most of the time,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,45,30,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,Sometimes,,,,,,,,Most of the time,,,,Sometimes,,,10-25% of projects,Approximately half internal and half 
external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,Jupyter notebooks,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,,Very useful,,Very useful,,,Very useful,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Scientist",Work,40,0,60,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests","IBM SPSS Modeler,IBM Watson / Waton Analytics,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,Often,Sometimes,,,,Most of the time,Most of the time,,,,,,,Most of the time,Most of the time,,Sometimes,,Often,,Most of the time,Often,,,Most of the time,,,Most of the time,Most of the time,,,,40,30,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,Often,,Most of the time,Most of the time,Sometimes,,76-99% of projects,Approximately 
half internal and half external,IT Department,"Census, Demographic, Social Economic, Postal, Tiger/Geographic","Compliance, Info Security","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Sometimes,200000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,42,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Other",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Somewhat useful,,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Kaggle competitions,20,10,5,15,50,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,Rarely,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Text Analytics",,,,,,Sometimes,,Sometimes,Rarely,,,Rarely,,,,Sometimes,,,,,,,Sometimes,,,,,,Often,,,,,40,5,7,8,10,30,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Sometimes,,,Most of the time,Rarely,,,Most of the time,,,,,,,,Often,,Sometimes,,,,26-50% of projects,Entirely internal,IT Department,Census;US Government;NIH;Medicare/Medicaid,Lack of ownership of third party systems which leads to lack of access and dirty data through disconnected systems.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Other,Sometimes,175000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,35,20,30,10,5,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,10GB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,,,Rarely,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,Sometimes,Often,Sometimes,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression",,,,,,Sometimes,Often,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,40,20,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team 
using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Sometimes,,Most of the time,Sometimes,,,Often,,Often,,Often,,,,Sometimes,Often,Often,Most of the time,,,100% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Most of the time,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,46,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,Very useful,,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,1 to 2 years,"Data Analyst,Other",Self-taught,25,25,0,50,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",,Mix of fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Ensemble Methods,HMMs","Java,MATLAB/Octave,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Naive Bayes,Natural Language Processing,Text Analytics",,Often,,,,Often,Often,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,,,,,Often,Sometimes,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Git,Most of the time,76800,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A master's degree,Internet-based,100 to 499 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,,"Bayesian Techniques,Regression/Logistic Regression","Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Most of the time,,,,Sometimes,,Often,Often,,,"Bayesian Techniques,Time Series Analysis",,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,0,80,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Most of the time,,,,,,,,,,,Sometimes,,Most of the time,,,,Less than 10% of projects,More internal than external,Standalone Team,Quarterly Census of Employment and Wages; 
Local Area Unemployment Statistics; Current Population Survey; American Community Survey; Local Area Personal Income,Filling in missing data points (hidden for privacy reasons) with informed estimates that are consistent within the dataset.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,50000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,32,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,TIBCO Spotfire,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,5,0,25,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning",Logistic Regression,A master's degree,Other,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10MB,"Regression/Logistic Regression,Other","R,SQL,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,Often,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,25,10,30,15,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,,Most of the time,,,,,,Often,,,,,,,100% of projects,Do not know,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,,,,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects",,Very useful,,,,,,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,Kaggle competitions,100,0,0,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Decision Trees - Gradient Boosted Machines,Gradient Boosting",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,43,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,Google Search,"Kaggle,Personal Projects,Tutoring/mentoring",,,,,,,Somewhat useful,,,,,Not Useful,,,,,Somewhat useful,,O'Reilly Data Newsletter,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Computer Science,,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,"Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,India,27,"Not employed, but looking for work",,,,,,,,SQL,Other,SAS,Google Search,"Online courses,Other",,,,,,,,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,,,,,Necessary,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Management information systems,Less than a year,Other,Self-taught,40,20,0,0,0,40,Survival Analysis,,No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,Belarus,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Official documentation,Online courses,Textbook",,,,Very useful,,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,30,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Image data,Most of the time,,,"Java,Jupyter notebooks,NoSQL,Python,SQL,Other",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Often,,,"Cross-Validation,Data Visualization,Random Forests,Text Analytics",,,,,,Often,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,30,20,10,15,25,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools",,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",Most of the time,43000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,59,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Time Series Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),40+,Online Courses and Certifications,Yes,Master's degree,Computer Science,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",9,90,0,0,1,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Not important +Female,United States,35,"Not employed, but looking for work",,,,,,,,Python,Deep learning,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,,,,Very useful,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,The Analytics Dispatch Newsletter",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,I haven't started working yet",Self-taught,50,50,0,0,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,47,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Text Mining,R,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,Necessary,,,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Data Analyst,Data Scientist,Other",University courses,10,20,15,55,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,CRM/Marketing,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SAS Enterprise Miner,SQL,Tableau,TensorFlow",,,,,Sometimes,,,,Sometimes,Rarely,Rarely,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,Often,Often,,,Most of the time,Rarely,Sometimes,,,,,,Often,,,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,Sometimes,Sometimes,,Often,,Most of the time,Most of the time,Often,,,,,Often,,Most of the time,,,Sometimes,Sometimes,Rarely,,,Often,,Most of the time,,Sometimes,Often,,,,,30,20,5,15,20,10,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science 
projects,Privacy issues,Unavailability of/difficult access to data",,Most of the time,,,Often,,,,Most of the time,,,,,Sometimes,,,Often,,,,Often,,26-50% of projects,More internal than external,Central Insights Team,crawled web pages,data latency; data formatting,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Other",Most of the time,110000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,SAS JMP,Deep learning,Python,GitHub,"Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Not Useful,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,"Coursera,DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...),Other",2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Miner,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,30,50,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Canada,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician,Other",Self-taught,25,10,65,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A doctoral degree,Government,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,10MB,"Decision Trees,Random Forests","Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Recommender Systems,Simulation,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,Often,,,Sometimes,,,,5,40,5,15,35,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty 
data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,Sometimes,,,Often,,Most of the time,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,10,10,30,10,30,10,Time Series,Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,10TB,Bayesian Techniques,"Amazon Web services,SAP BusinessObjects Predictive Analytics",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Bayesian Techniques,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,5,10,30,20,5,Enough to run the code / standard library,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,100% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Other",University courses,30,10,40,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by 
semi-colon)",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,Self-taught,50,5,20,5,10,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased significantly,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of 
the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk,Other,Other,Other",,Most of the time,,Often,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,,Sometimes,,,Often,,Sometimes,,Most of the time,,,,,,,Sometimes,,,Sometimes,,,,,,,,50,15,25,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,"world bank,alexa","constantly changing, very high volume","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Other,s3,"Bitbucket,Git",Sometimes,172000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,40,0,10,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Non-profit,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,Ensemble Methods,"Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,,,,,Rarely,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",Rarely,,,,,,Most of the time,Rarely,,,,,,,Rarely,Rarely,,,Most of the time,,Often,,,Most of the time,,,,,Often,,,,,30,10,10,20,30,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Sometimes,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,,INR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Ukraine,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,,,,,,,,Very useful,,Somewhat useful,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,A health science,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,20,10,0,0,70,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,Fewer than 10 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Often,,Sometimes,,,Often,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,,Often,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,Most of the time,,,,,,Sometimes,Most of the time,Sometimes,Sometimes,Often,Most of the time,,51-75% of projects,More internal than external,Other,Centers for Medicare and Medicaid; US Census,"Not enough of it, small sample size, sparse data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,sftp server,Git,Rarely,"85,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,Python,Neural Nets,Python,University/Non-profit research group websites,"Friends network,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,0,20,Computer Vision,Bayesian Techniques,A bachelor's degree,Academic,10 to 19 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Never,10PB,,"Amazon Web services,C/C++,Java,NoSQL,Python,R,Spark / MLlib,Unix shell / 
awk",,Rarely,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,Often,,,,,,,Often,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,50,50,0,0,Enough to tune the parameters properly,"Dirty data,I prefer not to say",,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Other,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,70,0,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Sometimes,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,Often,,,Most of the time,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Friends network,Kaggle,Online courses",,Not Useful,Somewhat useful,,,Very useful,Somewhat useful,,,,Very useful,,,,,,,,"No Free Hunch Blog,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,Data Analyst,Researcher",University courses,50,10,20,15,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,Often,,,,,,,,,,,,Most of the time,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series 
Analysis",Most of the time,,,,,Often,Most of the time,Most of the time,Most of the time,,,Sometimes,,,Most of the time,Most of the time,,,,,,Often,Often,,,Most of the time,Most of the time,,,Most of the time,,,,40,5,5,50,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,,Most of the time,,Often,,Most of the time,,,Most of the time,Sometimes,Often,Most of the time,,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,95000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,I never declared a major,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",Often,,,,,Most of the time,Most of the time,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,,Most of the time,Most of the time,,Sometimes,Most of the time,,,,Sometimes,,,,,,40,30,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,76-99% of 
projects,More internal than external,Other,None,Clean,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,Sometimes,45000,BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,50,10,25,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Image data,Video data",Never,1TB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,RNNs,Segmentation",,,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,Sometimes,Often,,,,,,,,5,35,40,20,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,Often,Often,,Sometimes,,Sometimes,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,Imagenet,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,100008,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,15,30,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Internet-based,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Workstation + Cloud service",Relational data,Rarely,,Regression/Logistic Regression,"Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,30,5,5,40,20,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,140000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,Other,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Friends network,Personal Projects,Textbook",Very useful,,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Machine Learning Engineer",Self-taught,60,0,20,20,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data",Most of the time,100GB,"CNNs,Ensemble Methods,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,Segmentation",,,,Most of the time,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,Most of the time,,,,,,Often,,,,,,,,10,30,10,20,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,Sometimes,,,,,,Sometimes,,,Most of the time,,,,100% of projects,Approximately half internal and half external,Standalone Team,LUNA;ImageNet;,Very dirty data;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Never,400000,CNY,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Poland,31,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Tutoring/mentoring",,,Very useful,,,,Very useful,,Somewhat useful,,,,,,,,Somewhat useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer,Researcher",University courses,0,40,0,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",35,25,15,20,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Never,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Often,,,Often,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Most of the time,,Most of the time,,,Often,,Sometimes,,Sometimes,,,Often,,Most of the time,,Most of the time,,,,,Sometimes,Most of the time,,,,,35,25,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Unavailability of/difficult access to data",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,Often,,76-99% of projects,Approximately half internal and half external,IT Department,TripAdvisor,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,150000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,27,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Personal Projects",,,,,,Very useful,Very useful,,,,,Somewhat useful,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,30,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Very Important +Female,United Kingdom,46,Employed part-time,,,No,Yes,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Online courses,Personal Projects,Textbook",,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,,Doctoral degree,Computer Science,1 to 2 years,Other,University courses,35,10,0,50,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Female,United States,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic 
Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Decreased significantly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,Very useful,,Very useful,,Very useful,,Somewhat useful,,,,,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,10,20,55,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Minitab,Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,Often,,,Often,,,,Most of the time,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Often,,,,15,15,25,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,Often,,Sometimes,,,Rarely,,,,Rarely,,,,Sometimes,Often,,,100% of projects,Entirely external,Standalone Team,Bloomberg; Quandl,finding unique data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,115000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Other,37,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by government,Python,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,DBA/Database Engineer,Self-taught,90,0,0,10,0,0,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,India,52,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,SAS Enterprise Miner,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Newsletters,Personal Projects,Textbook",,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Statistician",University courses,10,10,50,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Never,100MB,"Evolutionary Approaches,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,R",,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",,Often,,,,,,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,,,Most of the time,Often,,Often,,,,30,20,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Most of the time,Most of the time,Most of the time,,Often,,,Most of the time,Most of the time,,,,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,"UCLA ,kdnuggets,open government data",,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,I don't typically share data",,,Sometimes,1500000,INR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,100,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Software Developer/Software Engineer",Self-taught,70,10,0,20,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Most of the time,10TB,"GANs,RNNs,SVMs","Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,"GANs,kNN and Other Clustering,RNNs",,,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,0,0,0,0,0,100,Enough to code it from scratch and it will run blazingly fast and be super efficient,Inability to integrate findings into organization's decision-making process,,,,,,,,Often,,,,,,,,,,,,,,,None,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Always,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler",Work,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Random Forests,Python,"Google Search,Government website","College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,"Linear Digressions Podcast,Talking Machines Podcast",< 1 year,,,,,,,,,,,,,,,,,,No,Professional degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Reinforcement learning,Neural Networks - RNNs,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Data Analyst,Engineer,Statistician",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,Amazon Machine Learning,Regression,Python,Google 
Search,"Kaggle,Personal Projects,Podcasts",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Never,,Bayesian Techniques,"C/C++,Java,Python,SQL",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques",Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Often,,Most of the time,,,,,,,,Sometimes,,100% of projects,Entirely internal,Business Department,,,,,,Git,Sometimes,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Very useful,,Very useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,15,0,50,15,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Python,Spark / MLlib,Other",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,Most of the time,Most of the time,Often,,,,,,,Most of the time,Often,,,,,Sometimes,,Often,,,Most of the time,,,,Sometimes,,,,40,30,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings 
into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,Often,Often,,Sometimes,,,Sometimes,,,,Sometimes,,Most of the time,,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other",Company Developed Platform,,Bitbucket,Sometimes,150000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A",Not Useful,Very useful,Somewhat useful,,,,Not Useful,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Business Analyst,University courses,30,2,8,50,10,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud 
service","Text data,Relational data",,1GB,"Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Prescriptive Modeling,Recommender Systems,Segmentation,Time Series Analysis",,,,,,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,Sometimes,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,Often,,,Sometimes,,,Sometimes,,,,,,,,,,,,,Rarely,,51-75% of projects,Entirely internal,Standalone Team,,"No data dictionary, multiple conflicting sources",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"70,000",,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Belgium,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"College/University,Official documentation,Textbook,YouTube Videos",,,Very useful,,,,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,,University courses,67,0,3,30,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Rarely,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,Sometimes,Often,,,Often,Rarely,Most of the time,Often,,Most of the time,,Often,,Often,,Rarely,,Often,Often,,Often,,,Sometimes,,Often,,,,,,30,50,10,5,5,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database",,Often,,,Often,Often,,,Often,,,Often,,,,,Often,Often,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,,,,6,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Colombia,22,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,,,,,Very useful,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,,,,"Java,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Neural Networks,PCA and Dimensionality Reduction",,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,70,5,5,5,15,0,Enough to tune the parameters properly,"Dirty data,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Most of the time,,,,Sometimes,Most of the time,Sometimes,,51-75% of projects,Approximately half internal and half external,Standalone Team,No third party 
datasets,Size and the need of a conscious cleanning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,39000000,COP,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A bachelor's degree,Other,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,"N/A, I did not receive any formal education",Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",Sometimes,,Often,,,,,,Most of the time,,,,,,,Sometimes,,Rarely,Most of the time,Most of the time,,,,,,,,,Often,,,,,20,40,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's 
degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",University courses,30,10,10,40,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,Very useful,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,Data Stories Podcast",1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",60,35,0,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very 
Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Female,Mexico,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,Somewhat useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,30,10,20,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Greece,28,Employed part-time,,,Yes,,Data Analyst,Fine,Self-employed,Spark / MLlib,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Stack Overflow Q&A",,Very useful,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",15,50,20,10,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Image data,Don't know,10GB,"Bayesian Techniques,CNNs,Neural Networks,Random Forests","Amazon Machine Learning,Amazon Web services,Python,R,Tableau,TensorFlow",Rarely,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Rarely,Most of the time,,,,,,"Bayesian Techniques,CNNs,kNN and Other Clustering,Neural Networks",,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,70,5,5,10,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Limitations of tools,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,,,,,Often,,,,,,,,Sometimes,,100% of projects,Do not know,Standalone Team,,,,,,,Sometimes,0,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,SQL,Deep learning,Python,"Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,Somewhat useful,Not Useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Somewhat useful,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Data Analyst,Researcher,Other",Self-taught,45,45,10,0,0,0,,Ensemble Methods,A doctoral degree,Government,20 to 99 employees,Increased slightly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,,Regression/Logistic Regression,"Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Often,,,,,,,,Most of the time,,,,,,,,,Often,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,40,1,5,40,14,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,Sometimes,Sometimes,Sometimes,,Most of the time,,Most of the time,Often,Sometimes,,76-99% of projects,More internal than external,Standalone Team,BLS; Census; State LMI; TIGER,Cleanliness,"Column-oriented relational (e.g. 
KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,87500,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,"Government website,I collect my own data (e.g. web-scraping)",YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,"Engineer,Programmer",Self-taught,80,20,0,0,0,0,Time Series,Logistic Regression,High school,Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,,Most of the time,100GB,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Female,United States,61,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Google Search,"Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,Very useful,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,Self-taught,45,0,45,5,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Other,Laptop or Workstation and local IT supported servers,Other,Most of the time,10TB,SVMs,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Segmentation,SVMs,Other",,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,,Often,,Often,,,Often,,,10,65,10,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Other",,,,Sometimes,,,,,Often,,Often,,Rarely,,,,,,,,,Often,100% of projects,Entirely internal,Other,,Size / portability,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Other,Fedexing 6 TB Hard Drives,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Canada,19,Employed part-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Matlab,"Google Search,I collect my own data (e.g. web-scraping)","Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,1-2 years,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,,,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,20,30,0,10,0,,,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Female,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Excel Data Mining,Monte Carlo Methods,Python,University/Non-profit research group websites,"Blogs,Conferences,Kaggle,Personal Projects",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,1 to 2 years,"Data Analyst,Other",Other,0,0,0,0,0,100,"Computer Vision,Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Video data,Text data",,1GB,"Decision Trees,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Association Rules,Segmentation,Text Analytics,Time Series Analysis",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Rarely,Sometimes,,,,80,5,0,5,10,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",,,,,,Most of the time,,,Most of the time,,,,,,Most of the time,Most of the time,Often,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Other",University courses,20,10,10,60,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"Association Rules,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Simulation",,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,,Sometimes,Sometimes,,,,,,,40,20,5,5,30,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,Often,,Often,,Often,Most of the time,,,,Sometimes,,Most of the time,Sometimes,,Less than 10% of projects,Do not know,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,,140000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by government,IBM Cognos,Neural Nets,Python,Government website,"Blogs,College/University,Online courses,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Statistician",University courses,25,7,30,35,3,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Government,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Minitab,Python,QlikView,R,Tableau",,,,,,,,,,,Rarely,Often,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Sometimes,Often,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Sometimes,,,,20,20,15,30,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,,,,,,,Often,Often,,,,,,,,Often,Often,,76-99% of projects,More internal than external,Standalone Team,census,full,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,144000,PEN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,41,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Scala,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Nice to have,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Workstation + Cloud service,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important +Female,Romania,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,Very useful,,,,Somewhat useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,6 to 10 years,"Data Analyst,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,30,5,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,100MB,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,Often,,,,,"Data Visualization,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation",,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,10,20,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,12000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Egypt,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,Python,Neural Nets,Python,Google Search,"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"FastML Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,PhD,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,Other,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's 
degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",10,40,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",,Technology,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Often,Sometimes,Sometimes,,,Often,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,30,10,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,Often,Often,,,,Often,,,,Often,,,,,Often,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,115000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,Very useful,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,Other,University courses,NA,40,0,50,10,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,20,30,30,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines 
(SVMs)",High school,Technology,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Often,Often,,,,,,Often,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Often,,,,Most of the time,Sometimes,,Most of the time,,,,30,40,20,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,Most of the time,,,Most of the time,Most of the time,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,Microsoft Excel Data Mining,Deep learning,R,I collect my own data (e.g. 
web-scraping),Textbook,,,,,,,,,,,,,,,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,20,10,10,0,10,Computer Vision,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Ukraine,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Textbook",Very useful,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,25,10,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs",A doctoral degree,Financial,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,Rarely,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,Rarely,,Rarely,,,,Sometimes,Rarely,,Sometimes,,,,,Sometimes,,,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,,,,,,,,,,Sometimes,,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Sometimes,38400,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Not Useful,Very useful,,,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Most of the time,10GB,"Gradient Boosted Machines,RNNs","Java,Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,Rarely,,,Most of the time,,,,,Most of the time,,,,Sometimes,,,Most of the 
time,Sometimes,Sometimes,,,,Sometimes,,,,Sometimes,,,,,40,25,20,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Most of the time,,,Often,Sometimes,Often,,,Most of the time,,,,,Often,Most of the time,,Sometimes,Often,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,scale,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,90000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Indonesia,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Statistician",University courses,30,10,30,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - GANs",A bachelor's degree,Other,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression,Other","IBM SPSS Statistics,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Enterprise Miner",,,,,,,,,,,,Most of the time,,,,,,,,,Often,,Most of the time,,,Most of the time,,,,,,,Most of the time,,,Often,,,Often,,,,,,,,,,,,,"PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,Most of the time,Most of the time,,,20,40,0,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,,,,,,Most of the time,,Often,,,Often,,Most of the time,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,IT Department,"IFLS, OECD data set, Bureau of Meteorology Indonesia",Funding and Appreciation,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,3250000,IDR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,,Self-taught,90,5,0,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,Technology,Fewer than 10 employees,,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Web services,IBM Cognos,Python,QlikView,R",,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,"Decision Trees,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,Often,Sometimes,,,,0,5,5,90,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Most of the time,Often,,,,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,2500000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Java,Julia,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Most of the time,,,,,,,Often,Rarely,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"GitHub,Google Search,Government website","Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions 
Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,,Kaggle competitions,30,10,20,10,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased slightly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Angoss,C/C++,DataRobot,R,SAS Base,SAS Enterprise Miner,Tableau",,,Sometimes,Rarely,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,Often,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,,,,Often,Most of the time,Most of the time,,,Most of the time,,Often,Often,Most of the time,,Most of the time,Often,,,,,,,Most of the time,,Most of the time,Often,Sometimes,,,,10,30,20,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,1600000,INR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,FastML Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Researcher,Statistician",Other,33,33,34,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,500 to 999 employees,Increased significantly,3-5 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,SAS Base,SQL,Stan,Tableau",,Rarely,,,,,,Rarely,Sometimes,,,Rarely,,,,,Sometimes,,,Rarely,Rarely,,,,,,Often,,,,Most of the time,,Sometimes,,,,,Rarely,,,,Often,Sometimes,,Rarely,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive 
Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,,Rarely,Sometimes,Rarely,Sometimes,,Often,,,Sometimes,,,,Often,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools",,,Often,,Often,,,,Sometimes,,Sometimes,,Often,,,,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,140000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,France,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,Data Scientist,University courses,30,25,15,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,RapidMiner (free version),Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Python,Survival Analysis,Python,University/Non-profit research group websites,"Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FlowingData Blog",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Operations Research Practitioner",Work,25,5,45,25,0,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Often,,,,,,,,Most of the 
time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Segmentation,SVMs",,,Sometimes,,,Often,Often,,,,,,,,,Often,,,,,,,,,,Often,,Rarely,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,Sometimes,Sometimes,,Often,,,Sometimes,Often,,,76-99% of projects,More external than internal,Standalone Team,Automotive marketing prospect;national consumer;automotive warranty schedules,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Mercurial",Sometimes,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,35,25,10,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Often,Often,Often,Often,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,Sometimes,,Often,,,Rarely,,Sometimes,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,150000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Other,"Arxiv,Blogs,College/University,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,,,,,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,10GB,"CNNs,Neural Networks","Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Neural Networks,Segmentation",,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,20,15,60,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of 
tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,Most of the time,,,,,,Most of the time,,Often,Sometimes,Often,Often,,,,,,Sometimes,,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,"Git,Mercurial",Never,42000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,France,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Operations Research Practitioner,University courses,30,20,10,40,0,0,"Adversarial Learning,Natural Language Processing","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Company internal community,Kaggle,Newsletters,Online courses,Personal Projects,Textbook",,,Very useful,,,,Not Useful,Not Useful,,,Very useful,Very useful,,,Very useful,,,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,GPU accelerated Workstation,0 - 1 hour,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,65,0,0,5,0,"Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Female,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,0,0,80,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",Self-taught,35,10,30,15,10,0,"Natural Language Processing,Reinforcement learning,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Academic,"1,000 to 4,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,HMMs,Neural Networks,Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,Sometimes,,Often,,,,Rarely,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Most of the time,,,Most of the time,,,"Bayesian Techniques,CNNs,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Often,Often,,,,,,,,,Often,Often,,Often,Sometimes,Often,Often,Often,Often,,,,Often,,Often,Sometimes,Often,Often,,,,40,30,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial 
support for a data science team,Dirty data,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,,,Often,,,Most of the time,,76-99% of projects,More external than internal,Central Insights Team,Government Open Data; Kaggle; Clients' provided dataset,Data clean; Privacy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,"60,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,,,,,,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,Speech Recognition,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,<1MB,"Neural Networks,Random Forests","Amazon Machine Learning,Amazon Web services",Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Neural 
Networks",,,,,,,Often,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,Often,,,,,,Most of the time,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Bitbucket,Never,5500000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,SQL,,"Conferences,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer",Self-taught,30,0,70,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Other,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,Rarely,,,,,,Most of the time,Most of the time,,,Most of the time,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural 
Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation",Often,,,,Often,Most of the time,Most of the time,Often,Often,,,,,Often,,Often,,,,Sometimes,Sometimes,Often,Most of the time,Most of the time,,Often,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,Often,,,,,Most of the time,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",University courses,0,10,10,80,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Don't know,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,Other","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Python,R,TensorFlow,Other",,,,Rarely,,,,,,,,,,,Sometimes,,Often,,,,Rarely,,,,,,,Sometimes,,,Most of the time,,Often,,,,,,,,,,,,,Sometimes,,,Sometimes,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis,Other",,Sometimes,Rarely,,Often,Often,Most of the time,Often,,,,,,,,Sometimes,,Sometimes,,Sometimes,Often,Rarely,Rarely,Often,,Rarely,,,,Sometimes,,,Sometimes,30,40,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Unavailability of/difficult access to data,Other",Sometimes,,,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search",Other,,,,,,,,,,,,,,,,,,,KDnuggets Blog,< 1 year,,Necessary,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Speech Recognition,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,,,,,,,,,,,,,,, +Female,United 
States,30,"Not employed, but looking for work",,,,,,,,SQL,Association Rules,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,Less than a year,"Data Analyst,Predictive Modeler,Researcher,Other",Other,40,55,0,5,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,SAP BusinessObjects Predictive Analytics,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Conferences,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,"Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Kaggle competitions,0,40,40,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,20 to 99 employees,Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Always,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,Sometimes,Most of the time,Rarely,Often,,,Sometimes,,,Often,Most of the time,,,,,,Sometimes,Most of the time,,,,,,,,,,,45,10,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Often,Most of the time,,,Most of the time,,,,,,,,,Often,,Often,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,82500,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Singapore,39,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,SQL,Rule Induction,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,Other,Work,0,0,80,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Government,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Sometimes,100MB,Decision Trees,"IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SAS Enterprise Miner",,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,Rarely,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Segmentation",,Sometimes,,,,Often,Often,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,50,0,0,20,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Often,,Often,,,,Often,,,Most of the time,,Sometimes,Most of the time,,Most of the time,,,26-50% of 
projects,More internal than external,Central Insights Team,,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,150000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,19,"Not employed, but looking for work",,,,,,,,Python,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,YouTube Videos,Other",,Very useful,,,,,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,Very useful,"Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,"DataCamp,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Researcher,Other",Self-taught,60,40,0,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Speech Recognition,Other (please specify; separate by semi-colon)","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,63,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Other,Genetic & Evolutionary Algorithms,R,GitHub,"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Engineer,Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,1TB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,,Most of the time,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,,,Most of the time,,,Most of the time,,,,10,25,20,25,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Need to coordinate with IT,Privacy issues",,,,,Often,,,,,,,,,,Sometimes,,Sometimes,,,,,,76-99% of projects,Entirely internal,Standalone Team,none,quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,"350,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,KDnuggets Blog",< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,53,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Text Mining,R,"Google Search,Government website,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Miner,Researcher,Other",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,,,Often,,,,,,,Sometimes,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Often,Most of the time,Sometimes,,Often,,,,,,,,,,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Rarely,"115,000",USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation",,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Kaggle competitions,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Retail,"1,000 to 4,999 employees",Stayed the same,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Relational data,Always,100GB,"Bayesian Techniques,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Logistic Regression,Recommender Systems",Sometimes,,Most of the time,,,,,,,,,,,,,Most of the 
time,,,,,,,,Sometimes,,,,,,,,,,19,5,70,5,1,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Scaling data science solution up to full database",Most of the time,,,,,,,,Most of the time,,,,,,,,,Often,,,,,Less than 10% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,230000,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Operations Research Practitioner,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Never,100MB,SVMs,"C/C++,Mathematica,MATLAB/Octave,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,PCA and Dimensionality Reduction,SVMs",,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,Often,,,,,,10,30,0,30,30,0,Enough to refine and innovate on the algorithm,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,40000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Pakistan,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,Yes,Computer Scientist,Perfectly,Employed by college or university,Amazon Web services,Neural Nets,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Other,University courses,5,10,0,75,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,41,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,45,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - GANs",High school,Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Other,Laptop or Workstation and private datacenters,Text data,Never,<1MB,Gradient Boosted Machines,"Amazon Web services,C/C++,MATLAB/Octave,R",,Rarely,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Logistic Regression,Natural Language Processing,Neural Networks,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,Often,,Sometimes,,,,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Sometimes,3269000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Julia,Deep learning,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,30,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Amazon Web services,IBM Watson / Waton Analytics,Java,Microsoft Azure Machine Learning,NoSQL,R,SQL,Unix shell / awk,Other",,Often,,,,,,,,,,,Often,,Often,,,,,,,Rarely,,,,,Often,,,,,,Often,,,,,,,,,Often,,,,,,Often,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Sometimes,Often,,,,,,,Often,,Sometimes,,Sometimes,Often,,,,,,,,,Sometimes,Often,Sometimes,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Often,,,,Sometimes,,,,,,,Sometimes,,,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,70000,USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Researcher",University courses,50,0,10,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data,Other",Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics,Time Series Analysis",,Rarely,Often,,,Often,Most of the time,,,,,,,Often,,Sometimes,,Sometimes,Most of the time,,Often,,,,,,,Most of the time,Most 
of the time,Often,,,,60,10,1,20,9,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Often,,,Most of the time,,,,Often,,,,,,Most of the time,,Most of the time,,Sometimes,,,,100% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",I don't typically share data,,"Git,Subversion",Sometimes,120000,CAD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Neural Nets,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,Very useful,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Master's degree,A health science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Random Forests",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Social Network Analysis,Python,,"Arxiv,Blogs,Company internal community,Conferences,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,,Somewhat useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,University courses,20,10,50,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,GANs,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Time Series Analysis",Often,,,,Rarely,Most of the time,,Sometimes,,,Sometimes,,,,Most of the time,Most of the time,,,,,,,Often,Sometimes,,,,,,Most of the time,,,,20,40,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a 
data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,Often,,Often,Sometimes,,,Sometimes,,,,,,,Sometimes,Most of the time,,,Most of the time,,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,190000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,25,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,,Nice to have,Nice to have,,Necessary,Nice to have,Nice to have,,,,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",2,80,10,1,7,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,47,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,60,20,10,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Sometimes,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,,Often,Often,Sometimes,,,,Often,,Sometimes,,Often,Often,Sometimes,Often,,Sometimes,,,,,Sometimes,,Sometimes,,,,50,25,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack 
of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Most of the time,,,Most of the time,,Most of the time,Sometimes,Sometimes,,,,,,Often,Most of the time,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,I don't typically share data",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,10,10,15,50,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A professional degree,Military/Security,20 to 99 employees,Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,Hungary,26,Employed full-time,,,Yes,,Statistician,Fine,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,Very useful,Not Useful,Very useful,Not Useful,,,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,10,50,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,,,,Sometimes,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,Often,,,,Often,Sometimes,Sometimes,,,Sometimes,,Often,,Sometimes,,Sometimes,Most of the 
time,,,,Often,,,,,,Most of the time,Sometimes,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Sometimes,,,,,,Sometimes,Sometimes,,Often,Most of the time,,Sometimes,Most of the time,Sometimes,,,26-50% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,2650000,HUF,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Online courses,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Professional degree,,Less than a year,Other,University courses,60,0,0,40,0,0,,,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Ukraine,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,50,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,5,10,0,85,0,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,29,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Miner,Predictive Modeler",University courses,15,10,55,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Telecommunications,"5,000 to 9,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,Self-taught,30,10,20,40,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,I prefer not to answer,,Don't know,"A friend, family member, or former colleague told me",,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text 
data,Sometimes,1TB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,More than 10 years,"Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,,"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer",Work,20,10,30,40,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Video data,Other",Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Java,KNIME (free version),Mathematica,Python,RapidMiner (free version),Unix shell / awk",,,,,,,,,,,,,,,Often,,,,Sometimes,Often,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Rarely,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Often,,,Most of the time,,,,5,5,5,15,20,50,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,Most of the time,Sometimes,,,Sometimes,,,,,Sometimes,,,Sometimes,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,90000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Malaysia,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Technology,10 to 19 employees,Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,QlikView,R,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,,Most of the 
time,,,,,Often,,,,Sometimes,,Rarely,,Sometimes,,,,,,,,,,Often,Sometimes,Often,,,,,,,,,Often,,,Most of the time,Most of the time,Sometimes,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests",Sometimes,,Rarely,,,,Most of the time,Often,,,,,,,,Often,,Sometimes,,,,,Often,,,,,,,,,,,60,10,5,15,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,Often,,,Most of the time,Most of the time,Often,Often,Sometimes,Rarely,,Rarely,Most of the time,,,Sometimes,Often,,26-50% of projects,Approximately half internal and half external,Business Department,data.gov.my; kaggle; data.com,"Cleaning; massaging;filtering, mapping","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,99600,MYR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United Kingdom,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Predictive Modeler,Self-taught,30,0,20,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Time Series Analysis",,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,10,40,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed 
by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Data Analyst,Data Scientist",University courses,25,15,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,70,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection)",,A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100GB,,"C/C++,R,SAS Base,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,,,,Often,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Retired,,,Yes,,Software Developer/Software Engineer,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,Other,"Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,55,10,30,0,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,QlikView,R,Spark / MLlib,SQL,Stan,TensorFlow",,,,,,,Sometimes,,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,Most of the time,,,,,,,,Sometimes,Sometimes,Rarely,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,Sometimes,Often,Often,Often,Often,Sometimes,,Often,,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Often,Sometimes,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,,,65,15,5,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of significant 
domain expert input,Limitations in the state of the art in machine learning,Privacy issues,Other",,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,Often,100% of projects,More internal than external,Business Department,Nda,Time,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,75000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Bayesian Methods,R,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,R,University/Non-profit research group websites,"Blogs,YouTube Videos",,Somewhat useful,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,10,0,90,0,0,Outlier detection (e.g. 
Fraud detection),,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Deep learning,,"Google Search,Government website,I collect my own data (e.g. web-scraping),Other","Online courses,Stack Overflow Q&A",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,,University courses,40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Other,Traditional Workstation,"Video data,Other",Rarely,,"Decision Trees,Neural Networks,Random Forests,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Neural Networks,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,,Email,,,Most of the time,,,,8,,,,,,,,,,,,,,,,,, +A different identity,United States,100,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,Not Useful,,,Very useful,Somewhat useful,Not Useful,,Not Useful,,,Somewhat useful,Partially Derivative Podcast,< 1 year,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,23,Employed part-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,10,5,50,0,0,"Computer Vision,Recommendation Engines","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,Researcher",University courses,40,50,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,University courses,30,30,0,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"5,000 to 9,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Sometimes,100GB,,"Amazon Web services,Python,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,PCA and Dimensionality Reduction,Simulation",Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,"Not employed, but looking for work",,,,,,,,Julia,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Podcasts,Textbook,Trade book",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Programmer,Other",Self-taught,90,0,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Amazon Web services,Link Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Personal Projects",,,,,Very useful,,Very useful,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,"Data Analyst,Predictive Modeler,Other",University courses,20,10,20,50,0,0,"Natural Language Processing,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,KNIME (commercial version),KNIME (free version),Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (commercial version),RapidMiner (free version),Tableau",,,,Often,,,,,,,,,,,,,,Most of the time,Most of the time,,,,Often,,,,Sometimes,,,,Most of the time,,Most of the time,Often,Often,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Natural Language Processing,Prescriptive Modeling,SVMs,Text Analytics,Time Series Analysis",Often,,Often,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,Most of the time,Often,Most of the time,,,,20,30,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,Sometimes,,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Mercurial",Never,67000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Conferences,Official documentation,Online courses,Podcasts",,Very useful,,,Very useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,10GB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,Often,,,Often,,Sometimes,,,Sometimes,,Sometimes,Rarely,Often,,,,10,25,5,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Unavailability of/difficult access to data,Other",Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,Often,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,160000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",70,10,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Cluster Analysis,SQL,Other,"Kaggle,Stack Overflow 
Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,10,40,0,0,10,Time Series,"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,6-10 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Relational data,Other",Rarely,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Rarely,Sometimes,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Time Series Analysis",,,Sometimes,,,,Often,Often,,,,,,,,,,Often,,,,,,,,,,,,Rarely,,,,20,50,10,10,10,0,Enough to run the code / standard library,"Limitations of tools,Need to coordinate with IT",,,,,,,,,,,,,Sometimes,,Often,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,Azure marketplace,Need cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,125000,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,FlowingData Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,High school,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1TB,Bayesian Techniques,"Amazon Machine Learning,Python",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,30,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Scaling data science solution up to full database",Often,,,,,,,,,,,,,,,,,Often,,,,,Less than 10% of projects,More internal than external,Business Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),,,"Bitbucket,Git",,130000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,"College/University,Conferences,Kaggle,Textbook",,,Very useful,,Somewhat useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Predictive Modeler",Work,40,0,20,40,0,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Most of the time,10GB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,Most of the time,,Often,,,,40,10,15,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues",,,,,Rarely,,,Rarely,,,,,,,,,Rarely,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by government,Python,Deep learning,R,GitHub,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",15,25,10,35,15,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Often,,,,,,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,,,missing data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform",,,Rarely,4200000,XOF,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,R,,,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Company internal community,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Business Analyst,Work,20,30,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,Retail,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,Python,R,Tableau,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Often,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Often,,,Most of the time,,,,"Association Rules,Data Visualization,Natural Language Processing,Neural Networks,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,,,10,40,30,20,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,Often,,Sometimes,,,,,Often,,Most of the time,,,10-25% of projects,Entirely internal,Standalone Team,None,Accuracy in capturing the data,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"1,600,000",,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed full-time,,,Yes,,Other,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,66,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","College/University,Company internal community,Friends network,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",Work,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A master's degree,Government,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Image data,Sometimes,10GB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Simulation,Time Series Analysis",,,Often,,,,Most of the time,,,,,,Sometimes,Often,,Often,,,,Often,,,,,Often,,Most of the time,,,Most of the time,,,,50,15,0,15,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small 
and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,,Sometimes,,Often,,,Often,,Often,Most of the time,,Most of the time,Most of the time,,,Often,Most of the time,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,50000,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,Julia,Text Mining,Julia,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher",University courses,25,10,35,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Financial,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,100TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","C/C++,Julia,Jupyter notebooks,Perl,Python",,,,Most of the time,,,,,,,,,,,,Most of the time,Most of the 
time,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Random Forests,Time Series Analysis",Sometimes,,Often,,,Most of the time,Sometimes,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,20,70,5,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",Rarely,,,,Often,,,,,,,Often,Often,,,,,,,,Often,,Less than 10% of projects,Entirely internal,Business Department,market data from stock and futures exchanges,normalizing it; join it with other datasets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Other,networked filesystem,,Most of the time,1000000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Poland,21,Employed part-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"College/University,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,,Self-taught,75,0,5,20,0,0,"Natural Language Processing,Reinforcement learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,500 to 999 employees,,Don't know,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,,100MB,"Neural Networks,RNNs","C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs,Text Analytics",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Often,,,,,5,55,40,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +A different identity,India,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,75,15,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,29,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,30,40,0,30,0,"Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,Financial,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Image data,Text data",Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic 
Regression,Naive Bayes,Natural Language Processing,Random Forests",,,Often,,,Often,Most of the time,,,,,,,,,Often,,Sometimes,Sometimes,,,,Most of the time,,,,,,,,,,,50,10,0,10,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",50,15,25,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Rarely,Sometimes,Often,,,,,,Often,,,,,,,,,,Rarely,Rarely,,,Often,,Often,,,,,Sometimes,Sometimes,,Sometimes,Most of the time,,,Sometimes,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",Rarely,,,,,Sometimes,Most of the time,Sometimes,,,,,,,Often,Often,,Sometimes,Sometimes,Rarely,,,Often,,,,,,Sometimes,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,Sometimes,,Sometimes,Sometimes,Sometimes,,,,Often,Often,Sometimes,Most of the time,,,,Sometimes,,76-99% of projects,More internal than external,Standalone Team,Census,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Bitbucket,Sometimes,80000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by college or university,Amazon Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Personal Projects,Other",,Very useful,,,,,Very useful,Very useful,,,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,60,0,0,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,Other,Laptop or Workstation and private datacenters,Other,Always,10GB,"Bayesian Techniques,Neural Networks","IBM SPSS Statistics,Java,Microsoft R Server (Formerly Revolution Analytics),R,Tableau",,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Markov Logic Networks,Neural Networks,Segmentation,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,Often,,,Often,,,Most of the time,,,,,,Often,,,,Most of the time,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Sometimes,,,,,,,,Often,,,,,,Often,,,,,,,,76-99% of projects,More external than internal,Standalone Team,"Stats Can, Industry Stats",Translation because it is handled by users,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Brazil,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",University courses,90,0,0,10,0,0,,Logistic Regression,A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Never,10GB,Regression/Logistic Regression,"Amazon Web services,C/C++,Google Cloud Compute,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data 
Mining,Minitab,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,49,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Textbook",,,,,Very useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Other,University courses,30,0,10,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Relational data,Other",Most of the time,1MB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Minitab,R",,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Decision Trees,Evolutionary Approaches,Logistic Regression,Neural Networks,Segmentation,Simulation,Time Series Analysis",,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,Sometimes,,,,,,Often,Often,,,Often,,,,65,20,10,0,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",Often,,,,Often,,,,,Most of the time,,,,,,,,,,,Often,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Canada,16,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,United States,46,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,Other,Self-taught,50,15,20,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,,Rarely,,,,,Rarely,,Sometimes,,Rarely,Often,,Rarely,,,,,,,,Often,Sometimes,,,,50,10,5,10,25,0,Enough to explain the algorithm to someone non-technical,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,115000,USD,Other,8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,54,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,Tableau,Decision Trees,R,I collect my own data (e.g. web-scraping),"Conferences,Kaggle,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,,,,Somewhat useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A professional degree,Other,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,<1MB,Bayesian Techniques,"IBM Cognos,Jupyter notebooks,Python,R,SQL,Other",,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Naive Bayes,Time Series Analysis",,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,50,25,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Limitations of tools,Privacy issues",Often,Often,,,Often,,,,,,,,Most of the time,,,,Often,,,,,,51-75% of projects,Do not know,Business Department,Government PI data,governance and privacy,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Pentaho portal; Excel spreasheets,Other,Never,"85,000",CAD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Miner,Software Developer/Software Engineer,Statistician",Work,15,20,50,0,15,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,"10,000 or more employees",Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Rarely,1GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Often,,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,Sometimes,,,Most of the time,Often,Often,,,,,Most of the time,,Sometimes,,,,,Often,,Often,,,Often,,,,Often,,,,50,20,0,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database",Often,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Sometimes,75000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Data Scientist",University courses,40,10,10,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Online courses,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Doctoral degree,,Less than a year,"Business Analyst,Programmer",University courses,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,20,0,0,0,30,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Recommendation Engines,Bayesian Techniques,A bachelor's degree,Mix of fields,100 to 499 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,10MB,Bayesian 
Techniques,"Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Recommender Systems,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,70,10,0,10,10,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,"FastML Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,60,5,0,0,35,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat 
important,Somewhat important +Female,United States,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,,,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Operations Research Practitioner,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,58,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Reinforcement learning,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,Colombia,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Data Analyst,Engineer,Researcher",University courses,10,10,20,50,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,50,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Nice to have,Nice to have,,,Nice to have,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Data Analyst,,50,0,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software 
Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,5,35,20,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,YouTube Videos,Other",,,,,Very useful,,Somewhat useful,,,,Very useful,,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",Kaggle competitions,50,0,0,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Other,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Rarely,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,,,,,,Often,,Sometimes,,Most of the time,,,,,,Often,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",,,,,Most of the time,Sometimes,,,Most of the time,,Often,,,,Sometimes,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",I don't typically share data,,Git,Rarely,250000,ILS,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites",College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,80,10,0,10,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,Bayesian Techniques,"Java,Oracle Data Mining/ Oracle R Enterprise,Python,SQL,Tableau",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,80,10,10,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Privacy issues,Scaling data science solution up to full database",Sometimes,,,,,,,,Rarely,,,,,,,,Rarely,Rarely,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,75000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,,Very useful,,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Other",University courses,0,5,25,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics",Often,,Sometimes,,,Most of the time,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,Sometimes,Often,Sometimes,Sometimes,Often,Sometimes,,,,Sometimes,Often,Often,,,,,40,10,10,10,20,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,Often,,Often,Sometimes,,,Often,,,,,Often,Often,,,,,,Often,,100% of projects,More internal than external,Business Department,,Lack of data integrity efforts over which I have no control. Politics of convincing people to collect the right data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Box,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,65000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,39,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,Nice to have,,,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Professional degree,,1 to 2 years,Engineer,University courses,30,0,20,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,NA,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,Python,University/Non-profit research group websites,Company internal community,,,,Very useful,,,,,,,,,,,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Partially Derivative Podcast",5-10 years,,,,,,,,,,,,,,,,,,Yes,Professional degree,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Machine Translation,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or 
university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,Self-taught,15,5,70,10,0,0,"Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Don't know,<1MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,MATLAB/Octave,R",,,,Often,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Time Series Analysis",,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,20,50,10,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,Most of the time,Sometimes,,,Sometimes,,,,,,,,,,,,,,None,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,300000,RUB,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Russia,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Blogs,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,Very useful,,Somewhat useful,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Work,0,0,100,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,100 to 499 employees,Decreased significantly,More than 10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Image data,Always,100MB,Neural Networks,"C/C++,Java",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,Neural Networks,RNNs,Segmentation,SVMs",,,,Rarely,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,Rarely,Sometimes,,Rarely,,,,,,20,20,50,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,,Often,Often,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,only internal datasets,preparation of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Most of the time,"23,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Engineer,University courses,60,0,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Neural Networks,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,MATLAB/Octave,Python,RapidMiner (free version)",Rarely,Often,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,,,,,,Sometimes,,,,,,,Sometimes,,,,,Often,Sometimes,Often,,,,Sometimes,,,Often,Often,,,,,80,30,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,Often,,,,,,Sometimes,,,10-25% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,51,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Blogs,Company internal community,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Tutoring/mentoring",,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat 
useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer",Self-taught,50,20,30,0,0,0,Reinforcement learning,,High school,Retail,"5,000 to 9,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1TB,,"IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,R,SQL",,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,Often,,,Often,,,,40,10,5,5,40,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Decision Trees,R,Google Search,"College/University,Textbook,YouTube Videos",,,Not Useful,,,,,,,,,,,,Somewhat useful,,,Very useful,Linear Digressions Podcast,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,,,,,,,,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,I haven't started working yet",University courses,30,50,0,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Spain,48,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Often,Most of the time,,,,,Sometimes,,Often,,Most of the time,,,,,Often,,Sometimes,,,Often,,Often,,Often,,,,40,15,15,20,10,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,Often,,,Often,,,,,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,80000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,Government website,"Arxiv,Company internal community,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Somewhat useful,,,Very useful,,,,,,,,Very useful,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Military/Security,10 to 19 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Other,GPU accelerated Workstation,"Video data,Other",Rarely,10GB,"Bayesian Techniques,CNNs,Regression/Logistic Regression","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Naive Bayes,Neural Networks,SVMs",,,,Sometimes,,Most of the time,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,,,,,30,30,0,0,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,Often,,100% of projects,More external than internal,Other,"None, though we sometimes make them.",,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,75000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,India,32,"Not employed, but looking for work",,,,,,,,SAS Base,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,1-2 years,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,60,0,10,20,0,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,36,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Text Mining,Python,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's 
degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Brazil,28,Employed part-time,,,No,Yes,Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other","GPU accelerated Workstation,Workstation + Cloud service",0 - 1 hour,Github Portfolio,No,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important +Male,United States,19,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,R,Cluster Analysis,Java,Google 
Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,I don't write code to analyze data,I haven't started working yet,University courses,10,10,0,80,0,0,,,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,23,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Outlier detection (e.g. Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Personal Projects,Textbook",,Somewhat useful,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,Somewhat useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Machine Learning Engineer,Programmer",Self-taught,50,0,50,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems",Sometimes,,,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,25,15,15,15,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,Often,Sometimes,,100% of projects,Approximately half internal and half external,Standalone Team,"Geospatial 
data, financial data",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Git,Rarely,300000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,0,10,30,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",Image data,,<1MB,"Ensemble Methods,Random Forests","MATLAB/Octave,Python,R,SAS Base,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Often,,,Most of the time,Most of the time,,Often,,,,,,,,,,,,Most of the time,,Often,,,,,Often,,,,,,40,40,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,Often,,Most of the time,,,,Sometimes,,,100% of projects,Entirely 
internal,Standalone Team,notMNIST; TCIA; ADNI,.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Operations Research Practitioner",University courses,30,0,40,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,20 to 99 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,,,Most of the time,Sometimes,,,Most of the time,,,,Sometimes,,,,10,10,20,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science 
projects",,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Subversion",Rarely,162000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Cloudera,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Personal Projects,Textbook",,,Very useful,,,Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Statistician,Other",Self-taught,10,0,0,90,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Most of the time,,,,Sometimes,,,,,Most of the time,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series 
Analysis",Sometimes,,Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,70,5,1,14,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,,,,Sometimes,,,,,,,,Often,,Often,,,,76-99% of projects,Entirely internal,Other,,Privacy,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Bitbucket,Rarely,140000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Other,Self-taught,90,10,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,20,0,0,10,Recommendation Engines,Logistic Regression,A bachelor's degree,Financial,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,"Decision Trees,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Often,,,Sometimes,,,,,,,,,,"Decision Trees,Recommender Systems,Text Analytics",,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,45,15,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,Often,,,,,Most of the time,,,,,Often,,,76-99% of projects,Entirely internal,Central Insights Team,,Lack of standardization,Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,375000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,More than 10 years,"Data Scientist,Other",University courses,50,20,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Pharmaceutical,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,"Decision Trees,Neural Networks,Random Forests","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,RapidMiner (free version),SQL,Tableau,Other,Other",,Often,,,Often,,,,Often,,,,,Often,,,Rarely,,,,,,,,,,,,,,Rarely,,Often,,Rarely,,,,,,,Most of the time,,,Most of the time,,,,Often,Often,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Time Series Analysis",,Sometimes,,,,Sometimes,Often,Often,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Often,,,,,,,Sometimes,,,,60,10,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Often,,,Sometimes,Sometimes,Often,,,Often,,,,Often,,Often,Sometimes,Sometimes,,51-75% of projects,More internal than external,Central Insights Team,"Optum, Marketscan, Symphony, FlatIron, SEER",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,160000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Social Network Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,Very useful,Very useful,,Not Useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's 
degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook",,,,,,,Somewhat useful,,,,,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,6 to 10 years,Researcher,Self-taught,90,0,0,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Academic,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Traditional Workstation,Other,Most of the time,1GB,Other,"C/C++,MATLAB/Octave,Perl,Python,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,10,0,20,10,0,60,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,Less than 10% of projects,Do not know,Standalone Team,Potential energy surfaces,Reliability,Other,Email,,Other,Sometimes,"35,000",CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Republic of China,22,"Not employed, but looking for work",,,,,,,,R,Regression,R,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,0,30,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,Canada,52,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important +Male,Brazil,34,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",Microsoft Azure Machine Learning,Time Series Analysis,R,,"Blogs,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",,Very useful,,,,,,Very useful,Very useful,Very useful,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning 
(Tabular Data),Decision Trees - Gradient Boosted Machines,A doctoral degree,Non-profit,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,Decision Trees,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Often,Often,Sometimes,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,Sometimes,Most of the time,,,,30,10,10,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,,,Often,Often,Often,Often,,Often,,Often,Often,,,,Often,Often,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,,Rarely,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Tableau,Deep learning,SQL,University/Non-profit research group websites,"Blogs,College/University,Conferences,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,Other,University courses,10,20,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Academic,I don't know,Increased significantly,Don't know,Some other way,Somewhat important,Other,Basic laptop (Macbook),Other,,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation",Most of the time,Sometimes,Sometimes,,,Often,Often,Sometimes,Sometimes,,,,,Often,,Most of the time,,Sometimes,,,Often,,Sometimes,,,,Sometimes,,,,,,,30,10,0,5,55,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",,,,,,Sometimes,,,,,,,,,,,Rarely,,Often,,,,10-25% of projects,More internal than external,Other,Data from PSLC Datashop,Feature generation,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,Most of the time,"30,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Machine Learning Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,30,30,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,34,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,1 
to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,Natural Language Processing,"Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,10GB,"CNNs,Neural Networks,RNNs,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",Rarely,,,Often,,Rarely,,,,,,,,,,,,,Most of the time,Sometimes,Rarely,,,,,,,Sometimes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Textbook,Tutoring/mentoring",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Very Important +Male,Republic of China,26,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,Python,Monte Carlo Methods,SQL,University/Non-profit research group websites,"Arxiv,Conferences,Kaggle,YouTube Videos",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,30,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found 
a job listing there,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Image data,Video data,Text data",Sometimes,10GB,"CNNs,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Machine Learning,C/C++,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",Sometimes,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,SVMs",,,,Most of the time,,Often,Sometimes,,,,Sometimes,,,,,Often,,,Often,Most of the time,,,,,Often,,,Sometimes,,,,,,50,40,0,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,10-25% of projects,Entirely external,IT Department,ImageNet;PASCAL VOC;MicroSoft COCO,data pre-processing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,"30,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,20,20,10,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,Laptop or Workstation and local IT supported servers,Other,Sometimes,1GB,"Regression/Logistic Regression,Other","C/C++,Jupyter notebooks,MATLAB/Octave,Orange,Python,R,RapidMiner (free version)",,,,Most of the time,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,Rarely,,Often,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,PCA and Dimensionality Reduction,Time Series Analysis,Other",,,,,,Rarely,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,Most of the time,Often,,,5,15,30,40,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,Most of the time,,,,Often,Sometimes,,100% of projects,More internal than external,IT Department,,Labelling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Git,Never,62000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,< 1 year,Nice to have,Unnecessary,,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,India,23,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Adversarial Learning,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Rarely,,Most of the time,,,,,,,,,,Most of the time,,,,Often,Often,,,,,,,,,,,,,45,25,15,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,,,,,,Often,,,,,Most of the time,,Less than 10% of projects,Entirely external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,,"600,000",INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,University courses,30,0,20,30,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,Computer Science,Less than a year,I haven't started working yet,Self-taught,40,30,10,10,10,0,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,22,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. 
boosting, bagging)",Python,Other,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,The Analytics Dispatch Newsletter",< 1 year,,,,,Necessary,Necessary,,,,,,,,Coursera,Other,2 - 10 hours,Master's degree,No,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,University courses,20,0,0,80,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,5,15,0,80,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Textbook",,,,,Not Useful,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist",Self-taught,40,20,40,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Python,R,Tableau",,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Simulation,Time Series Analysis",Rarely,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Most of the time,,Sometimes,,Most of the time,,,,,,,,,,,Often,,,Often,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,160000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of fields,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Sometimes,10MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation",,,,,,Sometimes,Most of the time,Most of the time,,,,,,,,Most of 
the time,,,,,,,,,,Often,,,,,,,,40,40,0,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others",,Often,,,Often,Most of the time,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,38,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Amazon Web services,Text Mining,Python,"Google Search,University/Non-profit research group websites","Blogs,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Other",Self-taught,60,20,20,0,0,0,"Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, 
Azure, GCE ...)",Other,Most of the time,10GB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,,Most of the time,,Sometimes,,,,,Often,,Sometimes,,,,Sometimes,,,,,Sometimes,,Sometimes,Most of the time,,Most of the time,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Other",,,,,,Sometimes,,,,,Sometimes,Often,,Sometimes,,,,,,,,Often,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,220000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,39,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,45,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",0,30,30,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Rarely,10TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic 
Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Often,,Often,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,Often,Most of the time,,Most of the time,Most of the time,Often,Often,,Often,,,,,,Most of the time,,,,,60,10,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,Most of the time,Most of the time,Often,,Often,,,,Often,Sometimes,Often,Most of the time,,Often,,Most of the time,Often,Often,,51-75% of projects,Approximately half internal and half external,Central Insights Team,"weather, AC Nelson surveys, etc","Metadata are not available (data dictionary, data models, etc)","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,"Conferences,Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Statistician",University courses,25,15,50,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",,High school,,,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,,"Text data,Relational data",Most of the time,,,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,35,40,20,5,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,,,,,,,,,,,,,8,,,,,,,,,,,,,,,,,, +Male,Italy,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Miner,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",University courses,20,30,10,35,5,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,"Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Simulation,Time Series Analysis",,,,,,,,Often,,,,,,Most of the time,,Most of the time,,,Often,Most of the time,,,Often,,,,Sometimes,,,Sometimes,,,,35,20,5,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is 
small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Sometimes,Often,,Often,,,,Most of the time,Often,,,,Most of the time,,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,30000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Web services,Deep learning,Python,GitHub,"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",,Master's degree,Yes,Doctoral degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United 
States,51,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,,,Somewhat useful,,"Data Elixir Newsletter,Emergent/Future Newsletter (Algorithmia),No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,35,45,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,Sometimes,Most of the time,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,GANs,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,Text Analytics,Time Series Analysis",,Sometimes,,Sometimes,,Most of the time,Most of the 
time,,,,Sometimes,,,,,Sometimes,,,Often,Most of the time,,,,,Most of the time,,,,Often,Often,,,,30,25,25,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Often,,,,,,,Often,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,135000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Very useful,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,Not Useful,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",5-10 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,20,0,10,40,30,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,Turkey,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler",Self-taught,30,20,20,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","IBM SPSS Modeler,Jupyter notebooks,Microsoft Azure Machine Learning,Oracle Data Mining/ Oracle R Enterprise,Python,R",,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,Rarely,,,,,,Often,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",,Often,,,Often,Most of the time,,,Most of the time,,,Most of the time,,Sometimes,,,,Sometimes,,,Sometimes,,Often,Often,,Often,,,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",Often,Often,,,,Often,,,,Most of the time,Sometimes,Often,Often,,,,,,,,,,10-25% of projects,More internal than external,Business Department,Census data; industry specific data,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,"30,000",,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Business Analyst,,Employed by non-profit or NGO,R,Proprietary Algorithms,C/C++/C#,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,"Data Analyst,Researcher",University courses,30,30,20,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks",,Academic,500 to 999 employees,Stayed the same,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,,"CNNs,Random Forests",NoSQL,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation",,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,40,0,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,ACS,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,,,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,19,Employed part-time,,,Yes,,Statistician,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,30,2,5,3,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,28,"Not employed, but looking for work",,,,,,,,QlikView,Support Vector Machines (SVM),R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,0,0,20,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Very useful,,Very useful,,Very useful,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",40,20,10,0,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,Most of the time,Sometimes,,,,Most of the time,Sometimes,,,,,20,50,10,10,10,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform,Email",,,Sometimes,650000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,15,0,35,45,0,5,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Most of the time,10TB,"Ensemble Methods,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Most of the time,,,,,Sometimes,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Rarely,,Sometimes,,,,,,,,Rarely,Rarely,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,Rarely,Often,,Sometimes,,,,,,Often,,Often,,,,,Most of the time,,Often,,,,,Sometimes,Rarely,,,,,30,35,5,5,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Often,Most of the time,,,,,,Often,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,SNAP dataset ; Tiny Images Dataset,"Since no real data exists in the TB scale which my research deals with, I mainly create Synthetic datasets for my research problems.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,24000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,Other",,,,,,,Very useful,,,,,,,,Very useful,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,GPU accelerated Workstation,11 - 39 hours,Other,Yes,Bachelor's degree,Other,1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,Malaysia,24,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics",Python,Deep 
learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Not Useful,,Not Useful,Very useful,Somewhat useful,,,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,50,10,30,10,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,Other,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,10MB,SVMs,"Amazon Web services,Jupyter notebooks,Python,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,"Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Often,Sometimes,,,,60,5,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,100% of projects,More external than internal,Standalone Team,Population;weather,Definition of variable; ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Email",,Git,Rarely,60000,MYR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,Less than a year,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A doctoral degree,Insurance,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Natural Language Processing,Time Series Analysis",Sometimes,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Often,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,10-25% of projects,Entirely internal,IT Department,,getting permission to accessing data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Share Drive/SharePoint,,,Rarely,"128,000",,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,Personal Projects",,Very useful,,,,,,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,,Github Portfolio,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Mexico,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Other",,,,,,,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Researcher",Self-taught,60,40,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Don't know,10MB,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,50,0,0,50,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Privacy issues",,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,100% of projects,Do not know,IT Department,Don't know,Slow to load data sets,Other,Commercial Data Platform,,Other,Always,"12,000",MXN,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,< 1 year,,,,,,,,,,,,,,Other,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,51,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects",,,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,More than 10 years,"Engineer,Other",Kaggle competitions,40,0,0,0,60,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Never,100MB,"CNNs,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,KNIME (free version),Microsoft Excel Data Mining,Python,RapidMiner (free version),SQL",Sometimes,Often,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,Rarely,,,,,,,Often,,,,,,,,,,"CNNs,Data Visualization,Logistic Regression,Neural Networks",,,,Sometimes,,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,70,15,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,Sometimes,Sometimes,Sometimes,,,,Often,,,Often,Often,,Most of the time,,,10-25% of projects,More external than internal,Central Insights Team,None,secure storage and retrieval,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Sometimes,125000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Mexico,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Neural Nets,R,University/Non-profit research group websites,"Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,Not Useful,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",0,50,0,45,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,Regression/Logistic Regression,"Amazon Web services,Python,R,SQL,Other,Other",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,Often,Often,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,,Rarely,,Often,,,Rarely,,,,,45,20,5,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Rarely,Often,Most of the time,,,,,,,,,,,,,,,Sometimes,Sometimes,,76-99% of projects,Entirely internal,Other,None,"Sometimes it is not clear how some data points are generated. Also, we use two different sources for data and sometimes it is unclear how and where we should visualize results coming from the different data sources. We may have found a solution by using Superset.",Column-oriented relational (e.g. 
KDB/MariaDB),Other,Superset; Slack,"Git,Other",Sometimes,371800,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Conferences,Online courses",,,,,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Engineer,Researcher",University courses,20,20,10,50,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Rarely,,"Ensemble Methods,Neural Networks,Other","Java,Python,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,40,40,10,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,Often,,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Subversion,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,,Other (please specify; separate by semi-colon),A master's degree,Financial,10 to 19 employees,Stayed the same,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,1 to 2 years,"Business Analyst,Data Analyst",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,55,15,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Sometimes,,Often,,,,Sometimes,,Most of the time,,Often,,,,,Sometimes,,Often,,,,,,Often,,,,,40,20,15,5,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Often,,,,Sometimes,,,,Often,,,,Most of the time,,Less than 10% of projects,More internal than external,Central Insights Team,,"Data quality, generally not good enough for use beyond 
simpler/linear models.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Sometimes,600000,NOK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,48,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"Company internal community,Friends network,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,Very useful,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst,Researcher,Other",Self-taught,60,15,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,"N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,100MB,"Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (free version),SAS JMP,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,Often,,,,Rarely,,,,Often,,Often,,Rarely,,,,,Rarely,Rarely,Often,,,Often,,,Sometimes,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,SVMs,Text Analytics,Time Series 
Analysis",Sometimes,,,,,,Often,Sometimes,,,,,,Sometimes,,Often,,,Sometimes,,,,,,,,,Sometimes,Sometimes,Often,,,,30,10,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,,,Often,,,Often,,,Often,,,,,,Often,Most of the time,,51-75% of projects,Approximately half internal and half external,Other,Various government publicly available datasets; industry proprietary datasets; other research data,Access; data governance and stewardship; ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Git,Other",Sometimes,90000,CAD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,C/C++,Social Network Analysis,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,,,,,,,,Necessary,,,,,,,Basic laptop (Macbook),,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,36,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Miner,Data Scientist,Operations Research Practitioner,Other",Kaggle competitions,60,0,20,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important +Male,United States,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Researcher",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician,Other",University courses,40,30,10,10,10,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Other","Image data,Video data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,TensorFlow",,,,,,,,,,,,,,,Often,,Often,,,,,,,Often,,,,,,,Often,,Often,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision 
Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Often,Sometimes,Sometimes,,,,Sometimes,Sometimes,Often,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,,,,20,50,10,10,10,0,Enough to tune the parameters properly,"Explaining data science to others,Limitations of tools",,,,,,Often,,,,,,,Often,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,kaggle; ncbi; other;,appropriate configuration for ML,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Switzerland,37,Employed full-time,,,No,Yes,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Master's degree,,I don't write code to analyze data,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",0,30,0,0,70,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Female,Argentina,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Stack Overflow Q&A",Very useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,50,30,5,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,10 to 19 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,10MB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,NoSQL,Python",,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,SVMs",,,,,,Often,,,,,,,,,,,,,Most of the time,,Sometimes,,,Often,,,,Often,,,,,,30,40,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,Sometimes,,,,,,,,,,,,,Often,,,,,Often,,26-50% of projects,Do not know,Business Department,,Find the right features to train a good ranker,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",I don't typically share data,,Git,Rarely,546000,ARS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,France,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Time Series Analysis,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Other,30,40,0,0,0,30,Unsupervised Learning,Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"Conferences,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician",University courses,25,25,0,50,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning 
service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Sometimes,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Time Series Analysis",Sometimes,,,,,,Most of the time,Often,,,,Often,,,Often,Often,,Rarely,,,,,,,,,,,,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools",Often,,,,Most of the time,,,,Sometimes,Often,,,Most of the time,,,,,,,,,,100% of projects,More internal than external,Business Department,Census,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"180,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Personal Projects,Textbook,Other",,,,,,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,"DBA/Database Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,,,A bachelor's degree,Internet-based,10 to 19 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,100MB,Bayesian Techniques,"Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,Often,Often,,,,50,20,20,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Most of the time,Often,,Often,Most of the time,,Often,Sometimes,,,,,,Most of the time,,,,Often,,,,10-25% of projects,Entirely internal,IT Department,,Deduplication in a meaningful way. ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,45000,GBP,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,India,40,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Survival Analysis,Python,University/Non-profit research group websites,Friends network,,,,,,Somewhat useful,,,,,,,,,,,,,"FastML Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,,Kaggle competitions,25,0,25,25,25,0,"Computer Vision,Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Other",Rarely,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,Rarely,Sometimes,,Most of the time,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics",,,Often,Sometimes,,,Most of the time,Rarely,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,25,25,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,30,0,40,10,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Other,"5,000 to 9,999 employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,,"Decision Trees,Neural Networks,Random Forests","C/C++,Java,Python",,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Decision Trees,Neural Networks,SVMs,Text Analytics",Often,,,Often,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,Often,Most of the time,,,,,30,30,0,10,30,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,University courses,30,10,30,30,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Retail,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data",Most of the time,1GB,"CNNs,Regression/Logistic Regression,SVMs","Java,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,GANs,Logistic Regression",,,,Often,,Often,,,,,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",,,,,Often,Sometimes,,,,,,,,Often,Often,,,Often,,,,,10-25% of projects,More external than internal,Central Insights Team,"Imagenet, Cifar 10",Dirty data and uncontrollable explosion in the amount of data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,175000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,Blogs,,Very useful,,,,,,,,,,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Recommendation Engines,Logistic Regression,A bachelor's degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Never,<1MB,SVMs,"Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,Less than 10% of projects,Entirely internal,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,21,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Online courses,YouTube Videos",,,Very useful,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Business Analyst,University courses,10,5,10,70,5,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important +Male,Other,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Cluster Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses",,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,"Data Analyst,Data Scientist,Engineer,Programmer",Self-taught,0,30,0,40,30,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,500 to 999 employees,Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Other","Text data,Relational data",Sometimes,1GB,Other,"Hadoop/Hive/Pig,Mathematica,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Text Analytics",Often,,,,,Often,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,30,20,10,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,Often,,,,,,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,40000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Canada,40,"Not employed, but looking for work",,,,,,,,Python,,Python,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,,,,,,,,,,,,,,,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher",Work,30,10,30,30,0,0,Time Series,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Python,GitHub,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,FastML Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,20,10,10,50,10,0,Natural Language Processing,Ensemble Methods,A bachelor's degree,Internet-based,500 to 999 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of the time,1GB,"Ensemble Methods,Random Forests","Amazon Machine Learning,Amazon Web services,Java,NoSQL,Python,SQL,TensorFlow",Often,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,Rarely,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,40,20,20,20,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,100000,USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Mexico,41,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Operations Research Practitioner,Researcher",Self-taught,70,20,0,10,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Text data,,10MB,Regression/Logistic Regression,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,,,10,10,0,20,0,60,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most 
of the time,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,Most of the time,,10-25% of projects,Do not know,Business Department,Don't know,Data structure,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"100,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,FastML Blog,3-5 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,15,10,15,45,15,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,56,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Machine Learning Engineer,Researcher",Work,0,10,60,30,0,0,Speech Recognition,"Bayesian Techniques,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Government,I don't know,Increased slightly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Work,30,30,40,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",,,Bayesian Techniques,"Google Cloud Compute,Jupyter 
notebooks,Mathematica,Python,SQL,Tableau,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,Rarely,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees",,,Sometimes,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,40,10,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Computer Scientist,Fine,Self-employed,DataRobot,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Talking Machines Podcast",< 1 year,Necessary,Necessary,Necessary,,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Professional degree,,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Programmer",Self-taught,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Regression,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,,< 1 year,,,Necessary,,Necessary,Nice to have,,,Nice to have,,,,,Other,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,19,"Not employed, but looking for work",,,,,,,,DataRobot,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Siraj Raval YouTube Channel,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Other,No,I did not complete any formal education past high school,,3 to 5 years,Programmer,University courses,75,25,0,0,0,0,Unsupervised Learning,Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Finland,44,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,Udacity,Other","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube 
Videos",,Very useful,,,,,Very useful,Somewhat useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,,< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important +Male,Greece,24,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,NoSQL,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,DBA/Database Engineer,Researcher",Work,0,80,20,0,0,0,Speech Recognition,"Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for 
storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Java,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SQL",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,Often,Sometimes,Most of the time,,Sometimes,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation,Text Analytics",Rarely,Most of the time,,,,,Most of the time,Often,,,,,,Sometimes,,Sometimes,,,,Often,,,,,,Most of the time,,,Often,,,,,30,15,5,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT",Most of the time,Often,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,76-99% of projects,Entirely external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,25000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Personal Projects,Stack Overflow Q&A",,,Very useful,,Somewhat useful,,,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,Electrical Engineering,More than 10 years,Other,University courses,0,0,0,100,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Sometimes,1TB,"Bayesian Techniques,Ensemble Methods","C/C++,NoSQL,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,HMMs,Logistic Regression,Naive Bayes,Neural Networks,SVMs,Time Series Analysis",,Sometimes,Most of the time,,Sometimes,Most of the time,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,30,30,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Scaling data science solution up to full database",,,,,Sometimes,,,,,,,Often,Often,,,,,Often,,,,,100% of projects,Approximately half internal and half external,Other,Sequence datasets,Understanding the patterns,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"50,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft R Server (Formerly Revolution Analytics),Deep learning,R,,"Blogs,Company internal community,Conferences,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,More than 10 years,"Data Scientist,Predictive Modeler,Statistician",Work,20,10,40,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Insurance,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,Rarely,,,,,,Often,,,,,Most of the time,Sometimes,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,,Rarely,Sometimes,,,,30,10,10,30,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,Most of the time,,,Often,,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Always,120000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,38,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Reinforcement learning,"Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,24,Employed full-time,,,Yes,,Data 
Miner,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",10,50,20,20,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Impala,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,QlikView,SQL,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,Most of the time,,,,Often,,,,Most of the time,Often,,,,,,,,,,Most of the time,,,,,Sometimes,Often,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,,Often,,,,Often,,,,,,,,,Often,,,,,,,Often,,,,,,Most of the time,Most of the time,,,,50,10,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Sometimes,,,Often,,,,,Often,Often,,,,,,,Sometimes,,,,26-50% of projects,Entirely internal,Central Insights Team,financial transaction statements,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,800000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Academic,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",,1GB,"CNNs,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,SVMs,Other","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Other",,,,Often,,Sometimes,Most of the time,Rarely,,Sometimes,,,,Most of the time,,,,Rarely,,Most of the time,Often,,Rarely,,,,,Sometimes,,,Most of the time,,,10,20,20,15,35,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Often,,,,,,,Sometimes,,,,,,,,,Most of the time,,76-99% of projects,More external than internal,Standalone Team,mnist; notmnist; stl-10; UCSD Ped 1 & 2; UMN anomaly detection; Avenue anomaly detection,?,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"18,000",BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Programmer,Statistician",University courses,40,5,30,10,15,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,Most of the time,Sometimes,,,,,,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the 
time,Most of the time,Sometimes,Most of the time,,,,30,15,15,15,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,100% of projects,More internal than external,Other,FRED; Moody's; ,Lack of source documentation ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,110000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,Data Scientist,University courses,0,20,40,40,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,Spark / MLlib,SQL,Stan",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Often,,Often,,,Often,Most of the time,,Often,,,Often,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,15,15,15,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,Often,,Often,,,,Sometimes,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,33,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Association 
Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Company internal community,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,,,,,,Very useful,,,Very useful,,,Somewhat useful,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,Less than a year,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",25,70,5,0,0,0,,,High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important +Male,Mexico,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,"Google Search,Government website",Other,,,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,5,90,0,0,5,0,Survival Analysis,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Female,United States,52,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,QlikView,Neural Nets,Python,Other,"Non-Kaggle online communities,Online courses,Personal Projects,Other",,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,I don't write code to analyze data,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,70,10,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important +Male,Other,37,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects",,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,30,20,30,0,0,20,Time Series,Other (please specify; separate by semi-colon),A master's degree,Government,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Never,100MB,Neural Networks,"Cloudera,Python,SQL,Other",,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,Often,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,40,0,0,10,50,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Other",,,,,,,,,Often,,,,,,,Often,,,,,,Most of the time,100% of projects,More internal than external,IT Department,"Open data, basemaps","Time, knowledge","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,42500,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,0,0,20,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",Julia,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,"DataTau News Aggregator,FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,30,20,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Microsoft Azure Machine Learning,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics",,Sometimes,,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Often,Often,,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Unavailability of/difficult access to data",,Often,,,Often,,,,,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,-,-,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,1380000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,30,20,20,10,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Image data,Most of the time,1GB,"CNNs,Neural Networks,Random Forests,RNNs,SVMs","Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Sometimes,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,,,Most of the time,,Most of the time,Sometimes,,,,,Rarely,,Rarely,,Most of the time,,Sometimes,Often,Most of the time,Sometimes,,Sometimes,,Most of the time,Often,,Sometimes,Often,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,Most of the time,,,Most of the time,,,,,,Rarely,Most of the time,,,,Often,Most of the time,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Rarely,75600,PLN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Matlab,University/Non-profit research group websites,"Blogs,Kaggle,Online courses",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United Kingdom,NA,"Not employed, but looking for work",,,,,,,,R,I don't plan on learning a new ML/DS method,SAS,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Not Useful,,,,,3-5 years,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 
years,Other,Work,90,0,10,0,0,0,,Logistic Regression,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Not important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,48,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,,,,,O'Reilly Data Newsletter,1-2 years,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,I don't write code to analyze data,"Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer",Self-taught,70,10,0,20,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Mexico,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that 
doesn't perform advanced analytics,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,30,10,30,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,SAS Base,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation",,,,,,Most of the time,Often,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Organization is small and cannot afford a data science team",,Sometimes,,Sometimes,Often,,,,,,,,Often,,,Often,,,,,,,76-99% of projects,More internal than external,Central Insights Team,Bureau ,Dirty data and documentation missing,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"44,000",MXN,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,Tableau,Cluster Analysis,Python,Google Search,"College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,3 to 5 years,"Business Analyst,Engineer",Self-taught,20,20,20,10,0,30,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10MB,"Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,Most of the time,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,,,,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go 
in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Often,,,,,,,,Often,Often,Often,,51-75% of projects,Entirely external,Other,"City of Portland data, Oregon State data","Differing formats, poor descriptions of what data refers to, context description as to how data produced, year to year changes in what is collected.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,"Github, Slack","Git,Other",Sometimes,1000,USD,I am not currently employed,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Other,University/Non-profit research group websites,"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,A master's degree,Other,100 to 499 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,,"Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,100,0,0,0,0,0,Enough to tune the parameters properly,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,None,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,120000,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,5,50,35,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Video data,Relational data",Most of the time,10GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,R,SAS JMP,SQL,Tableau,TensorFlow",,Most of the time,,Sometimes,,,,,,,,,,,,,Rarely,,,,,Rarely,,,Rarely,,,,,,Most of the time,,Sometimes,,,,,,,Often,,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Rarely,Sometimes,,Most of the time,Often,Often,,,,,,Sometimes,,Sometimes,,,,Often,,,Often,,Often,,,Most of the 
time,Sometimes,Often,,,,30,30,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,Most of the time,,,,Often,,,,Most of the time,Sometimes,,,26-50% of projects,More internal than external,IT Department,Kaggle competition; CMS/medical; ,Obtaining ground truth data for training and validation/testing.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,85000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,23,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,The Analytics Dispatch Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 
hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,Researcher,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,Canada,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Other,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,I never declared a major,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,5,0,95,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,Often,,,,,Sometimes,,Often,,,,,Sometimes,,,Sometimes,,,,,Often,Sometimes,,,,50,15,0,25,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the 
available data,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,,,,,,,,,,Sometimes,Often,Sometimes,Sometimes,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Never,"65,000",CAD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,27,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by government,Java,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,Self-taught,50,20,0,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Mexico,30,Employed full-time,,,Yes,,Scientist/Researcher,,,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Not Useful,Not Useful,Not Useful,,Very useful,Very useful,Not Useful,Very useful,Somewhat useful,,,Very useful,"Data Elixir Newsletter,FastML Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Researcher",Self-taught,35,30,30,5,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,1-2 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for 
storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow,Unix shell / awk",,Often,,,,,,Often,,,,,,,,,Sometimes,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,HMMs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics",,Sometimes,,Most of the time,,Most of the time,Most of the time,,Often,,,,Rarely,,,,,,Most of the time,Most of the time,Often,,,,,,Sometimes,Sometimes,Most of the time,,,,,40,40,5,10,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Often,,,Often,,,,,,,,Most of the time,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,40000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,25,5,0,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Amazon Web services,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1MB,Regression/Logistic Regression,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Rarely,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,25,10,0,15,50,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,Sometimes,,,,,Often,Often,,,,,,,Most of the time,Rarely,,,Often,,,76-99% of projects,More internal than external,Other,,Accessing our data on third party platforms. ,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Sometimes,72500,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,SQL,Google Search,"Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,Very useful,Very useful,,,,Very useful,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Data Miner,Data Scientist,Software Developer/Software Engineer",University courses,20,0,70,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Sometimes,,,,Often,Often,,Often,Most of the time,,,,,Most of the time,,Most of the time,,,,,Often,,Most of the time,Most of the time,,Most of the time,,,,,,,,20,10,50,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,,,,,,,,,Often,,,,,,Most of the time,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,experian;datalogic,formatting;data integrity,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,185000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,25,20,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,10MB,"Decision Trees,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,R,"GitHub,Google Search,University/Non-profit research group websites","Non-Kaggle online communities,YouTube Videos",,,,,,,,,Very useful,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,,,Primary/elementary school,Pharmaceutical,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,Regression/Logistic Regression,"SAS Base,SQL,TIBCO 
Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,Sometimes,Sometimes,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,25,30,5,40,0,0,Enough to refine and innovate on the algorithm,Scaling data science solution up to full database,,,,,,,,,,,,,,,,,,Sometimes,,,,,26-50% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,140000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Engineer",University courses,9,10,10,70,1,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop 
(Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Tableau,Unix shell / awk",,,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Often,,Rarely,,,,,,,,Rarely,,,,Often,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Sometimes,,,,,Sometimes,Sometimes,Rarely,Rarely,,,Sometimes,,Rarely,Sometimes,Sometimes,,,,,Rarely,,Sometimes,Rarely,,Rarely,,,,,,,,30,5,5,30,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Rarely,Sometimes,,Sometimes,Sometimes,,Rarely,,,,,,Often,,,Sometimes,,Rarely,Often,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Other,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),3-5 years,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity",GPU accelerated Workstation,11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Very Important +Male,United States,16,Employed full-time,,,Yes,,Programmer,Poorly,Employed by 
a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",Self-taught,50,30,0,0,20,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Rarely,10GB,"Decision Trees,Neural Networks,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Other,,R,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Company internal community,Online courses,Stack Overflow Q&A",,,,Very useful,,,,,,,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Work,20,10,60,10,0,0,"Survival Analysis,Time Series",Logistic Regression,High school,Mix of fields,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Rarely,10MB,Regression/Logistic Regression,"Amazon Web services,IBM SPSS Statistics,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,Often,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Rarely,,,Rarely,Often,,,,65,10,5,5,15,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Rarely,,,,,,,,,,,,,,Rarely,Often,,26-50% of projects,More internal than external,Other,Police crime and outcomes; Court and CPS; Probation services,Obtaining,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,27000,GBP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by college or university",TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Data Analyst,Engineer,Researcher",University courses,5,15,0,80,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,10MB,,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,40,5,5,30,20,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,78000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Nigeria,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Self-employed,Jupyter notebooks,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,"Computer Scientist,Data Analyst,Engineer,Programmer,Researcher",Self-taught,85,15,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,Less than one year,Some other way,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft Azure Machine Learning,R",,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,,70,5,15,5,5,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Most of the time,Most of the time,Often,,,Often,,,,,,,Most of the time,,,Most of the time,,,,76-99% of projects,More external than internal,Standalone Team,National survey data,"format of data dirty data ","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,0,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Somewhat useful,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer",University courses,20,30,40,0,10,0,"Natural Language Processing,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Financial,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,100MB,"Bayesian Techniques,Other","Python,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,Sometimes,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,15,35,0,15,35,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,Most of the time,,,,Most of the time,,,,Often,,100% of projects,Entirely internal,Other,N/A,To clean data and find relevant tools for text mining in portuguese language,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,"53,756",BRL,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,"Business Analyst,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,"Linear Digressions Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Other,University courses,50,0,0,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",,Sometimes,,,Most of the time,,,Most of the time,Sometimes,,,Most of the time,,Sometimes,,Often,,,,,Sometimes,Most of the time,Most of the time,Most of the time,,Often,,,,Sometimes,,,,30,30,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the 
potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,Often,,,,,,,,Most of the time,,,,,,Sometimes,,,Less than 10% of projects,More external than internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,90000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,50,0,20,10,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased significantly,1-2 years,Some other way,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Relational data,Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,Python,SQL,Tableau,TensorFlow",,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation",Sometimes,,,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Most of the time,,,Sometimes,Often,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,,,,,,,30,30,30,10,50,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,Sometimes,Sometimes,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Other,R,GitHub,"Blogs,College/University,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",1,90,2,7,0,0,,,A bachelor's degree,Non-profit,I don't know,Stayed the same,Don't know,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,<1MB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,Often,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,40,10,0,25,25,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,26-50% 
of projects,Entirely internal,Business Department,,,,,,,,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,10,0,0,90,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Very useful,,Very useful,,Very useful,Very useful,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important +Male,Brazil,22,"Not employed, but looking for work",,,,,,,,NoSQL,Genetic & Evolutionary Algorithms,R,Google Search,YouTube 
Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,University courses,40,0,0,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Other (please specify; separate by semi-colon)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,Self-taught,60,10,20,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Programmer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,0,100,0,0,0,0,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed 
full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst",Work,20,15,55,0,10,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",No education,Technology,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Relational data",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Python,R,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Often,Often,Most of the time,Most of the time,,,Most of the time,,,,Often,,,,,Often,,Most of the time,,,Often,,,Sometimes,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,10-25% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,740000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"College/University,Online courses,Textbook,Tutoring/mentoring",,,Somewhat useful,,,,,,,,Very useful,,,,Not Useful,,Very useful,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),,Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important +Female,Pakistan,29,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Text Mining,Matlab,Other,"Conferences,Official documentation,YouTube Videos",,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,Very useful,,< 1 year,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Other,2 - 10 hours,Other,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,University courses,20,10,40,30,0,0,Natural Language Processing,"Bayesian 
Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Female,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Academic,10 to 19 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Other,,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Python,R",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,30,40,0,20,10,0,"Enough to code it again from scratch, albeit it may 
run slowly","Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,,,,,Often,Often,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),,60000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Canada,50,Retired,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,,"Blogs,Company internal community,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,,,Somewhat useful,,,,Very useful,,Not Useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Work,10,30,60,0,0,0,,"Decision Trees - Gradient 
Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A master's degree,Other,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Always,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL",,,,,,,,,Rarely,,,,,,,,Often,,,,,,,Sometimes,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Rarely,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Most of the time,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,,Rarely,,Rarely,,Often,,,Sometimes,,,,,,,,60,15,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,Often,Most of the time,,,,,,Often,,,Most of the time,,,Sometimes,,,Often,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,35000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,Computer Science,More than 10 years,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Jack's Import AI Newsletter,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,50,20,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Rarely,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / 
MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,Rarely,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Sometimes,,,Rarely,Often,,,,,,"Bayesian Techniques,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,Often,,,,,,Often,,Often,,Often,,Sometimes,Most of the time,Often,Sometimes,,,,,,,,Often,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,Often,,,,,,Sometimes,,,,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,130000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",5,5,60,0,2,28,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Other,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression,Other","Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,Rarely,,Often,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,Sometimes,Most of the time,Sometimes,,,,Sometimes,,Often,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,,,,Sometimes,Sometimes,,,,35,20,10,20,10,5,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Most of the time,,,,Rarely,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,120000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),,Python,GitHub,"Arxiv,Blogs,College/University,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Operations Research Practitioner,Other",University courses,15,5,20,60,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,"Neural Networks,SVMs,Other","Microsoft Excel Data Mining,QlikView",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Neural Networks",,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,55,20,10,10,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal 
than external,Central Insights Team,nope,data handling,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,45000,INR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Argentina,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Machine Learning Engineer,Researcher",University courses,10,10,40,20,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,Data Analyst,Self-taught,50,50,0,0,0,0,,,Primary/elementary school,Technology,100 to 499 employees,Increased significantly,Less than one year,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,,,"C/C++,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SAS Enterprise Miner,Tableau",,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,Often,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,0,70,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,41,Employed 
full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,I never declared a major,More than 10 years,"Operations Research Practitioner,Software Developer/Software Engineer",Self-taught,30,20,20,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher",Work,40,30,10,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,SQL,Deep learning,R,,"College/University,Company internal community,Friends network,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Data Analyst,Operations Research Practitioner,Researcher,Other",University courses,20,0,10,70,0,0,,,High school,Technology,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear 
question to be answering or a clear direction to go in with the available data",,,Sometimes,,,Sometimes,,Sometimes,,,Often,,Sometimes,Often,,,,,,Most of the time,,,26-50% of projects,Do not know,Standalone Team,,,,,,,,56000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","DataRobot,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text 
Analytics",,,,,,Often,,Often,Often,,,,,Often,,Often,,Often,Often,,Often,,Often,,,,,,Most of the time,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Often,Often,,,Often,,Often,Most of the time,,Most of the time,,,,,,,Most of the time,Sometimes,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,102,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,Not Useful,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Retail,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Impala,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),Tableau",Rarely,,,,,,,,,,,,,Often,,,Often,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,,,,Rarely,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Often,Often,,Often,,,Often,,,,Often,,Often,Often,Often,Often,Often,Often,,,Often,,Often,Often,Often,,,,20,40,0,10,10,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,,Often,,,Often,,,,,Often,,,Often,,Often,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Bitbucket,,130000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Business Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Rarely,10MB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,University courses,0,30,20,40,10,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"5,000 to 9,999 employees",Stayed the same,3-5 years,I visited the company's Web site and 
found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Python,QlikView,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,Sometimes,,,,,Often,Most of the time,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,Segmentation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,25,25,25,25,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,,Sometimes,,,,,,,,,Often,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,Deep learning,Python,Google Search,"Blogs,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,,,Somewhat useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,,,,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",0 - 1 hour,Online Courses and Certifications,No,,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Work,30,15,30,20,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector 
Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,"Data Stories Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Other",Work,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,NoSQL,Python,R,SAS JMP,SQL,Tableau,Unix shell / awk",,Often,,,,,,,Sometimes,,Often,Often,,,,,,,,,,,Often,,,,Sometimes,,,,Often,,Most of the time,,,,,,,Most of the time,,Often,,,Often,,,Sometimes,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,Often,,,,,Often,Often,,,,,,Sometimes,,Often,,,,Often,Often,Often,Often,,,,,,Sometimes,Often,,,,20,20,10,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,,,,,,,,Often,,Often,,,10-25% of projects,Approximately half internal and half external,Business Department,www.data.gov,various internal data sources ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,"Arxiv,College/University,Online courses",Very useful,,Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,University courses,40,0,0,60,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Most of the time,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Python",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,RNNs,SVMs",,,,Most of the time,,,,Most of the time,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,Sometimes,,Most of the time,,,Most of the time,,,,,,50,0,50,0,0,0,Enough to run the code / standard library,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,,,,,,,,,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,31,"Not employed, but looking for 
work",,,,,,,,Python,Deep learning,R,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,20,0,0,50,0,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important +Male,Iran,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Survival Analysis,Time Series",,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,24,"Not 
employed, but looking for work",,,,,,,,R,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,0,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,South Africa,27,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,R,Google Search,"Blogs,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,15,35,0,50,0,0,"Supervised Machine 
Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Africa,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Textbook,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Other,University courses,50,10,10,20,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Other,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service","Image data,Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Flume,Hadoop/Hive/Pig,Python,R,SAS Base,SQL,Tableau",,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,Often,,,Most of the time,,,,,,,"Logistic Regression,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,Sometimes,,,,,,,Often,,,,,,,100% of projects,Approximately half internal and half external,Business Department,Social media data,Joins with different data source types,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,30000,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Management information systems,More than 10 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,5,0,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,Spark / MLlib,Deep learning,Java,GitHub,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,"Computer Scientist,Data Miner,Engineer",University courses,20,50,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Sometimes,1EB,"Bayesian Techniques,Decision Trees,RNNs,SVMs","Flume,Hadoop/Hive/Pig,R,Spark / MLlib",,,,,,,Rarely,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests,SVMs",,Most of the time,,,,,,Often,,,,,,Rarely,,,,,,Sometimes,,,Most of the time,,,,,Sometimes,,,,,,20,30,40,0,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,I prefer not to say,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Rarely,Often,,,,,Sometimes,,,,,,,Rarely,,,,,Sometimes,,,,10-25% of projects,,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",,,,,,AED,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,Less than a year,Other,Self-taught,100,0,0,0,0,0,Unsupervised Learning,"Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Academic,,,,I visited the company's Web site and found a job listing there,,Other,Laptop or Workstation and private datacenters,Other,,10GB,"Decision Trees,Regression/Logistic Regression,SVMs,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,SVMs",,,,,,,,,,,,,,Often,,Sometimes,,,,,Often,,,,,,,Rarely,,,,,,15,15,40,30,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,"45,000",USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,90,0,0,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,50,0,10,40,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,Perl,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,,Often,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,20,20,20,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,None,Entirely internal,Standalone Team,no,data selection/cleaning,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Subversion,Never,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",C/C++,Neural Nets,R,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook",,,,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Computer Scientist,Programmer,Researcher",University courses,5,10,5,80,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Most of the time,10GB,,"Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,12,3,80,0,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,Often,,,,,Most of the time,,,,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,Google ngrams; CMU pronouncing dictionary,Lack of others to discuss idea with.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Other",Dropbox,Generic cloud file sharing software (Dropbox/Box/etc.),Always,5000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer,Other",Self-taught,46,8,46,0,0,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Technology,,,,,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service",,,,Neural Networks,"C/C++,Jupyter notebooks,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Neural Networks,Simulation",,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,4,84,4,4,4,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Engineer,,Employed by a company that doesn't perform advanced analytics,,,,,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer",University courses,0,50,25,25,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,Other,"Laptop + Cloud service (AWS, 
Azure, GCE ...)",,,,,"Amazon Web services,Python,Spark / MLlib,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,50,0,40,10,0,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,100% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Not important,Not important,Not important,Somewhat important,Very Important +Male,United States,51,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Data Analyst,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs",A professional degree,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,1GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression","C/C++,Oracle Data Mining/ Oracle R Enterprise,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Kenya,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Researcher,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,70,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,37,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,47,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Statistician,University courses,30,10,10,30,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Telecommunications,"10,000 or more employees",Decreased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,Regression/Logistic Regression,"Perl,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Random Forests,Segmentation",,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,Sometimes,,,Most of the time,,,,,,,,30,30,5,30,5,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,,,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Software Developer/Software Engineer,Other",University courses,0,10,30,60,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,20,20,10,20,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,"10,000 or more employees",,,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,PCA 
and Dimensionality Reduction,Simulation,SVMs",Often,,,,,,Often,,,,,,,Often,,,,,,,Sometimes,,,,,,Most of the time,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Text Mining,Python,"GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Not Useful,Very useful,,,,,Partially Derivative Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,"Data Analyst,Researcher,Other",Work,30,5,60,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,Fewer than 10 employees,Decreased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Relational data,Other",Always,1GB,"Neural Networks,Regression/Logistic Regression","Amazon 
Web services,Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,Sometimes,,Most of the time,,,,Sometimes,Often,,,,,,,Sometimes,,Most of the time,,,,20,25,20,30,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Often,Often,,,,Often,,Most of the time,Sometimes,,,,Most of the time,,Often,Often,Often,Sometimes,,100% of projects,More internal than external,Other,Physionet; pubmed,Manual tagging of signal quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,120000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,Hadoop/Hive/Pig,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,1 to 2 years,"Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Kaggle competitions,40,20,0,0,40,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Internet-based,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Hadoop/Hive/Pig,Python,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Often,,,,,Sometimes,,Often,,Often,,,Often,,,Often,,,,,10,30,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,,,Most of the time,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,Canada,27,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,15,25,47,3,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100MB,SVMs,"Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Bayesian Techniques,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",Sometimes,,Sometimes,,,,Most of the time,,,,,,,Sometimes,,,,Often,Often,,Often,,,,,,,Often,Most of the time,,,,,50,15,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say",,,,,Often,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Approximately half 
internal and half external,Central Insights Team,,unclear classification,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"30,000",CAD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,40,25,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Other,Most of the time,10MB,Other,"Amazon Web services,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,,Rarely,,,,,Rarely,,Rarely,,,,,Rarely,,,,,,Often,,,Often,,,,25,5,25,15,15,15,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,Sometimes,,,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed full-time,,,Yes,,Data Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Computer 
Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,5,30,15,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,52,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Excel Data Mining,Text Mining,Python,Government website,"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,Not Useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Data Skeptic Podcast",5-10 years,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX",Workstation + Cloud service,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Physics,,"Business Analyst,Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,16-20,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that 
performs advanced analytics",,Rule Induction,Python,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Official documentation,Personal Projects,YouTube Videos",Very useful,,,,,,Very useful,,,Very useful,,Very useful,,,,,,Very useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,25,25,25,0,0,25,"Adversarial Learning,Computer Vision,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",Primary/elementary school,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Never,100GB,"Bayesian Techniques,Decision Trees,Neural Networks","C/C++,Java,Mathematica,Python,Other",,,,Most of the time,,,,,,,,,,,Often,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,"Bayesian Techniques,CNNs,GANs,Neural Networks,Random Forests,RNNs,Simulation,Other",,,Often,Often,,,,,,,Often,,,,,,,,,Often,,,Often,,Often,,Most of the time,,,,Often,,,25,25,0,25,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Other",Often,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,100% of projects,More internal than external,Standalone Team,"recipes, zooniverse, wikipedia, news streams, and government data.",Not having an intern or team to process the data.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,150000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,1,4,5,90,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Financial,20 to 99 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,Most of the time,100GB,"Gradient Boosted Machines,Regression/Logistic Regression","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems",Most of the time,,,,,Most of the time,,Most of the time,Often,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,Often,,,,,,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability 
of/difficult access to data",Most of the time,,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,,,A professional degree,Other,Fewer than 10 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,<1MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,20,25,0,20,35,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,Often,,,Most of the time,,,,,Sometimes,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,data may not be maintained in a readily accessible form,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SAP BusinessObjects Predictive Analytics,Time Series Analysis,SQL,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,,,,,Necessary,Necessary,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,Other,Self-taught,50,50,0,0,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks",High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,Very Important,,, +Male,Poland,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,Very useful,Very useful,,Somewhat useful,,Very useful,,Very useful,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Miner,Data Scientist,Other",University courses,25,5,45,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Telecommunications,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Always,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,QlikView,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,Rarely,Rarely,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Rarely,Sometimes,Sometimes,,,Most of the time,Often,Often,Most of the time,,,Most of the time,,Often,Often,Most of the time,,,Sometimes,Sometimes,Often,,Most of the time,,,Most of the time,,,Often,Rarely,,,,50,5,10,5,10,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to 
be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,Rarely,,Most of the time,Often,,Often,,,Most of the time,,,,,,Sometimes,Rarely,,51-75% of projects,More internal than external,Other,weather data,"In my current long project we are creating a set of classification models. Various models are created for different customers based on the same database, yet populated with different values for each customer due to discrepancies in the processes and integration with customers' own systems. Major challenges include: 1) lack of clear definition of target variables 2) lack of understanding from business SMEs and decision-makers alike regarding why target variables need to be defined in some way 3) non-standard ways of working resulting in very low credibility of data (people use the source systems as they wish, not as per standard procedures, since little control is exterted) 4) semi-free text fields (most informative fields comprise parsed information without any separators in various languages; partially coming from dictionaries or other systems, partially typed by users, without clear distinction) 5) a lot of information commonly shared by operational professionals seems to not be included in the systems","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Portugal,39,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,I don't plan on learning a new tool/technology,Bayesian Methods,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,Not Useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,Very useful,,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Researcher,University courses,20,35,10,30,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R,Other",,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Often,,,,,,,Most of the time,,,,Often,Sometimes,Sometimes,Sometimes,,,,20,25,10,25,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Entirely external,Standalone Team,,,,"Email,I don't typically share data",,Git,Sometimes,18000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United 
States,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Amazon Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Kaggle",Very useful,,,,,,Very useful,,,,,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Predictive Modeler,Statistician",Work,25,5,35,35,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",,Telecommunications,10 to 19 employees,Stayed the same,1-2 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,100GB,"Bayesian Techniques,Gradient Boosted Machines,Markov Logic Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",Often,Sometimes,,,,Most of the time,Most of the time,,,,,,,,,Sometimes,Sometimes,Often,,,Sometimes,,,,,Often,,Often,Often,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects",Often,Sometimes,,,,,,,,,,,,Often,,,,,,,,,76-99% 
of projects,Entirely internal,Business Department,Geospatial data sets,,Document-oriented (e.g. MongoDB/Elasticsearch),Other,Github,Git,Always,"120,000",CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",10,30,30,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,51,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",40+,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other",University courses,10,0,0,75,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,United States,28,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,,100,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,Employed full-time,,,Yes,,Software Developer/Software 
Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,1 to 2 years,Engineer,Self-taught,80,20,0,0,0,0,Computer Vision,Neural Networks - CNNs,High school,Other,I don't know,Stayed the same,Less than one year,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Image data,,10GB,Neural Networks,"C/C++,R,SQL",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,22,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer,Other",University courses,15,30,30,15,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient 
Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,Rarely,,,,,,,,,,,,,Sometimes,,,Rarely,Sometimes,Often,Sometimes,Sometimes,Rarely,,,,,,Most of the time,,Most of the time,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Often,,Often,Often,Often,,,,Often,,Often,,Often,,,,Most of the time,Sometimes,,Often,,Rarely,Often,,Often,,Often,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Sometimes,,Sometimes,,,,Often,,,,,Most of the time,,,100% of projects,Approximately half internal and half external,IT Department,kaggle; UC Irvine Machine Learning Repository; cortana intelligence gallery; numerai,prepare it to acceptable for analysis form,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,11400,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,33,"Not employed, but looking for work",,,,,,,,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,edX",GPU accelerated Workstation,2 - 10 hours,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,90,0,0,10,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,Other,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Programmer,Software Developer/Software Engineer",University 
courses,25,20,30,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Often,Sometimes,,Sometimes,,,Often,,Most of the time,,,,,,,,,,,20,20,25,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,Often,Most of the time,,Sometimes,,,,,,,,Often,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,postal codes;phone number based information;email address based information,Making data available in online mode for serving live data to ML models. Basically going from offline/batch processing to live computation.,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,140000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Portugal,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,,Nice to have,Nice to have,,,Necessary,,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Other,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,7,3,10,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Decision Trees,R,Other,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,"Data Analyst,Operations Research Practitioner,Researcher",Self-taught,75,15,0,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",,Other,100 to 499 employees,Increased significantly,1-2 
years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Often,,,,,,,"Decision Trees,Logistic Regression,Random Forests,Simulation",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,,,,,40,25,5,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,,Most of the time,,,,,,Most of the time,Often,,,,Often,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,,Always,140000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Company internal community,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,1 to 2 years,"Data Scientist,Programmer,Researcher",University courses,20,25,15,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Rarely,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Rarely,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Sometimes,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Often,,,,,83,5,2,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,63,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Neural Nets,Python,"Google Search,Government website","College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Bachelor's degree,Management information systems,,Other,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Not important,Not important +Male,Poland,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Time Series Analysis,R,"Google Search,University/Non-profit research group 
websites","Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,20,5,30,0,5,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Argentina,28,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,Github Portfolio,No,Doctoral degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,50,40,10,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Very useful,,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,PhD,No,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,0,0,0,70,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,Julia,Deep learning,Python,Google Search,"College/University,Kaggle,Official documentation,Online courses,Textbook",,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,80,0,0,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Sometimes,Often,,,,Often,,,,,Often,Sometimes,Rarely,Most of the time,,,Rarely,Rarely,,Rarely,,,,,,,,Most of the time,,Most of the time,,,,Rarely,Sometimes,Sometimes,,Sometimes,Most of the time,,,Sometimes,Sometimes,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Sometimes,,Most of the time,Sometimes,Often,Often,,,Most of the time,,,,Often,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Most of the time,,Sometimes,,Often,Often,Often,Often,,,,50,20,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,Often,Often,,,Often,,,,,,,,,,51-75% of projects,More internal than external,Standalone 
Team,Canadian Census data; economics data; stock data,Latency of the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),,80000,CAD,I am not currently employed,7,,,,,,,,,,,,,,,,,, +A different identity,Other,25,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by college or university,Microsoft Azure Machine Learning,Social Network Analysis,Stata,I collect my own data (e.g. web-scraping),Friends network,,,,,,Very useful,,,,,,,,,,,,,Emergent/Future Newsletter (Algorithmia),,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,A health science,3 to 5 years,Statistician,University courses,0,0,0,100,0,0,Computer Vision,Markov Logic Networks,No education,Internet-based,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,100TB,Regression/Logistic Regression,Google Cloud Compute,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,0,100,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Data Science results not used by business decision makers,,Rarely,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,IT Department,Health,Concatenation ,Key-value store (e.g. Redis/Riak),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,"25,000,000",ETB,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Stan,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,Very useful,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,"No Free Hunch Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,R,SQL,Tableau,Other,Other,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,Most of the time,Most of the time,Rarely,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Other,Other",,,,,,Most of the time,Most of the time,Often,,,,Sometimes,,,Often,Most of the time,,,,,,,Often,,,Sometimes,,,,,Often,Sometimes,,20,40,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,Often,,,,,,,,,Often,,,,,,Often,,,76-99% of projects,Do not know,Central Insights Team,US government data,Continuing changes to product database schemas,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Rarely,110000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,39,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation",,,Very useful,,Very useful,,Very useful,,Very useful,Very useful,,,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Engineer,Software Developer/Software Engineer",University courses,30,0,0,50,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,,,,,Very Important,,,,,,,,Very Important,, +Male,India,37,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,30,0,30,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,Most of the time,,,,Most of the time,,,,,,,,Often,,,,,,Sometimes,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,Sometimes,,Most of the time,Most of the time,Sometimes,Often,,,Most of the time,,Sometimes,,Often,,,,Often,Sometimes,,Sometimes,,,Sometimes,,,,,,,,30,30,10,20,10,0,Enough to 
explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,,,,,Often,,Often,,,,Often,,,,,Often,Sometimes,,51-75% of projects,Approximately half internal and half external,IT Department,None,Access to secured data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git,Subversion",Never,2200000,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Company internal community,Online courses",,Somewhat useful,Somewhat useful,Very useful,,,,,,,Somewhat useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Pharmaceutical,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,100MB,"Regression/Logistic Regression,Other","IBM SPSS Statistics,Microsoft Excel Data Mining,R,SQL,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,Most of the time,,,"kNN and Other Clustering,Logistic Regression",,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,40,10,0,20,30,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,Often,,,,,,,,,,Most of the time,,,,,Most of the time,,100% of projects,Entirely internal,Standalone Team,We always analyze external databases owned by our customers,Gathering enought samples to have significative insights,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,,Social Network Analysis,R,,"Blogs,Official documentation,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,Very useful,,Very useful,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Other,Self-taught,80,10,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Orange,R,RapidMiner (free version),Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,Often,,Rarely,,,,,,,,,,Rarely,,,,,,,"Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,,,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,,Often,,Often,,,Often,Sometimes,,,,,,,40,20,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Often,,,,,,,,Most of the time,,,,26-50% of projects,Entirely internal,Other,none,variety of formats and layouts,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,80000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,1 to 2 years,Business Analyst,Self-taught,60,20,0,20,0,0,Time Series,"Evolutionary Approaches,Logistic Regression",A master's degree,Retail,"10,000 or more employees",Increased slightly,Don't know,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,100MB,"Evolutionary Approaches,Regression/Logistic Regression,Other","MATLAB/Octave,NoSQL,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Often,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,72,2,6,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,,,,I do not want to share information about my salary/compensation,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,I don't plan on learning a new ML/DS method,Python,Google Search,"College/University,Kaggle,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,,Very useful,,,Very useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,PhD,No,Doctoral degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Iran,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,"Employed by college or university,Self-employed",Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,30,NA,0,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Technology,,,,,Very important,,Basic laptop (Macbook),Text data,Never,100MB,"CNNs,GANs,Neural Networks,Regression/Logistic Regression,RNNs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Data Visualization,GANs,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",,,,Often,,,Most of the time,,,,Often,,,,,Sometimes,,,,Often,,,,,Often,,,,,Rarely,,,,10,15,50,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,,,,,,Often,,,51-75% of projects,Do not know,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,,Never,0,IRR,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,30,10,0,30,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Bayesian Methods,Python,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Scientist,Engineer",Work,25,20,40,10,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,Tableau",,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,Rarely,Most of the time,,,,,,,,Sometimes,,,,Often,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,Sometimes,Often,Often,Often,Most of the time,,,Sometimes,,,Often,Often,,Sometimes,Sometimes,Sometimes,,Rarely,Often,Most of the time,,Often,,Often,Sometimes,Sometimes,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Often,Often,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,Noisy data,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,2600000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Belgium,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Personal Projects",,,Very useful,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,Unnecessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Traditional Workstation,0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,Self-taught,1,0,0,0,0,99,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Financial,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos,Other,Other",Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Talking Machines Podcast,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,"DataCamp,edX,Other",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,45,0,0,35,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A health science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",University courses,0,10,20,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural 
Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,25,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Anomaly Detection,Python,"GitHub,Google Search","College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,Not Useful,,Somewhat useful,Very useful,,,Somewhat useful,"Data Machina Newsletter,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,Less than a year,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Researcher",University courses,20,0,0,80,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,India,22,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,5,0,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University 
courses,30,10,30,0,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Online courses,YouTube Videos",Very useful,,Somewhat useful,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,Researcher,Work,80,10,10,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,100 to 499 employees,Increased slightly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Other",,10GB,"CNNs,Neural Networks,Regression/Logistic Regression","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,Often,Often,,,,,,,,,Most of the time,,,,10,50,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,Often,,,Rarely,,,,,,,,,,,100% of projects,Approximately half internal and half 
external,Standalone Team,,Lack of literature on the subject,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,,EUR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Other,Self-taught,80,10,10,0,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",,Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Traditional Workstation","Image data,Text data,Relational data",Rarely,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,R,RapidMiner (free version),Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,Rarely,,,,,Most of the time,,Rarely,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,Python,Text Mining,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,,University courses,0,30,0,70,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,1 to 2 years,"Researcher,Other",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed part-time,,,No,Yes,Computer Scientist,Fine,Employed by college or university,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,DataRobot,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,0,10,0,20,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Brazil,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,10,40,10,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular 
Data)","Bayesian Techniques,Support Vector Machines (SVMs)",A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer",University courses,50,0,20,10,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,100MB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks",,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,54,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,C/C++/C#,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Personal Projects",Very useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,Very useful,,,,,,,Data Machina Newsletter,5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,Nice to have,,,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,Sort of (Explain more),Doctoral degree,Other,More than 10 years,"Data Miner,DBA/Database Engineer,Researcher,Statistician",Work,20,0,50,30,0,0,"Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,,,,Somewhat important,,,,Very Important,Somewhat important,Somewhat important,,,,,,Somewhat important +Male,Finland,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,Software Developer/Software Engineer,University courses,20,40,20,20,0,0,Computer Vision,"Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Rarely,100MB,"HMMs,Random Forests","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"HMMs,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,Rarely,,,Rarely,,Rarely,,,,,,50,30,15,5,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,100000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Official documentation",Very useful,Very useful,,,,,Very useful,,,Somewhat useful,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,Self-taught,20,10,40,10,20,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,I don't know,Increased slightly,Don't know,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data",Never,100GB,"CNNs,GANs,Gradient Boosted Machines,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs",,,,Most of the time,,Most of the time,Most of the time,Sometimes,Sometimes,,Sometimes,Sometimes,,,,,,,,Most of the time,Sometimes,,Sometimes,,Sometimes,Often,,Sometimes,,,,,,50,30,0,10,NA,10,Enough to refine and innovate on the algorithm,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,,,,I don't typically share data,,Git,Always,25000,USD,Other,5,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Java,Genetic & Evolutionary 
Algorithms,Python,Google Search,"Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Survival Analysis,Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",Retail,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1TB,Regression/Logistic Regression,"C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,Rarely,,,,Often,Often,,,,,,,,Often,,,,Rarely,,Often,Rarely,Most of the time,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Often,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,24,14,14,24,24,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,Often,,,,,,Often,,,,,,Most of the time,,,,,Often,,,76-99% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,"66,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,"Not employed, but looking for work",,,,,,,,Python,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Podcasts",,,,,Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,Supervised Machine Learning (Tabular Data),Gradient Boosting,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Female,Russia,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information 
technology, networking, or system administration",I don't write code to analyze data,"Programmer,Researcher",University courses,50,20,0,0,30,0,Computer Vision,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,34,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Podcasts",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,30,60,0,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Canada,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 
years,Other,"Online courses (coursera, udemy, edx, etc.)",25,55,10,10,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Government,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Never,1GB,"Bayesian Techniques,CNNs,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Hadoop/Hive/Pig,Java,Julia,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Tableau,TensorFlow,Unix shell / awk",Rarely,,,,,,,,Often,,,,,,Often,Often,,,,,Often,,,,,,Sometimes,Rarely,,,Most of the time,,Often,,,,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Recommender Systems,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Other",University courses,20,80,0,0,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Most of the time,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,Rarely,,,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis",Most of the time,Most of the time,Most of the time,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,Often,Most of the time,Often,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Often,Often,,Most of the time,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,Inability to integrate findings into organization's decision-making 
process,,,,,,,,Often,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,N/a,Scale,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Most of the time,126000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,10,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random 
Forests,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,SAS Base,Spark / MLlib,SQL,Tableau",,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,,Often,,,,,,,Often,,,Sometimes,Often,,,Often,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,SVMs,Text Analytics",,,Sometimes,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,Sometimes,Rarely,,,,Sometimes,Often,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Sometimes,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,400000,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Company internal community,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Engineer",University courses,15,10,50,25,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Often,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Often,,,Often,Sometimes,,,,,Rarely,,Most of the time,,,Sometimes,,Rarely,,,,,,45,15,25,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about 
the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",Often,,,Often,,,,,Most of the time,Often,,,,Sometimes,Most of the time,,,Often,,,,,76-99% of projects,More internal than external,IT Department,Census data; election results; voter data,Data architecture and warehousing,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,Git,Sometimes,88000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,29,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,SQL,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,3 to 5 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,25,50,25,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Academic,20 to 99 employees,Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Regression/Logistic Regression,SVMs","Mathematica,MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,Sometimes,,Most of the time,,,,,Often,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Logistic Regression,PCA and Dimensionality 
Reduction,Simulation,Time Series Analysis",Often,,,,,Often,,,,,,,,,,Most of the time,,,,,Often,,,,,,Often,,,Most of the time,,,,25,25,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,Often,,,,,,,,Often,,,,,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,35000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,United States,56,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Google Search,"Arxiv,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,15,0,80,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video 
data,Text data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Perl,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,Rarely,,,,,,,,,,,Rarely,,Often,,,,Sometimes,,,,,,,,,Rarely,Often,,Rarely,,,,,,,,,,,,,Rarely,,Rarely,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Most of the time,Most of the time,,Rarely,Often,,Sometimes,Sometimes,Often,,Often,,Sometimes,Often,,Most of the time,,,Sometimes,Often,Often,,Sometimes,,,,50,20,5,10,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Privacy issues",Often,,,,Most of the time,,,,,,,,,,Often,,Sometimes,,,,,,100% of projects,More external than internal,Standalone Team,Any publicly available dataset,"Keeping track of ownership and licensing, redistribution rights","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Company internal community,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Fine arts or performing arts,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,20,15,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,,Often,,,Often,,Sometimes,Sometimes,Often,,Rarely,Sometimes,Rarely,Rarely,,Often,,Rarely,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,,,,45,10,10,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,Often,,,Sometimes,,,Sometimes,,,51-75% of projects,More internal than external,Central Insights Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,90000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,47,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Other,Deep learning,Python,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,"No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician,Other",Work,35,15,25,10,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Insurance,500 to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Most of the time,Sometimes,Sometimes,Often,,,Often,,Sometimes,,Often,,,Often,,Sometimes,,Most of the time,,,Sometimes,,,Sometimes,,,,,80,5,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,Often,,Often,Most of the time,,,,,,,Often,,,Most of the time,,,,,,Often,,Less than 10% of projects,Entirely internal,Other,,dirty,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,I don't typically share data",,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,R,Tableau",,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,Sometimes,,,,Most of the time,Sometimes,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Often,,,Often,,,,,,,,70,20,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,,Often,,,,,,Often,,,,,Often,,Often,Often,Often,,,100% of projects,Entirely internal,Central Insights Team,none,"not clean, repetitive",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,700000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,France,57,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Textbook,YouTube Videos",,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Miner,Statistician",Work,30,0,70,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,NoSQL,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Statistica (Quest/Dell-formerly Statsoft),Tableau",,Sometimes,,,,,,Often,Often,,Most of the time,Rarely,,,,,Often,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,Rarely,,,Often,Sometimes,Sometimes,Often,Often,,Rarely,Rarely,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Sometimes,,Often,Most of the time,Often,Most of the time,,,,Often,,Often,Often,Often,,Often,,,Most of the time,,Often,Often,,Often,,Sometimes,Often,Sometimes,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,Sometimes,Often,,,Often,,,,,Often,,,Sometimes,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed part-time,,,Yes,,Statistician,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,50,0,0,30,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",,Academic,20 to 99 employees,Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,C/C++,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),40+,Github Portfolio,Yes,Master's degree,,3 to 5 years,Software Developer/Software Engineer,Kaggle competitions,10,40,0,0,50,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; 
separate by semi-colon)","Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,France,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",30,20,40,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs",Primary/elementary school,Technology,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Image data,Video data,Text data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Random Forests","Amazon Web services,C/C++,Java,Jupyter notebooks,Mathematica,Perl,Python,Spark / MLlib",,Rarely,,Often,,,,,,,,,,,Sometimes,,Sometimes,,,Rarely,,,,,,,,,,Sometimes,Often,,,,,,,,,,Rarely,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes,Random Forests,Simulation,Time Series Analysis",,,Often,,,,Often,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Often,,,Often,,,,20,30,20,20,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,Sometimes,,,,,,,,,,,,Often,,,,Often,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Link Analysis,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Conferences,Official documentation",Very useful,,,,Somewhat useful,,,,,Very useful,,,,,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",Self-taught,60,2,24,0,14,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,10 to 19 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Markov Logic Networks,Other","Google Cloud Compute,Jupyter notebooks,Mathematica,NoSQL,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Natural Language Processing,PCA and Dimensionality 
Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Sometimes,,,Most of the time,,,Most of the time,,,Most of the time,,,,Often,Most of the time,,Sometimes,,Most of the time,,Often,,,Often,Most of the time,,,Sometimes,,,,30,40,20,0,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Sometimes,,,,,,,,,,Often,,,,10-25% of projects,More internal than external,Central Insights Team,"Government Public Records dataset, Weather data; Openstreetmaps data","Preparing the training data, which is decided once the data distribution is cleaned.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Subversion,Most of the time,600000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",50,15,30,0,5,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Video data,Relational data",Sometimes,1GB,"Gradient Boosted Machines,Random Forests","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient 
Boosted Machines,Neural Networks,Random Forests,SVMs",,,,Sometimes,,Most of the time,Most of the time,Sometimes,,,,Most of the time,,,,,,,,Sometimes,,,Most of the time,,,,,Sometimes,,,,,,40,20,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,,Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,edX,Udacity",Other,2 - 10 hours,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,Self-taught,75,25,0,0,0,0,"Computer Vision,Reinforcement learning,Unsupervised Learning",,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,United States,NA,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Very useful,,,,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Physics,,"Business Analyst,Data Analyst,Programmer,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Canada,36,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Online courses,Textbook",Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer,Other",University courses,25,0,0,75,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,Often,,,,,,,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,"Bayesian Techniques,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Often,,,,,Often,,,,,,,,,,Often,,,Sometimes,,Often,,,,,Often,,,,,,20,40,20,0,0,20,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,Most of the time,Often,,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,150000,CAD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed part-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,R,Factor Analysis,C/C++/C#,GitHub,"College/University,Conferences,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Biology,I don't write code to analyze data,Researcher,Work,10,0,20,50,20,0,Survival Analysis,Logistic Regression,A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Switzerland,44,Employed full-time,,,Yes,,Programmer,,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,Python,Google Search,"Arxiv,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Machine Learning 
Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,"Machine Translation,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,,1GB,"Decision Trees,Neural Networks,RNNs,SVMs","Amazon Web services,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Rarely,,Sometimes,,,,"A/B Testing,CNNs,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text Analytics",Often,,,,,,,,,,,,,,,,,,Often,Often,Sometimes,,Sometimes,,Often,Often,,,Often,,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations of tools",,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,Aws s3,Git,Sometimes,150000,CHF,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,"Friends network,Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,,,,Very useful,Very useful,,< 1 year,Necessary,,Necessary,,Necessary,,,,,,,,,,,0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,100,0,0,0,0,0,,,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Pakistan,33,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Random Forests,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Conferences,Newsletters,Online courses,Textbook,YouTube Videos",,,Very useful,,Very useful,,,Very useful,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Emergent/Future Newsletter (Algorithmia),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Researcher,University courses,0,0,0,100,0,0,"Computer Vision,Reinforcement learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Image data,Most of the time,10MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,SVMs","MATLAB/Octave,Microsoft Excel Data Mining",,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian 
Techniques,Cross-Validation,Ensemble Methods,Naive Bayes,Neural Networks,Random Forests,Segmentation,Simulation,SVMs",,,Most of the time,,,Often,,,,,,,,,,,,Most of the time,,Often,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,80,100,100,100,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",,,Most of the time,,,,,,,Most of the time,Often,Often,Most of the time,,Most of the time,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"60,000",PKR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Kaggle competitions,55,35,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,"1,000 to 4,999 employees",Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,Most of the time,,,,,,,,,,Often,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Prescriptive Modeling,Time Series Analysis",,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,40,18,10,15,17,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,Often,,Often,,,,,Often,Often,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,E Natis; banking data; bareou data,"Inaccurate data, missing data","Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Most of the time,0,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Python,R,SQL,Stan,TensorFlow",,Most of the time,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,Rarely,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",Sometimes,,,,,Most of the time,Most of the time,Often,,,,Most of the time,,,,Often,,,Sometimes,Rarely,,,Sometimes,,,,,,Rarely,,,,,25,25,30,10,10,0,Enough to explain the algorithm to someone 
non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,Often,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,76-99% of projects,More internal than external,Other,credit scores,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,300000,BRL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,20,0,0,0,80,0,Computer Vision,,A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important 
+Female,Brazil,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,University/Non-profit research group websites,"College/University,Conferences",,,Very useful,,Somewhat useful,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Machine Learning Engineer,Researcher",University courses,0,0,50,50,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A doctoral degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","C/C++,Java,Perl,Python,R,Other",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,Often,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,,Often,Often,Sometimes,,,,Often,,,,,,Sometimes,Often,,Often,,,,,Often,,,,,,30,50,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Unavailability of/difficult access to data",Often,,,,,,,,Often,,,,,,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Other,"UCI, Keel, benchmarks",Collecting data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Most of the time,35000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,26,"Not employed, but looking for work",,,,,,,,Python,Support Vector Machines (SVM),Python,"I collect my own data (e.g. 
web-scraping),Other","Arxiv,College/University,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,Very useful,Very useful,Very useful,,,,,Very useful,Very useful,,Very useful,,,,Very useful,,1-2 years,Necessary,Necessary,Necessary,,Necessary,Necessary,,,,,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Yes,Bachelor's degree,Other,1 to 2 years,"Engineer,Machine Learning Engineer,Other",Other,0,0,0,100,0,0,"Computer Vision,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Very Important,Very Important,,Very Important,Very Important,,Very Important,,,Very Important,,Very Important,Very Important,,Very Important,Very Important +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Other,Other,50,0,0,0,0,50,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,90,0,10,0,0,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,Rarely,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Not Useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,40,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,100 to 499 employees,Increased slightly,6-10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Rarely,,,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Often,Sometimes,,,Sometimes,,Often,Sometimes,,,,,,,,Most of the time,,,,,Often,,Sometimes,Often,,Most of the time,,,,Most of the time,,,,10,30,10,40,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,Sometimes,,,Often,Sometimes,,,Most of the time,,Sometimes,,,,Often,,Often,Most of the time,,Often,Sometimes,,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Git,Subversion",Never,78000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Ukraine,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed part-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,17,"Not employed, but looking for work",,,,,,,,TensorFlow,Decision Trees,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Other,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Computer Vision,Reinforcement learning",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Not important,Not important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Male,United States,51,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,University courses,20,0,60,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Programmer,Fine,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Random Forests,Python,GitHub,"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Somewhat 
useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,QlikView,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Most of the time,,,,,,,Often,,,,Often,,Often,,Sometimes,,,,Rarely,,,,,,Often,,,,Most of the time,Rarely,,,,,,,,,Sometimes,Most of the time,,,Often,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",Most of the time,Often,,,,,Most of the time,Most of the time,,,,,,,,Often,Sometimes,Often,Rarely,,Sometimes,Most of the time,Often,,,,Rarely,,Most of the time,Most of the time,,,,45,25,0,5,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Unavailability of/difficult access to 
data,Other",Most of the time,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,Most of the time,Most of the time,10-25% of projects,More internal than external,Business Department,Price lists from multiple vendors ,Access,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,S3; Wordpress,"Bitbucket,Git,Other",Sometimes,180000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,10,80,10,0,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Engineer,Poorly,Employed by non-profit or NGO,TensorFlow,Deep learning,Python,Google Search,"College/University,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Engineer,Researcher",University courses,0,100,0,0,0,0,Computer Vision,Neural Networks - RNNs,,Government,100 to 499 employees,Stayed the same,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Never,,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,40,20,10,20,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Often,,,,Often,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Rarely,12000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,DataCamp,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Self-taught,70,0,0,30,0,0,,"Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Female,Switzerland,NA,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Deep learning,,"Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very 
useful,,,,Very useful,,Somewhat useful,Very useful,,,,,"Data Stories Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,65,0,0,15,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Often,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,50,0,5,10,35,0,,"Dirty data,Explaining data science to others",,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Other,Poorly,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Unnecessary,Necessary,,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Software Developer/Software Engineer,Other",Self-taught,0,100,0,0,0,0,Other (please specify; separate by semi-colon),Neural Networks - CNNs,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important +"Non-binary, genderqueer, or gender non-conforming",United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Predictive Modeler,Self-taught,30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,TensorFlow,Unix shell / awk",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,Often,Most of the time,,,,,,Rarely,,Rarely,,,,"A/B Testing,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Often,,,,,Most of the time,,Often,,,,,,Often,,Often,,,,,Most of the time,,,,,,,,,Often,,,,30,15,0,20,35,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Sometimes,,,,,,,,Most of the time,,,,,,Often,,,76-99% of projects,Entirely internal,Standalone Team,Google maps GIS data,"Usually one of: too small or too many missing values to answer the question being asked, or so large and messy that getting it into a workable format takes up much of the time.","Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,30000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Official documentation,Online courses,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,"Researcher,Other",Work,50,20,20,0,10,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",,1GB,"Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,,,,Often,Most of the time,Sometimes,Often,,Sometimes,,,Most of the 
time,,Often,,,Sometimes,Often,,,Often,,,,,,Sometimes,,,,,40,25,0,25,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Privacy issues",,,,,Often,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Never,105000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,Other,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Software Developer/Software Engineer,Other,25,25,25,25,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Other,,100MB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks,Segmentation",,,,Most of the time,,,Often,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,10,80,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Unavailability of/difficult access 
to data,Other",Often,,,,,,,,,,,,,,,Often,,,,,Most of the time,Most of the time,Less than 10% of projects,Entirely internal,Other,None,Lack of labeling.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,Company servers,Git,Always,"92,000",USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,SQL,Social Network Analysis,Python,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),1 to 2 years,Programmer,Work,40,20,40,0,0,0,,,A master's degree,,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,1GB,,"Amazon Web services,MATLAB/Octave,NoSQL,Python,SQL",,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Natural Language Processing,Simulation",Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,600,EUR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,60,0,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100TB,Other,"Java,Python,R,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,,,,Most of the time,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,10,0,30,60,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Female,Sweden,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,Other,University courses,0,20,30,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Logistic Regression,Segmentation,Simulation",Sometimes,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,Sometimes,Often,,,,,,,20,10,30,10,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,50000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,46,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",QlikView,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)",Kaggle,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Operations Research Practitioner",Kaggle competitions,50,0,0,0,50,0,"Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Most of the time,10GB,"Regression/Logistic Regression,Other","Microsoft Excel Data Mining,QlikView,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,Rarely,,,,,,,,,Rarely,,,,,,,,,Most of the time,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,20,35,5,10,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,Sometimes,,Often,,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,25000,MAD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Colombia,31,Employed full-time,,,Yes,,Engineer,,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Self-taught,NA,50,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,R",,,,,,,,,,Rarely,Sometimes,,Often,,,,Often,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,,,,,,Most of the time,Sometimes,,Often,,,,,,,Often,Most of the time,,,,70,10,10,10,0,0,Enough to run the code / standard library,"Dirty data,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,10-25% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,20000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,10GB,"Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most 
of the time,,100% of projects,More internal than external,Central Insights Team,Most demographic and socio -economic open source data,Data Sharing and Dirty/incomplete data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,42000,GBP,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Company internal community,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A master's degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,SQL",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision 
Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,Often,,,,Sometimes,,,,Often,,,,,,,Often,,,,,,,Often,,,,40,10,10,40,0,0,"Enough to code it again from scratch, albeit it may run slowly",Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,51-75% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,Data Scientist,University courses,10,20,10,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,"Data Analyst,Data Scientist,Statistician",Work,0,0,70,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Retail,"1,000 to 4,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,Cloudera,IBM SPSS Modeler,NoSQL,Python,R,SAS Base,SQL,Other",,Sometimes,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,Often,,,,Often,,,,,,,Sometimes,,,"Collaborative Filtering,Decision Trees,Logistic Regression,Naive Bayes,Text Analytics",,,,,Rarely,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,,,Sometimes,,,,,50,30,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,Computer Scientist,University courses,15,0,25,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Rarely,,,,Sometimes,,,,,Rarely,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,Rarely,Most of the time,,,,,,,,Sometimes,Often,,,,Often,,Most of the time,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,5,10,10,5,30,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert 
input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,Often,,Most of the time,,Most of the time,Most of the time,Most of the time,,26-50% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Social Network Analysis,Python,Google Search,"Blogs,Online courses,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,40,60,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's 
degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Mexico,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,33,33,34,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),A master's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Neural Networks,Random Forests","Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,Rarely,Sometimes,,Most of the time,Most of the time,Most of the time,,,,,,,Often,,,,Often,Often,Often,Often,,Sometimes,Most of the time,,,,,Often,Often,,,,30,20,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,Sometimes,,,,,,Often,,,,,,Often,,,10-25% of projects,Entirely internal,IT Department,,Cleaning and business rules interpretation,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Other",Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,Official documentation,,,,,,,,,,Somewhat useful,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Other,Self-taught,30,30,0,40,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning",,A master's degree,CRM/Marketing,"5,000 to 9,999 employees",,,,Very important,,,,,,,"NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Text Analytics",,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,None,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Brazil,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by company that makes advanced analytic software,Mathematica,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses",,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,University courses,10,80,0,10,0,0,Other (please specify; separate by semi-colon),Evolutionary Approaches,A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,United States,58,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,"Data Elixir Newsletter,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Other,20,5,30,0,5,40,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Statistician,Fine,Employed 
by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",,,,,,,,,,,,,,Very useful,Very useful,Very useful,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,10,20,60,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Often,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Sometimes,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,PCA and Dimensionality Reduction",,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,60,10,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Often,,,,,,Sometimes,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,115000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,Unnecessary,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,75,0,5,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,32,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,50,20,0,30,0,0,"Machine Translation,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Fine,"Employed by professional services/consulting 
firm,Employed by a company that doesn't perform advanced analytics",MATLAB/Octave,Time Series Analysis,C/C++/C#,I collect my own data (e.g. web-scraping),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,,Other,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,1 to 2 years,"Data Miner,Other",Self-taught,80,0,0,20,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Decision Trees - Gradient Boosted Machines,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,,,,,,,,,,,,,,, +Male,Colombia,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,Business Analyst,University courses,20,30,0,50,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Decreased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Relational data,Sometimes,,"Random Forests,SVMs","Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,"Data Visualization,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,Sometimes,,Often,,,,40,30,0,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Other,Weather ,Quality ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,Portugal,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,0,20,50,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,Most of the time,,Sometimes,,Most of the time,,,,,Most of the time,,,Often,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Decision Trees,Gradient Boosted Machines,Random Forests,Recommender Systems",Most of the time,,,,Most of the time,,,Often,,,,Often,,,,,,,,,,,Often,Most of the time,,,,,,,,,,50,20,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,,,,,,,,Often,,,,,Often,Most of the time,,,,,,,,Less than 10% of projects,Entirely 
internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Poland,26,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,Personal Projects,Textbook",Somewhat useful,Very useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer",Work,50,10,20,20,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,Fewer than 10 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Most of the time,100GB,"CNNs,Neural Networks","Amazon Machine Learning,C/C++,Google Cloud Compute,Java,Python,TensorFlow,Unix shell / awk",Rarely,,,Rarely,,,,Often,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the 
time,,,,"CNNs,Cross-Validation,Ensemble Methods,Neural Networks",,,,Most of the time,,Often,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,50,20,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",Often,Sometimes,,,,,,,Sometimes,Often,Sometimes,Sometimes,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Always,"140,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Canada,30,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,1-2 years,Necessary,Necessary,,Nice to have,Necessary,Necessary,Necessary,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,Business Analyst,University courses,0,0,0,100,0,0,Natural Language Processing,Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Government website,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Don't know,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Often,Often,,,,,,,Often,,Often,Sometimes,Sometimes,Most of the time,,Often,,,Sometimes,,Often,,Sometimes,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Sometimes,140000,USD,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Company internal community,Kaggle,Online courses,Personal Projects",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,20,60,0,10,0,10,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Rarely,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,KNIME (free version),Minitab,Orange,Python,R,SQL,Tableau",,,,Often,,,,,,,Most of the time,Often,,,,,,,Rarely,,,,,,,Rarely,,,Rarely,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation",,,,,,,,Most of the time,,,,,,,,Sometimes,,,,Sometimes,,,Often,,,Sometimes,,,,,,,,70,15,2,10,3,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Most of the time,Often,,,Sometimes,,,,Often,,,Sometimes,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,weather,Management buy in.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +,Denmark,23,Employed part-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Textbook",,,Very useful,,,,,,,,,,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Master's degree,A health science,1 to 2 years,Other,University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Work,20,20,50,0,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Always,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Sometimes,Often,Often,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,,,,50,5,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,Sometimes,,,Most of the time,Often,,,,,,,,Often,Sometimes,,,,,,,,100% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,41000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Russia,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Newsletters,Online courses,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,A social science,,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Colombia,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Online courses,Personal Projects,Textbook",,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,6-10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Enterprise Miner,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Rarely,,Most of the time,,,,,,Sometimes,,Sometimes,Most of the time,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics",,Sometimes,Sometimes,,Sometimes,Often,Most of the 
time,Often,Often,,,,,Often,Sometimes,Often,,Often,Sometimes,,Often,Most of the time,Often,Sometimes,,Most of the time,,,Sometimes,,,,,30,20,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Sometimes,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,Often,,76-99% of projects,More internal than external,Standalone Team,web available data;thirdparty providers,Lack of documentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,150000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Business Analyst,Researcher,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Not Useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,65,15,10,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Rarely,100GB,"CNNs,Neural Networks,Random Forests","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,,,Most of the time,Often,,Most of the time,,,Often,,,,,,,,40,40,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the 
organization,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,26-50% of projects,More internal than external,Other,"BraTS,ProstateX",Images are heavy and time consuming to train models.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Friends network,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Somewhat useful,,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,25,Employed full-time,,,Yes,,Data 
Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Other,University courses,15,20,10,55,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,C/C++,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle",,,Somewhat useful,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Scientist,Kaggle competitions,0,0,20,20,60,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Text data,Relational data,Other",Sometimes,100MB,"CNNs,Gradient Boosted Machines,Random Forests,RNNs,SVMs","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",Often,,,,Often,,,,,,,,,Often,,,,,Sometimes,Sometimes,Often,,,,Sometimes,Often,,,,,,,,40,20,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,,Most of the time,,Sometimes,,,Most of the time,Sometimes,Often,,Often,,Sometimes,Often,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,50000,USD,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Other,20 to 99 employees,Stayed the same,Less than one year,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,"Bayesian Techniques,Regression/Logistic Regression","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,30,0,10,10,0,Enough to run the code / standard library,"Lack of significant domain expert input,Other",,,,,,,,,,,,,,,,,,,,,,Sometimes,Less than 10% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,12000,,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Finland,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Scientist",University courses,40,0,0,60,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Tableau,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,15,10,50,10,0,Time Series,"Bayesian Techniques,Ensemble Methods",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data",Most of the time,10TB,"Bayesian Techniques,Ensemble Methods","C/C++,MATLAB/Octave,Minitab,Python,R,Other",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Data Visualization,Ensemble Methods,Simulation,Time Series Analysis",Rarely,,,,,,Most of the 
time,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,0,50,15,20,15,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Subversion",,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Personal Projects,Textbook",,Somewhat useful,Somewhat useful,,,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,Natural Language Processing,"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,6 to 10 years,"Business Analyst,Data Analyst,Researcher,Software Developer/Software Engineer",Self-taught,80,5,5,10,0,0,Natural Language Processing,Support Vector Machines (SVMs),A professional degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,Bayesian 
Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Google Cloud Compute,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,40,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),High school,Mix of fields,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",Often,,Sometimes,,,,Most of the time,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,30,45,5,20,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Other",,Git,Always,"90,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,R,Support Vector Machines (SVM),Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Other,University courses,20,5,50,25,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Academic,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Traditional Workstation",Image data,Rarely,100MB,Regression/Logistic Regression,"C/C++,MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Simulation",Often,,Often,,,Often,Most of the time,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,20,50,0,10,20,0,Enough to refine and innovate on the algorithm,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,,Often,,100% of projects,Approximately half internal and half external,Other,Clinical Data from Hospital,Computation time for simulations,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"88,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Mathematica,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle",Very useful,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,PhD,Yes,Master's degree,Electrical Engineering,,"Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Italy,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Cloudera,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,5,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods",Primary/elementary school,Military/Security,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,SVMs","C/C++,Java,Python,Other",,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,"Decision Trees,Ensemble Methods,Naive Bayes,Random Forests",,,,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,,,34,33,33,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,IT Department,,reduce false positive rate,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Other,Never,30000,EUR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,20,30,15,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Video data,Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,Often,,Sometimes,,Often,,,,,Often,Most of the time,,Sometimes,,,,,Rarely,Rarely,Rarely,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,Sometimes,Often,Often,Often,,,,Often,,,,Sometimes,Sometimes,,Sometimes,Rarely,Sometimes,,Sometimes,,Sometimes,,,Often,Often,Often,,,,40,15,10,25,10,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer",Self-taught,20,80,0,0,0,0,,,A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,,,SVMs,"Amazon Machine Learning,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",Rarely,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,Rarely,,,Sometimes,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,80,10,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization",,,,,Sometimes,,,,Often,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,"Census,indeed",Fitting it together with internal data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Friends network,Personal Projects,Podcasts,Textbook,Other",Very useful,Very useful,,,Very useful,Somewhat useful,,,,,,Very useful,Somewhat useful,,Somewhat useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Researcher,Other",Self-taught,70,20,0,10,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Italy,28,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"FlowingData Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,30,10,0,50,10,0,"Adversarial Learning,Recommendation Engines,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,42,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,R,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,,,Very useful,Data Elixir Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",7,90,0,0,3,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Finland,26,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Other,Neural Nets,R,I collect my own data (e.g. 
web-scraping),"College/University,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data,Relational data",,,,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Prescriptive Modeling,Simulation",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,Often,,,,,Often,,,,,,,10,30,20,10,30,0,Enough to tune the parameters properly,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,Most of the time,,,,,,,,,Often,Sometimes,,100% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,"30,000",EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Chile,28,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Stack Overflow Q&A",Very useful,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,15,15,20,40,10,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,1GB,"Neural Networks,SVMs","Amazon Web services,Cloudera,Jupyter notebooks,NoSQL,Python,SQL,Other",,Most of the time,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Rarely,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,SVMs",,,,Sometimes,,,,,,,,,,,,Often,,,Most of the time,Sometimes,,,,Often,Sometimes,,,Sometimes,,,,,,30,10,25,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,,,,Often,Rarely,,Sometimes,,Often,,Most of the time,Sometimes,,,Often,,,51-75% of projects,More 
internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,6000000,CLP,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Ireland,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician",University courses,20,10,20,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Relational data,Never,100GB,Decision Trees,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,College/University,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Somewhat useful,,,Not Useful,Very useful,,Not Useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Other",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,5,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important +Female,India,24,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Blogs,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",20,75,0,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Brazil,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Computer Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",9,90,0,0,1,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Statistician",Self-taught,30,20,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,31,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"GitHub,Google Search","Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Very useful,,,,,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher",Self-taught,50,20,10,20,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Video data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Java,KNIME (free version),MATLAB/Octave,Python,R,RapidMiner (free version),SQL,Tableau",Sometimes,,,,,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs",,Sometimes,Often,,,Often,,,,,,,,,,,,Often,Most of the time,Often,,,Often,Often,,,,Often,,,,,,30,20,20,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,,Often,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Hungary,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,DataRobot,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,Very useful,,Very useful,,,,Very useful,,,Very useful,,,,,"FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,10,0,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,10GB,"CNNs,Neural Networks,SVMs","Amazon Web services,C/C++,Perl,Python",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Neural Networks,SVMs",Often,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,75,10,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,,10-25% of projects,More internal than external,IT Department,,cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Never,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,,SQL,Google Search,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer",Self-taught,90,10,0,0,0,0,,,A doctoral degree,Hospitality/Entertainment/Sports,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Always,100GB,Other,"Hadoop/Hive/Pig,IBM Cognos,Java,NoSQL,SQL",,,,,,,,,Sometimes,Sometimes,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Segmentation",Sometimes,Sometimes,,,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,25,20,5,20,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,10-25% of projects,Entirely internal,IT Department,,,"Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,130000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Russia,31,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",15,40,0,5,40,0,,,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Not important +Male,United Kingdom,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Very useful,,,,,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,6 to 10 years,,Work,30,0,30,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Financial,20 to 99 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Sometimes,,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","C/C++,Python,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,Often,,,Sometimes,Most of the time,,Sometimes,,Rarely,Sometimes,,,,,,,,Rarely,Sometimes,,,,Rarely,,,,,Most of the time,,,,10,30,30,30,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Company Developed Platform,,Other,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Proprietary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,10,10,25,45,10,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,1GB,"CNNs,Neural Networks,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,RNNs,SVMs",,,,Often,,,Sometimes,,,,,,,Rarely,,,,,Often,Often,,,,,Often,,,Often,,,,,,20,20,45,5,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data,Other",,,,,,,,,Often,,,Sometimes,,,,,Sometimes,,,,Sometimes,Often,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,"70,000",,Other,7,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,Very useful,,Very useful,Very useful,,Very useful,,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Software Developer/Software Engineer,Other",Self-taught,80,10,0,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Internet-based,,,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Google Cloud Compute,MATLAB/Octave,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,Often,,,,,,,,,,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,,Sometimes,,,,,,,,Sometimes,,Rarely,,Most of the time,,,Sometimes,,,,,,,Sometimes,,,,20,60,20,0,0,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,,,,Often,Often,Often,,,,,,,,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Git,,,,,8,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Engineer,Machine Learning Engineer,Researcher,Statistician",University courses,10,20,20,40,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data,Other",Sometimes,1MB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,Often,,Most of the time,,,,,,,,Most of the time,,Sometimes,,,Often,Often,Often,,,,Often,,,,Often,,,,,20,30,10,10,30,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Conferences,Friends network,Kaggle,Stack Overflow Q&A",,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,RapidMiner (free version),Spark / MLlib",,Sometimes,,,Often,,Often,,Often,,,,,Often,,,Sometimes,,,,,,,,,,,,,,Rarely,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,Rarely,,,,Often,Sometimes,Sometimes,,,,,Sometimes,,Often,,Often,Most of the time,,,,Sometimes,Sometimes,,,,,Most of the time,Often,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data 
Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Often,,,,Rarely,,Often,,Sometimes,,,,Often,,,Often,Often,,26-50% of projects,More internal than external,Central Insights Team,Depends upon problem domains,No clear interface spec,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,Somewhat useful,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,"Data Analyst,Researcher",Work,50,5,40,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Internet-based,10 to 19 employees,Decreased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",,,Rarely,,,,Often,Sometimes,,,,,,Often,,,,,,,Often,,Sometimes,,,Rarely,Rarely,Sometimes,Sometimes,,,,,20,40,10,0,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,,,Most of 
the time,Sometimes,,Most of the time,,,Sometimes,,,,,,Most of the time,Often,,100% of projects,Entirely internal,Standalone Team,mobile device data; census; business databases,limited quantity of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,"132,000",USD,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Predictive Modeler",Work,20,15,30,35,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,39,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,Nice to have,Nice to have,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,,Researcher,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,37,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Researcher,Self-taught,25,45,25,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Data Stories Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,20,10,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",I prefer not to answer,,,,,,Somewhat important,,,,,,,"Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"Data Visualization,Random Forests,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,26-50% of projects,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer",University courses,80,0,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular 
Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow",,Sometimes,,,,,,,Sometimes,,,,,Rarely,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Data Analyst,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,20,40,0,35,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Markov Logic Networks","Some college/university study, no bachelor's degree",Mix of 
fields,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,,"Decision Trees,SVMs","MATLAB/Octave,Perl,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Often,,,,"A/B Testing,Recommender Systems,SVMs",Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,40,10,20,10,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,45,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Data Scientist,,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,,,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Professional degree,,,Business Analyst,University courses,0,0,50,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Technology,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,1TB,"Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Angoss,IBM SPSS Modeler,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner",,,Sometimes,,,,,,,,Rarely,,Rarely,,,,Sometimes,,,,,,Often,,,,,,,,Often,,Most of the time,,Sometimes,,,Most of the time,Most of the time,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Sometimes,Rarely,Most of the time,,Sometimes,,,Sometimes,Most of the time,Most of the time,,,,,,Often,,,,,60,NA,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,26-50% of projects,More internal than 
external,Business Department,none,access,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,230000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Friends network,Online courses,Personal Projects",,Very useful,,,,Very useful,,,,,Very useful,Very useful,,,,,,,"Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher",University courses,20,20,0,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Insurance,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,,,,,,,Sometimes,,,,,,Often,,Often,,,,,,,Sometimes,,,Often,,Sometimes,,Most of the time,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of 
the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Often,,,Most of the time,,,Often,,,,,Often,,,Sometimes,Often,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000,CHF,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Data Machina Newsletter,FastML Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,20,10,20,30,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Text data,Rarely,100MB,"CNNs,GANs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Often,,Often,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text 
Analytics",,,,Often,,Often,Sometimes,Sometimes,Sometimes,,Sometimes,Often,,Sometimes,,Most of the time,,,Most of the time,Most of the time,Sometimes,,Often,,Often,,,Most of the time,Often,,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",Most of the time,25000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,Stan,Tableau,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,Often,,Sometimes,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",Often,,Sometimes,,Rarely,Sometimes,,Sometimes,Often,,,,,,,Often,,Rarely,,,,Often,,,,Often,,,Sometimes,Most of the time,,,,10,60,0,0,30,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input",,Sometimes,,,,,,Sometimes,Most of the time,,Often,,,,,,,,,,,,10-25% of 
projects,Entirely internal,Central Insights Team,None,It's pretty clean and easy to ingest actually. No complaints.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,Ukraine,40,Employed full-time,,,Yes,,Programmer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Reinforcement learning,Neural Networks - CNNs,,Internet-based,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,10,0,5,5,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,<1MB,Bayesian Techniques,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,,,Most of the time,,,,"Data Visualization,Logistic Regression,Naive Bayes,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,Rarely,,,,,,,,,,,,Rarely,,,,80,5,5,5,5,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,Most of the time,,,,,,,,,,,,Sometimes,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Other,Google Drive,"Bitbucket,Git",Never,38400,BRL,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,25,45,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Logistic Regression,A master's degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,Rarely,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",Often,,,,Most of the time,,Most of the time,,,,,,,Often,Often,Often,,,Sometimes,,Sometimes,,,Often,,,,,,Sometimes,,,,25,25,2,23,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used 
by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,Sometimes,,,Often,,Most of the time,,,,,,Often,,,,,,Often,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,200000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Mexico,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,,"GPU accelerated Workstation,Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Angoss,C/C++,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Spark / 
MLlib,SQL",Sometimes,Sometimes,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,GANs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics",Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,80,10,10,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,,,Key-value store (e.g. Redis/Riak),"Company Developed Platform,Email",,"Bitbucket,Mercurial,Subversion",,50000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,50,10,0,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,51,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",Self-taught,90,5,0,5,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,GPU accelerated Workstation,Relational data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SAS JMP,Tableau",,,,,,,,,,,Rarely,Rarely,,,,,Sometimes,,,,Rarely,,,Often,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,Most of the time,,,,,Sometimes,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,Often,Often,Often,Often,,,Often,,Often,Sometimes,Often,,Sometimes,,Sometimes,Often,,Often,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,20,40,0,20,20,0,Enough to explain the 
algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database",Often,,,,,,,,,,,Often,Sometimes,,Sometimes,,,Sometimes,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Jupyter notebooks,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Other,,,,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,,No,Bachelor's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Other,33,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Google Search,"Arxiv,Online courses",Very useful,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,FastML Blog",< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,Necessary,,,,,"Coursera,Udacity",GPU accelerated Workstation,11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,0,20,0,0,"Machine Translation,Natural Language Processing,Speech Recognition","Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Not important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle",Somewhat useful,Very useful,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Software Developer/Software Engineer,Other",Self-taught,65,5,10,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,,10MB,Regression/Logistic Regression,"Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift 
Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,,,,Rarely,,Rarely,Often,Most of the time,,,,,Rarely,Most of the time,Rarely,,,,Rarely,Rarely,,Often,,,,10,20,0,30,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Often,,Sometimes,,,,,,,,,,,,,Most of the time,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,145000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,,Nice to have,,Nice to have,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,68,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,30,10,0,20,0,Recommendation Engines,Markov Logic Networks,High school,Other,10 to 19 employees,Increased slightly,Less than one year,An 
external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer",Kaggle competitions,50,30,0,10,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,100GB,CNNs,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,Most of the time,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,50,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Retired,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,SAS,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,20,10,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,"Engineer,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,Workstation + Cloud service,Image data,Always,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic 
Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,,,,Often,Often,Often,,,,,Sometimes,,Sometimes,,,,,Often,,Often,,,Often,Sometimes,,,,,,,50,20,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,,not enough time to try everything I'd like.,Key-value store (e.g. Redis/Riak),Share Drive/SharePoint,,"Bitbucket,Git",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,NA,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,47,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed 
full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Julia,Deep learning,Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Operations Research Practitioner",Self-taught,75,25,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM Watson / Waton Analytics,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,Often,,,Often,Often,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,Often,Often,,,,Rarely,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift 
Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,Sometimes,Often,Often,,Rarely,Sometimes,Most of the time,Often,,Often,,,Often,Often,Sometimes,Often,Most of the time,,,,70,10,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Most of the time,,Most of the time,,,,,,,,,Often,,,,,,Often,Most of the time,,26-50% of projects,More internal than external,Central Insights Team,Macroeconomic data,Cleaning and reorganizing it so that it's fit for analysis,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,More than 10 years,"Business Analyst,Statistician",Work,20,10,70,0,0,0,Time Series,Logistic Regression,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,More than 10 years,"Data Analyst,Data Miner,Programmer,Researcher,Statistician",Work,30,0,50,20,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A doctoral degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,,"IBM SPSS Statistics,Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,25,30,15,10,20,0,Enough to explain the algorithm to 
someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,50,Employed full-time,,,Yes,,Other,Fine,Employed by government,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Government website,"Arxiv,Blogs,Official documentation,Online courses,Textbook",Very useful,Very useful,,,,,,,,Very useful,Very useful,,,,Very useful,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Engineer,Operations Research Practitioner,Other",Kaggle competitions,20,20,20,0,20,20,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,Often,Often,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Rarely,Most of the time,Often,,Most of the time,,,Most of the time,,Often,,Often,,,,,Often,,Most of the time,Sometimes,,,,,Often,Sometimes,,,,10,10,10,10,10,50,Enough to explain the algorithm to someone 
non-technical,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Textbook",Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,10,60,0,0,0,"Natural Language Processing,Recommendation Engines,Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,1GB,"CNNs,Neural Networks,Regression/Logistic Regression","C/C++,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow",,,,Most of the time,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,Rarely,Sometimes,Sometimes,,,,,,,,Often,,Sometimes,,,Most of the time,Most of the time,Sometimes,,,Often,,,,,,,,,,25,30,35,5,5,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,Less than 10% of projects,Entirely external,Standalone Team,Wikipedia; Stack Overflow; News feeds,Gathering and cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"54,000",EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Bayesian Methods,SQL,GitHub,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Data Scientist,University courses,0,0,0,100,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Cloudera,Hadoop/Hive/Pig,Impala,KNIME (commercial version),KNIME (free version),NoSQL,Python,R,SAS Base,Spark / MLlib",,,,,Often,,,,Often,,,,,Often,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,Often,,Often,,,,,Often,,,Often,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Naive Bayes,Random Forests,RNNs",Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,,,,,80,20,0,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools",Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,,110000,BRL,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,0,55,0,40,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,"5,000 to 9,999 employees",Stayed the same,Don't know,A tech-specific job board,Important,,,"Text data,Relational data",,,,"Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,5,0,0,15,0,80,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",Most of the time,Often,,,Often,,,,Most of the time,,,,Sometimes,,,,,,,,Sometimes,,76-99% of projects,More internal than external,IT Department,,,,"Company Developed Platform,Share Drive/SharePoint",,Other,Never,1000000,UYU,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Researcher",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Oracle Data Mining/ Oracle R Enterprise,Text 
Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Researcher,Self-taught,20,40,5,15,20,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Never,<1MB,,"C/C++,Java,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,SQL,TensorFlow",,,,Often,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,6000,USD,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Germany,21,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,1 to 2 years,I haven't started working yet,University courses,20,15,5,50,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,45,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Friends network,Online courses,Personal Projects,YouTube Videos",,,,,,Somewhat useful,,,,,Somewhat useful,Very useful,,,,,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,10,10,0,0,Recommendation Engines,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United Kingdom,64,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 
years,Other,Self-taught,40,10,50,0,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,10 to 19 employees,Increased slightly,More than 10 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Survival Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,Very useful,Very useful,Very useful,Very useful,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",30,35,30,0,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text 
data,,10GB,"Neural Networks,Regression/Logistic Regression,RNNs","C/C++,MATLAB/Octave,Perl,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,CNNs,Naive Bayes,Neural Networks,RNNs,SVMs",,,Sometimes,Most of the time,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,Most of the time,,,,,,5,35,50,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Mercurial,Subversion",Most of the time,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Other,Neural Nets,Python,GitHub,"Arxiv,Conferences,Official documentation,Personal Projects,Textbook,YouTube Videos",Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Work,40,0,50,10,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or 
more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,100TB,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Python,TensorFlow",,Most of the time,,Sometimes,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Simulation",Most of the time,,Most of the time,,Most of the time,,Most of the time,Rarely,Sometimes,,,,,,,Most of the time,,Often,Most of the time,Most of the time,,,,Most of the time,Most of the time,,Sometimes,,,,,,,20,20,40,5,5,10,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,,,,Git,Rarely,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Social Network Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,6 to 10 years,"Data Scientist,Researcher",Self-taught,65,5,20,5,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Most of the time,100MB,"Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,R",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,Sometimes,,Sometimes,,,Often,Often,Most of the time,,,Often,,,,,,Rarely,Most of the time,Often,Sometimes,,Sometimes,Sometimes,,,,Most of the time,,,,,,40,25,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations of tools,Unavailability of/difficult access to data",,,,Sometimes,Often,Sometimes,,,,,,,Often,,,,,,,,Sometimes,,51-75% of projects,More internal than external,Other,20 newsgroups; craigslist jobs; twitter data,Tuning vocabulary and word vector 
parameters,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)","Commercial Data Platform,Other",Slack,Git,Sometimes,"72,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Ireland,27,"Not employed, but looking for work",,,,,,,,Java,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Statistician",University courses,0,30,0,70,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",11-15,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Russia,24,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Researcher,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Other",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Business Analyst,Researcher,Other",Self-taught,40,40,10,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,MATLAB/Octave,Python,R,SAS Enterprise Miner,SAS JMP,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Often,,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,,,"Bayesian Techniques,Data 
Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Segmentation,Simulation",,,Sometimes,,,,Often,Often,,,,,,,,Often,,,,Sometimes,,,Often,,,Often,Sometimes,,,,,,,10,40,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,,,,,,,,,Often,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Share Drive/SharePoint,Other",Google Drive/Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"120,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",Rarely,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,Often,,,,,Often,,,Rarely,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,Often,Most of the time,Sometimes,,,,,,,,,Often,,,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,lack of support / understanding of the back end (how it's collected),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,67000,CAD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,10,10,30,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A professional degree,Military/Security,100 to 499 employees,Increased slightly,More than 10 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,45,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Often,Most of the time,Often,,,,,,Often,Rarely,Sometimes,,,,,Often,,Often,,,Most of the time,,,,Often,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Often,Most of the time,,,,,,,,,Most of the time,,Often,,,51-75% of projects,Entirely internal,Standalone Team,"country statistics, nielsen",operationalisation of findings,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,18,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Kaggle,Non-Kaggle online communities,Personal Projects",,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,I haven't started working yet,Self-taught,80,15,0,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,Poland,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,Researcher,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by 
government,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Other",Self-taught,90,5,0,5,0,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,PCA and Dimensionality Reduction,Segmentation",,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Most of the time,,Often,,,,Often,,,,Sometimes,Most of the time,,,26-50% of projects,Approximately half internal and half external,Standalone Team,Census;PRIZM,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,80000,CAD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Mexico,33,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,R,Decision Trees,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Official documentation",,,Very useful,,,,,,,Very useful,,,,,,,,,"Data Machina Newsletter,Data Stories Podcast,The Analytics Dispatch Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,Yes,Professional degree,,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Other,37,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Anomaly Detection,Matlab,I collect my own data (e.g. 
web-scraping),"Conferences,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,O'Reilly Data Newsletter,5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",40+,PhD,No,Doctoral degree,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Brazil,36,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,Most of the time,,,,Most of the time,,Most of the time,,Rarely,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Often,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,Gradient Boosted Machines,HMMs,kNN and Other 
Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs",,,Most of the time,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,20,20,10,20,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,Often,,,,,,,Most of the time,,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Most of the time,120000,BRL,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Researcher,,Employed by college or university,,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,College/University,Textbook,YouTube Videos",Very useful,,Very useful,,,,,,,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very 
important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,Often,Often,,Most of the time,Most of the time,,Often,,,Sometimes,,,,,,,,Often,,,Sometimes,,Sometimes,,Often,Sometimes,,Often,,,,15,55,5,15,10,0,Enough to refine and innovate on the algorithm,"Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,Sometimes,,,,,Sometimes,,,Often,,,,26-50% of projects,Do not know,Standalone Team,"CIFAR, ImageNet",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,33000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Iran,31,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Matlab,"Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,20,10,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Mix of fields,Fewer than 10 employees,,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",,100MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks","C/C++,Java,Julia,MATLAB/Octave,Microsoft Excel Data Mining,Python",,,,Sometimes,,,,,,,,,,,Sometimes,Rarely,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Evolutionary Approaches,Logistic Regression,Neural Networks,Recommender Systems,Segmentation,SVMs",,,Often,,,,Most of the time,,,Most of the time,,,,,,Often,,,,Most of the time,,,,Sometimes,,Sometimes,,Sometimes,,,,,,20,10,20,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,,,,,,,Sometimes,,,,,,Often,,51-75% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Ukraine,37,Employed full-time,,,No,Yes,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,,Less than a year,"Business Analyst,Data Analyst,Data Miner,Researcher",Self-taught,50,20,20,0,10,0,"Adversarial Learning,Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Female,United States,36,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Textbook",,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,Other,University courses,33,0,20,47,0,0,Time Series,"Evolutionary Approaches,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,,,"Evolutionary Approaches,Regression/Logistic Regression","C/C++,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,Sometimes,Sometimes,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Evolutionary Approaches,Logistic Regression,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Often,,,Often,,,,,,Sometimes,,,,,,Often,,,,,Sometimes,,,Sometimes,,,,10,20,40,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,Often,,,,,Sometimes,Often,,,,,,,,,,,,51-75% of projects,Do not know,Other,HomeHealthCompare;census data;ArcGIS data,missing data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",cloud services,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,140000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Sweden,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Deep learning,R,Google Search,"Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,Researcher,Self-taught,90,0,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data,Relational data,Other",Sometimes,1GB,"Neural Networks,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,R",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Random Forests,Simulation,SVMs",,,,Rarely,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,Sometimes,Often,,,,,,40,10,20,30,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Need to coordinate with IT",,,,,Often,,,,,,,,Often,,Sometimes,,,,,,,,100% of projects,Entirely internal,Other,,Getting correct and structured meta data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,120000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",Work,10,10,70,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,R,SAS Base",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series 
Analysis",Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Often,,Often,,,,,,,,,Sometimes,,,26-50% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),"Company Developed Platform,Share Drive/SharePoint",,,Rarely,80000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Norway,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,Very useful,,,Very useful,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Business Analyst,University courses,60,0,10,0,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,SVMs,Other","NoSQL,Python,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,Often,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Sometimes,Rarely,,,Sometimes,,,,Most of the time,,,,,Rarely,,Sometimes,,,,,,,,,,,60,5,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to 
others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,,Often,Sometimes,Most of the time,Most of the time,,Sometimes,Often,,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,,,Sometimes,Sometimes,Most of the time,76-99% of projects,More internal than external,Standalone Team,Macroeconomic data,Data cleaning ,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Rarely,"63,000",,I was not employed 3 years ago,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,Researcher,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Data Scientist,University courses,25,5,0,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Rarely,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,Sometimes,,Often,Often,Often,Often,,,,,,,,,,,Often,Often,,Often,,,Often,,,,,,,,60,10,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,,,,,Often,,,,Often,,26-50% of projects,Entirely internal,Standalone Team,,getting it clean and understanding it,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,15000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Other",,,,,,,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,,Random Forests,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,75,0,0,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Sometimes,,,,,,,,Often,,,,,,,Often,,Sometimes,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,120000,USD,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,I don't plan on learning a new ML/DS method,R,Other,"Blogs,Conferences,Official documentation,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,,,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,University courses,25,0,0,75,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Java,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis,Other",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,Often,Sometimes,,,,,Most of the time,,,Often,Most of the time,,,65,5,0,20,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,100% of projects,Entirely internal,Other,How do you know I work with any propriety data?,Manual processes,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,122000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,50,25,0,25,0,0,"Computer Vision,Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Spain,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,31,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Scientist,Programmer",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,30,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United 
Kingdom,45,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,20,Employed part-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,No Free Hunch Blog,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Female,India,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,SQL,Support 
Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Financial,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,10TB,Decision Trees,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,35,10,15,15,0,25,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Subversion,Rarely,1400000,,Has increased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Programmer,Researcher",University courses,20,10,30,40,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,26,"Not employed, but looking for work",,,,,,,,Amazon Web services,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,10,20,0,0,Computer Vision,"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,23,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,Other,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,Very useful,,,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,30,20,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Rarely,100MB,"CNNs,Decision Trees,Neural Networks,RNNs,SVMs","C/C++,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,Often,,Sometimes,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,35,45,10,10,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be 
answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,,,,Often,Most of the time,,,Less than 10% of projects,More internal than external,Standalone Team,,Ontology definition,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,18000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Switzerland,37,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher",University courses,40,10,20,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,High school,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Not very important,Other,Laptop or Workstation and private datacenters,Relational data,,10MB,Regression/Logistic Regression,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,40,5,20,15,20,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Never,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Doctoral degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Recommendation Engines,Neural Networks - CNNs,A bachelor's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Always,100GB,"CNNs,Neural Networks","C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,40,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,Often,,,Often,,,,,,Often,,,,,Less than 10% of projects,More internal than external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,"Git,Subversion",Rarely,60,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Finland,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,University courses,30,20,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Other,Never,100GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,Becoming a Data 
Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,"Coursera,DataCamp,edX",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,60,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Python,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,20,0,30,0,0,,"Evolutionary Approaches,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Never,1GB,"Evolutionary Approaches,Regression/Logistic Regression","C/C++,Python,SAS JMP,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,Often,,,,,,,,,,"Data Visualization,Evolutionary Approaches,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis",,,,,,,Often,,,Often,,,,,,,,,,,Sometimes,Often,,,,,Sometimes,,,Sometimes,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely external,IT Department,Federal Reserve;Stock prices; Economic indices,Getting data into useable form,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,Employed full-time,,,Yes,,Programmer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,Less than a year,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Online courses,Textbook",Very useful,,,,,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,,"FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Researcher",Self-taught,20,30,40,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,Rarely,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,Sometimes,Most of the time,Most of the time,,Often,,,,,,,Sometimes,,Rarely,Often,Often,Sometimes,,Often,Often,,Most of the time,,,,Rarely,,,,35,30,10,10,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT",Sometimes,Sometimes,,,Often,,,,,,,,,,Often,,,,,,,,51-75% of projects,More external than internal,Business Department,macroeconomic;firm-level,inconsistent 
quality and shifting field definitions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"170,000",USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,24,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,,1-2 years,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",35,50,0,0,15,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,Very Important,,,Somewhat important,,,,,Somewhat important,,Somewhat important,, +Male,Canada,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,SQL,Survival Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,1 to 2 years,I haven't started working yet,Self-taught,35,30,0,35,0,0,Unsupervised Learning,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,32,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",Other,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,10-15 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,,Self-taught,97,NA,0,3,NA,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Belgium,34,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Decision Trees,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,50,10,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",Sometimes,Sometimes,,,,,,,Most of the time,,,Most of the time,Often,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Business Department,N/A,spending time to gather the data + formalize it for future data analysis rounds,Other,"Email,Share Drive/SharePoint",,,Sometimes,56000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Decision Trees,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Physics,3 to 5 years,Researcher,Self-taught,25,10,0,65,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Markov Logic Networks,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Very Important,Very Important,Not important +Male,United States,60,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM SPSS Statistics,Time Series Analysis,Other,I collect my own data (e.g. 
web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Bayesian Techniques,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,,,,Very useful,Not Useful,Somewhat useful,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Other,Self-taught,50,0,50,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Sometimes,,Sometimes,Often,Rarely,,Sometimes,Most of the time,,,Most of the time,Sometimes,Sometimes,Sometimes,Sometimes,,,,50,10,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,Sometimes,,,,,,,Most 
of the time,,51-75% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Other,SQL Database,Git,Never,185000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,High school,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",,,,"Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,Rarely,,,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,30,20,40,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional 
services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Engineer",University courses,10,20,50,20,0,0,"Natural Language Processing,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100TB,,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Often,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",Most of the time,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,25,20,10,15,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,31,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or 
university,Amazon Web services,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Researcher,Statistician",University courses,50,0,20,30,0,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Other",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",Sometimes,Often,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Often,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,,Often,Often,,Often,Often,Often,Often,,,,,,,,,Often,,Often,,,Often,Often,,,,Often,,Often,,,,20,10,20,30,20,0,"Enough to code it again from 
scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Scaling data science solution up to full database",,Often,,,,,,,Often,,,,,,,,,Often,,,,,100% of projects,Approximately half internal and half external,IT Department,Open database ,Limited access,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Other",,"Bitbucket,Git,Subversion",Sometimes,75000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Spain,40,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Physics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,60,0,0,40,0,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Not important +Male,United States,33,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,R,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,Not Useful,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,Less than a year,"Data Analyst,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",15,75,0,0,10,0,Reinforcement learning,"Bayesian Techniques,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Traditional Workstation",Other,Always,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Minitab,Python,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Simulation",Often,Sometimes,Often,,,Sometimes,,Sometimes,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Kenya,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Other,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Textbook",,,Somewhat useful,,,,Very useful,,,,,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,6 to 10 years,"Data Miner,Researcher",Other,10,10,30,0,40,10,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,Fewer than 10 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Other,Rarely,1GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Perl,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,Often,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,Sometimes,,,Often,,,,,,,,,,Often,,,,Sometimes,Most of the time,,Often,,,Sometimes,Sometimes,Rarely,,,,,,10,10,0,30,50,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,50000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,25,25,20,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Markov Logic Networks","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,Self-taught,70,10,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Ensemble Methods,Neural Networks,SVMs","Java,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Naive Bayes,PCA and Dimensionality Reduction,Time Series Analysis",Sometimes,,,,,Often,Often,,Often,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,,,,70,20,0,10,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,0,40,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,10TB,"CNNs,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Python,R,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender 
Systems,RNNs",,,,Rarely,Sometimes,Sometimes,Most of the time,,,Rarely,,,,Sometimes,,Often,,,,Sometimes,,,Rarely,Sometimes,Sometimes,,,,,,,,,30,35,25,7,3,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Unavailability of/difficult access to data",,,,Often,Often,,,,,,,,,,,,,,,,Rarely,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",,,Other,Sometimes,,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Data Analyst,Data Miner,DBA/Database Engineer",Self-taught,30,20,0,30,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Rarely,10TB,,"Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,Rarely,,,,,,"Cross-Validation,Data Visualization,Prescriptive Modeling,Text Analytics",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,50,0,10,10,30,0,Enough to tune the parameters properly,"I prefer not to say,Need to coordinate with IT",,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Statistician,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Company internal community,Conferences,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Statistician",University courses,40,10,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",I prefer not to answer,CRM/Marketing,20 to 99 employees,Decreased slightly,3-5 years,An 
external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","KNIME (free version),R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,Most of the time,Rarely,,,Most of the time,,,Often,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,Often,Sometimes,,,Often,Most of the time,Often,,,,,,Often,Sometimes,Sometimes,,,,Rarely,Sometimes,,,,,Often,,,,,,,,40,20,5,15,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,Often,Often,,,,,,Most of the time,,,,,Most of the time,Often,,76-99% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,90000,EUR,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,Google Search,"Arxiv,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,A tech-specific job board,Somewhat important,Other,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Sometimes,,"Decision Trees,Neural Networks,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Neural Networks,SVMs",,,,,,Most of the time,,,,,,,,,,Often,,,,Often,,,,,,,,Often,,,,,,50,40,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,I prefer not to say",,,,,,Often,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1TB,"Bayesian Techniques,Neural Networks,SVMs","NoSQL,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,Neural Networks,Segmentation,SVMs,Text Analytics",,,Often,,,Most of the time,Most of the time,Often,Most of the time,,,,,,,,,Often,Most of the time,Sometimes,,,,,,Often,,Most of the time,Most of the time,,,,,80,5,5,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not 
instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,Sometimes,Most of the time,Most of the time,Often,,,Often,,,,,Often,,,Most of the time,Most of the time,,,Most of the time,,26-50% of projects,More internal than external,Standalone Team,,security,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,160000,AUD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",20,40,20,NA,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Decreased slightly,1-2 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SAS Base,SQL,TensorFlow",,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,Sometimes,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,Sometimes,,,,,Often,Often,,,,,,,,Often,,Sometimes,Sometimes,Sometimes,Sometimes,,Most of the time,,,,,Most of the time,Often,,,,,50,20,NA,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues",,,,,,,,,Often,,,Often,Often,,,,Sometimes,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Company internal community,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Very useful,,,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,10 to 19 employees,Increased significantly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Always,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,SVMs","Mathematica,Python,QlikView,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,Often,Sometimes,,,,,,,,,Often,,,Sometimes,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,Rarely,,,Rarely,Most of the time,Sometimes,,,,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,Often,Sometimes,Often,,,,10,40,5,15,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible 
expectations about the potential impact of data science projects,Scaling data science solution up to full database",Most of the time,Most of the time,,,Often,,,,Often,,,Often,Sometimes,Sometimes,,,,Often,,,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,85,BRL,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Data Analyst,Statistician",Self-taught,30,40,20,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Genetic & Evolutionary Algorithms,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst",Work,25,25,50,0,0,0,Time Series,Logistic Regression,A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,Often,,Often,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,Sometimes,Sometimes,,,,,,Sometimes,Often,,,,Rarely,Sometimes,,,,,,Sometimes,,Often,Most of the time,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Most of the time,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Mercurial,Sometimes,130000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,New Zealand,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Perl,Neural Nets,Python,"Government website,I collect my own data (e.g. web-scraping)","Arxiv,College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",Not Useful,,Not Useful,,,,Somewhat useful,,Very useful,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,40,0,0,50,10,0,"Computer Vision,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Telecommunications,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100GB,Bayesian Techniques,"C/C++,Java,Mathematica,Microsoft Excel Data Mining,Python,R,SQL,Tableau,Other",,,,Often,,,,,,,,,,,Rarely,,,,,Rarely,,,Rarely,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,Most of the time,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,90,0,0,10,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,Rarely,,,,,,,,Often,,Rarely,,,,76-99% of projects,Entirely internal,Standalone Team,,Dirty Data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Mercurial",Never,"80,000",NZD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",15,5,25,50,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A professional degree,Academic,,,,,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Most of the time,,"Bayesian Techniques,Regression/Logistic Regression","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Naive Bayes,Neural Networks,Simulation,Time Series Analysis",,,Often,,,,,,,,,,,,,Often,,Often,,Often,,,,,,,Often,,,Often,,,,15,35,0,10,40,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,32,"Not employed, but looking for work",,,,,,,,Python,Monte Carlo Methods,R,,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Canada,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,NoSQL,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",15,80,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",A bachelor's degree,Telecommunications,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,40,10,10,20,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools",Often,,,,Most of the time,,,,Most of the time,,,,Often,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,"Researcher,Statistician",University courses,90,0,0,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Relational data,Other",Most of the time,<1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",Often,,Often,,,Often,Most of the 
time,Sometimes,,,,,,Sometimes,,Sometimes,,Rarely,,Rarely,Often,,Sometimes,,,,Sometimes,Sometimes,,,,,,20,40,10,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to coordinate with IT",,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Sometimes,79000,GBP,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Poland,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",25,25,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Financial,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,,Regression/Logistic Regression,"R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,Sometimes,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Segmentation",,,,,,Rarely,Often,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,50,5,5,15,25,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,144000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,SQL,Neural Nets,SAS,,"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,,University courses,5,10,15,70,0,0,"Natural Language Processing,Survival Analysis",Logistic Regression,,Non-profit,500 to 999 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Rarely,,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,70,5,0,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,,,,,,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Australia,35,Employed 
part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Retired,,,Yes,,Other,Fine,Employed by government,Python,Proprietary Algorithms,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,,Work,25,0,50,25,0,0,,Neural Networks - CNNs,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,More than 10 years,"Data Scientist,Other",University courses,80,15,5,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,20 to 99 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,Rarely,Rarely,,Rarely,,,,Sometimes,Sometimes,,,Most of the time,,Often,,,,,,,,,Often,,,,Rarely,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Most of the time,Most of the time,Often,Often,,,,,Often,Sometimes,Often,,Often,Sometimes,Rarely,Often,Often,Often,Rarely,,Most of 
the time,,,Sometimes,Often,,,,10,25,5,20,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,Often,,,,,,,Most of the time,,,Most of the time,Most of the time,,100% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,150000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,50,"Not employed, but looking for work",,,,,,,,Other,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,"FlowingData Blog,Linear Digressions Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,New Zealand,42,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Female,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,Amazon Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,,,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,I don't write code to analyze data,"Business Analyst,Computer Scientist,Data Analyst,Researcher,Other",Self-taught,60,30,10,0,0,0,Other (please specify; separate by semi-colon),,A master's degree,Mix of fields,,,,,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,"Amazon Web services,Tableau,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,"Data Visualization,Segmentation,Text Analytics",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,,5,23,23,25,24,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Other,various,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Sweden,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,10GB,"Gradient Boosted Machines,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,QlikView,R,Spark / MLlib,SQL",,,,,Most of the time,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Sometimes,,,,Often,Often,Most of the time,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,,Sometimes,Often,,Often,,,,Often,,,,40,30,0,10,20,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,Often,,,,,,,,,,,,,,Most of the time,,,100% of projects,More internal 
than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Git",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,Very useful,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Workstation + Cloud service,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,,"Data Analyst,Data Scientist",Work,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",Work,15,15,40,20,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A doctoral degree,Academic,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data",Sometimes,100MB,Bayesian Techniques,"C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,Rarely,,Most of the time,,,,"Bayesian Techniques,CNNs,Logistic Regression,Neural Networks,Time Series Analysis",,,Sometimes,Rarely,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Often,,,,20,40,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,3 to 5 years,Other,Self-taught,40,40,20,0,0,0,,,A master's degree,CRM/Marketing,20 to 99 employees,Decreased significantly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Never,10GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Lift Analysis,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,90,0,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,,Often,,,Often,,,Often,,,,,,,,Most of the time,,,,None,Entirely internal,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,Work,50,20,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SAS Enterprise Miner,TensorFlow,Unix shell / awk",Often,Often,,Sometimes,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,Rarely,,,,,,,Often,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",Often,,,,,Often,Often,Often,Often,Often,,Often,,,,Often,,,,Often,Often,,Often,,,,,,,,,,,40,30,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making 
process,Lack of data science talent in the organization,Lack of significant domain expert input",Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,26-50% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,60000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,Anomaly Detection,C/C++/C#,University/Non-profit research group websites,"Arxiv,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,,,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,I don't write code to analyze data,Researcher,University courses,30,0,0,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,500 to 999 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,,,"C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,60,20,0,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,,,,,,,,Often,,,,,,,,,,,,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Never,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,"Coursera,Udacity,Other",Basic laptop (Macbook),40+,Online Courses and Certifications,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Female,United States,48,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, but looking for work",,,,,,,,Flume,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Trade book,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,DataTau News Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,60,0,0,10,30,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Mexico,33,Employed part-time,,,Yes,,Other,Fine,Employed by government,Hadoop/Hive/Pig,Other,R,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,0,30,0,10,,Logistic Regression,A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Other,GPU accelerated Workstation,"Image data,Text data,Relational data",Never,1TB,Other,"IBM SPSS Statistics,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,100,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,None,Do not know,Other,,,Other,Other,,Other,,,,,7,,,,,,,,,,,,,,,,,, +Female,Chile,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Non-Kaggle online communities",,Very useful,,,,,,,Very useful,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,Programmer,University courses,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,R,SQL,Tableau",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,Often,,,,Most of the time,,Often,Most of the time,,,,,,,Most of the time,,,Often,,,,Often,,,,,Often,Often,,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Most of the time,Often,,,,Sometimes,,,,,Most of the time,,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"DataTau News Aggregator,FlowingData Blog,No Free Hunch Blog",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",Work,80,10,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Personal Projects",,Very useful,,,,,Very useful,,,Somewhat useful,,Very useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,70,0,0,0,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,Often,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,Often,Sometimes,,Sometimes,,,,,,,Often,,,,5,20,10,30,35,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,Most of the 
time,,Often,,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,100% of projects,Do not know,Standalone Team,kaggle;uci,Finding it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Shared network drives,"Git,Subversion",Never,,,,5,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Google Search,"Arxiv,Conferences,Online courses,YouTube Videos",Somewhat useful,,,,Very useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,More than 10 years,"Business Analyst,Data Scientist",Self-taught,50,30,20,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Reinforcement learning,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation",Relational data,Sometimes,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,SAS Base,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,Often,,,Rarely,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,Often,Most of the time,,,,,,,Often,,Most of the time,,,,Sometimes,Often,,Most of the time,,,Often,,Often,,,,,,45,20,10,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,Often,,,,,,,,,,Sometimes,,,,100% of projects,Entirely external,Standalone Team,Department of Energy;Energy Information Administration,Often dirty or inconsistent. Lots of adhoc Excel spreadsheets.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Never,96000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,Computer Scientist,Self-taught,0,0,0,100,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,,Ensemble Methods,"C/C++,R,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Evolutionary Approaches,GANs,Logistic Regression,Naive Bayes,Recommender Systems,RNNs,Segmentation,SVMs",,,,,,,,,,Often,Often,,,,,Often,,Often,,,,,,Often,Often,Often,,Often,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Self-employed,Spark / MLlib,Monte Carlo Methods,Haskell,I 
collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,Very useful,,,,,,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"Data Elixir Newsletter,Jack's Import AI Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,60,20,15,5,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Mix of fields,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Time Series Analysis",Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,Rarely,,,,,Often,,Rarely,,Often,Often,Often,Often,,Often,Sometimes,Rarely,,,Rarely,,Rarely,,,,35,35,15,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in 
deployment/scoring,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools",Often,,,Often,,,,,Often,,,Often,Often,,,,,,,,,,10-25% of projects,Approximately half internal and half external,,,,,,,,,130000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Data Scientist",University courses,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,25,25,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,Sometimes,,,,,,Sometimes,Often,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,,Sometimes,70000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,TensorFlow,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Textbook",Very useful,,,,,,Very useful,,,,,,,,Very useful,,,,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Female,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Hadoop/Hive/Pig,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,"Data Analyst,Researcher,Statistician,Other",Self-taught,97,3,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,10MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Java,Perl,Python,R,SAS JMP,SQL",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Rarely,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Text Analytics,Time Series Analysis,Other",Often,Often,,Sometimes,,Often,Often,Sometimes,,,,,,Sometimes,,Most of the time,,,,Sometimes,Often,,Sometimes,,Sometimes,,,,Often,,Often,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with 
IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Sometimes,,Sometimes,Most of the time,,Sometimes,Most of the time,,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,Often,Sometimes,,100% of projects,More internal than external,Standalone Team,CDC's National Intimate Partner & Sexual Violence Survey;US Census;Gallup Polls;fivethirtyeight.com's GitHub Repository,Streamlining and maintaining access to data stored on our secured server.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Most of the time,34000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,India,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,,Python,,"College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,1-2 years,Necessary,,Necessary,,,,,,,,,,,,Basic laptop (Macbook),40+,Master's degree,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Somewhat important,Very 
Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important +Male,Colombia,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,C/C++/C#,I collect my own data (e.g. web-scraping),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Data Scientist,Self-taught,50,0,50,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,Primary/elementary school,Academic,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,10MB,Decision Trees,"Mathematica,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,0,20,70,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,Empower the effective use of data analysis in the organization,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000000,COP,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Africa,38,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos,Other",,Very useful,,,,,Very useful,Very useful,,,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,"Coursera,edX",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,70,0,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Other,Genetic & Evolutionary Algorithms,Python,"Government website,Other","Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,10,30,30,30,0,"Computer Vision,Time Series","Decision Trees - 
Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Other","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Python,TensorFlow",Sometimes,Most of the time,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,Often,Sometimes,Most of the time,Often,,Sometimes,,,,,,Rarely,Rarely,,,,Sometimes,,,Often,,Often,,,Often,,Sometimes,,,,70,15,7,8,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,Often,,,,,,,,Often,,,,,,Often,,,10-25% of projects,More external than internal,Standalone Team,Planet OSM; Google Streetview; US Census,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,"Bitbucket,Git",Sometimes,110000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Colombia,38,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by college or university,Python,Text Mining,R,Google Search,"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,KDnuggets Blog,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,"Coursera,edX",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,50,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,Data Analyst,Work,50,10,30,10,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,10 to 19 employees,Stayed the same,1-2 years,Some other 
way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Somewhat useful,,,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",15,50,0,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Very Important,,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Very Important +Male,Canada,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher",Work,25,0,60,10,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R,SAS JMP",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Very useful,Not Useful,,,,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Not Useful,,Somewhat useful,,,Not Useful,"Jack's Import AI Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Other,University courses,60,10,10,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Technology,"1,000 to 4,999 employees",Decreased significantly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,1PB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Impala,Jupyter notebooks,Python,R",,,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Naive Bayes,Random Forests,Text Analytics",Sometimes,,,,,Often,Often,,,,,,,,,,,Often,,,,,Often,,,,,,Often,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues",Often,,,,,,,,Often,,,,Sometimes,,Often,,Often,,,,,,26-50% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Subversion",Always,"152,000",USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Spain,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Researcher,Statistician",University courses,40,0,20,40,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,18,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,3 to 5 years,"Predictive Modeler,Programmer,Statistician",Self-taught,70,20,10,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic 
Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,"Amazon Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Other",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog",3-5 years,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,,Doctoral degree,,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Other,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Hadoop/Hive/Pig,Other,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Data Miner,Researcher,Software Developer/Software Engineer",Self-taught,30,20,20,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - RNNs,Other (please specify; separate by semi-colon)",High school,Academic,500 to 999 employees,,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Text data,,100MB,"Neural Networks,Regression/Logistic Regression","C/C++,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R,Tableau,TensorFlow",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,Sometimes,Often,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,RNNs",,,,,,,Most 
of the time,,,,,,,Often,,,,,,,Often,,,,Sometimes,,,,,,,,,20,30,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,,Often,,26-50% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,"150,000",LKR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,19,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Web services,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,Not Useful,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",University courses,0,0,25,70,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Telecommunications,"5,000 to 9,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,Time Series Analysis",Sometimes,Sometimes,,,,,,,Often,,,,,Often,,Most of the time,,,,,,Most of the time,Often,Sometimes,,,,,,Often,,,,50,25,0,25,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Often,,,Most of the time,,,,Sometimes,,,,,,Most of the time,,Most of the time,,Sometimes,,Often,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,75000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Norway,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,Very useful,,,Not Useful,,1-2 years,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Physics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,31,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,KDnuggets Blog,1-2 years,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,University courses,20,30,0,40,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Female,United States,26,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Somewhat useful,,,,,"FastML Blog,O'Reilly Data Newsletter",3-5 years,,Nice to have,Necessary,,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Online courses,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",75,20,0,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",High school,Other,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,Laptop or Workstation and private datacenters,"Image data,Text data",Rarely,100MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Neural Networks,Simulation",,,Sometimes,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,,,,80,5,0,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",,,,,Often,Sometimes,,,Often,,,,Sometimes,,,,,,,,,,100% of projects,Do not know,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,41,Employed full-time,,,No,Yes,Programmer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Physics,1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Time Series,"Neural Networks - CNNs,Other (please specify; separate by semi-colon)",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,40,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Other",Self-taught,70,30,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision Trees,Naive Bayes,Random Forests,Time Series Analysis",,,Sometimes,,,,,Often,,,,,,,,,,Sometimes,,,,,Often,,,,,,,Most of the time,,,,10,10,20,10,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Privacy issues",Most of the time,,,,Sometimes,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,,10-25% of projects,More internal than external,Business Department,market data,finding the data sources and documentation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Sometimes,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Mexico,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,,,,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,"Data Elixir Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,20,20,30,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,10TB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Sometimes,,,,,,Often,,,,"A/B Testing,Data Visualization,Decision Trees",Most of the time,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,50,5,10,5,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,,Often,,,51-75% of projects,Entirely internal,Business Department,No public datasets used,"Size, cleanliness ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",Company Developed Platform,,Git,Sometimes,110000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Other,Neural Nets,SQL,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Statistician",Work,20,20,60,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Insurance,100 to 499 employees,Stayed the same,3-5 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100GB,Regression/Logistic Regression,"Mathematica,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,,,,,Most of the time,,,Often,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Standalone Team,,cleaning it,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1800000,PKR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,57,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Uplift Modeling,SQL,"GitHub,Government website,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Official documentation,Textbook,Trade book,YouTube Videos",,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,,Somewhat useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Insurance,"5,000 to 9,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,QlikView,R,SQL",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,Often,,,,Most of the time,Sometimes,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,30,50,15,3,2,0,"Enough to code it again from scratch, albeit it may run slowly","Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Sometimes,,,Often,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,Yes,Master's degree,Electrical Engineering,3 to 5 years,Other,Self-taught,35,20,0,30,15,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,20+,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Australia,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,Bayesian Methods,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Not Useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,1 to 2 years,Researcher,Self-taught,90,0,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,Sometimes,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,,,,Sometimes,,Most of the time,,,,,,,Sometimes,,,,15,30,0,50,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Sometimes,,,,,,Often,,,Often,,,Often,Sometimes,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,70000,,Has decreased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +,United States,35,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,DataRobot,Deep learning,R,Google Search,"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,30,30,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,Researcher,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Survival Analysis,SQL,,"Conferences,Stack Overflow Q&A",,,,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,25,25,25,0,25,0,,Logistic Regression,High school,Financial,500 to 999 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,,Regression/Logistic Regression,"Microsoft R Server (Formerly Revolution Analytics),R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,Sometimes,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,10,10,0,10,60,10,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,,,,,,,,Often,Often,,26-50% of projects,More internal than external,Business Department,Property; census,Dirty and difficult to access,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Subversion,Sometimes,170000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,,Self-taught,100,0,0,0,0,0,"Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Unix shell / awk,Neural Nets,Python,GitHub,"Blogs,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,Somewhat useful,,,,,,Very useful,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,20,30,40,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Retail,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Most of the 
time,,,,,,,,,,,,Rarely,,,,,,Often,,,,Often,,Sometimes,,,,,Sometimes,Rarely,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics",Often,,Often,,,,,Often,,,,,,Often,,Sometimes,,Often,Often,,Rarely,Most of the time,Often,,,Often,,Often,Sometimes,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Often,,,Often,,,Sometimes,,,Rarely,Most of the time,,Sometimes,,Often,,,,100% of projects,More internal than external,Central Insights Team,"Axiom; DnB",Access and data pipelines,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,104000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Chile,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,SQL,Text Mining,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts",,,Very useful,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Business Analyst,University courses,0,20,0,70,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1GB,Regression/Logistic Regression,"Microsoft SQL Server Data Mining,RapidMiner (free version),SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,Rarely,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,,,70,10,2,10,8,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,None,Visulization,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,25900000,CLP,Has decreased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,37,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Self-employed",Amazon Machine Learning,Deep learning,Python,"Google Search,Government website","Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",,Manufacturing,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Never,100MB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,Microsoft Excel Data Mining,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Natural Language Processing",,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Sometimes,"80,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Spain,57,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,C/C++,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R",Rarely,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,,Often,,,,Sometimes,,,,Often,,,Often,Often,,,Often,Sometimes,,,,Often,Often,Often,,,,20,20,10,10,40,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy 
useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,66,Retired,,,Yes,,Computer Scientist,Fine,Self-employed,Mathematica,Proprietary Algorithms,C/C++/C#,I collect my own data (e.g. web-scraping),"Blogs,College/University,Conferences,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,"Data Analyst,Programmer",Self-taught,75,15,0,10,0,0,"Reinforcement learning,Unsupervised Learning",Bayesian Techniques,I don't know/not sure,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,28,"Not employed, but looking for work",,,,,,,,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,PhD,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,Researcher,University courses,30,30,0,30,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +,,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,"Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Researcher,Statistician",Self-taught,50,5,30,15,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble 
Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data,Relational data",Rarely,1MB,"CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks",,,,,,,Most of the time,Often,Often,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,50,10,5,30,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Sometimes,Sometimes,,,,,,,Often,,Often,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,"Some college/university study, no bachelor's degree",Non-profit,"10,000 or more employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Other,Bayesian Methods,Scala,"Google Search,Government website,University/Non-profit research group websites","Arxiv,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,Very useful,,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,Regression/Logistic Regression,"Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,10,40,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,Often,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,26-50% of projects,Entirely internal,Other,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Never,1800000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Engineer,Predictive Modeler",Work,25,15,50,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Often,,,,,Often,Often,Often,,,,Often,,,,Often,,,,,Sometimes,,Often,,,Sometimes,,,,Sometimes,,,,15,15,15,25,30,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email,Other",Slack; Tableau,Git,Rarely,300000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,15,40,20,15,10,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,43,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,DataRobot,Survival Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",15,0,10,75,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests",,,,,,Most of the time,Sometimes,Often,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,80,10,0,2,8,0,Enough to tune the parameters properly,"Dirty data,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,,Often,Often,,,Often,Sometimes,,10-25% of projects,Entirely internal,Business Department,none,learning R whereas I was trained on Python.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,18000,EUR,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,,University courses,64,20,8,8,0,0,Time Series,Hidden Markov Models HMMs,"Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Decreased slightly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data",Never,1GB,"HMMs,Regression/Logistic Regression,Other","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,PCA and Dimensionality Reduction,Segmentation,Simulation,Time Series Analysis",,,,,,Rarely,Most of the time,,,,,,Often,,,,,,,,Rarely,,,,,Most of the time,Rarely,,,Most of the time,,,,55,5,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Sometimes,,,,,,,,Most of the time,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,none,complexity,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,29000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Other,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Trade book",,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,More than 10 years,Data Scientist,Self-taught,40,10,25,0,5,20,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Other,Other",,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,Often,,,,,,,Often,Most of the time,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series Analysis",Rarely,Often,,,,Most of the time,Sometimes,Often,Often,,,Often,,,,Often,,Often,Often,Often,Sometimes,,Often,Sometimes,,Often,,Sometimes,,Often,,,,45,20,5,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Most of the time,Most of the time,,Often,Sometimes,,,,,Most of the time,,,Often,,,Most of the time,Often,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,175000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,Python,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,30,20,10,20,10,10,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Text data",Rarely,1GB,Ensemble Methods,"C/C++,Python,R,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Ensemble Methods,Naive Bayes,Neural Networks",,,,,,,,Sometimes,Often,,,,,,,,,Often,,Often,,,,,,,,,,,,,,20,20,10,30,10,10,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,35000,RUB,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Mexico,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Bayesian Methods,R,I collect my own data (e.g. web-scraping),"Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,30,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Manufacturing,"5,000 to 9,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,Often,Sometimes,Most of the time,,Sometimes,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Prescriptive Modeling,Recommender Systems,Segmentation",Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,Often,,,,,,,,50,25,5,15,5,0,Enough to tune the parameters properly,"Dirty data,Inability to integrate findings into organization's 
decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,Often,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,None,Understand the data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Bitbucket,Git",Most of the time,500000,MXN,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Canada,64,Retired,,,Yes,,Scientist/Researcher,,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",60,0,0,0,30,10,Supervised Machine Learning (Tabular Data),Ensemble Methods,A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Julia,Bayesian Methods,R,"Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Newsletters,Official documentation,Stack Overflow Q&A,Textbook,Other",Very useful,Very useful,,,,,,Very useful,,Very useful,,,,Somewhat useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,More than 10 
years,Other,Self-taught,60,0,30,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","R,SAS Base,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Often,,,Sometimes,,Rarely,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Text Analytics,Time Series Analysis",Rarely,,Rarely,,,Sometimes,Most of the time,Sometimes,,,,,,,,,,,,,Often,Sometimes,,,,,Often,,Sometimes,Sometimes,,,,25,20,5,25,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,Rarely,,,Most of the time,,,,,,,,,,Often,,,,,,Sometimes,,100% of projects,More internal than external,Other,Australian Census; NZ Census,Getting access to the right people to source the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Sometimes,250000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Somewhat useful,Very useful,,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,20,0,70,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Never,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Modeler,Jupyter notebooks,Orange,Python,R,SQL,Tableau",,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,,,,,Sometimes,,Sometimes,,Often,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,Rarely,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,Sometimes,,,,70,5,0,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,Often,Most of the time,,Most of the time,,,,Most of the time,,Often,,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,,76-99% of projects,More internal than external,Central Insights Team,government; credit bureau,Getting acces to it and the metadata regarding it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Other,Never,44000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Operations Research Practitioner,Researcher",Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Mix of fields,I prefer not to answer,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Oracle Data Mining/ Oracle R Enterprise,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,,,,Often,Most of the time,Often,,,Most of the time,Most of the time,,,Most of the time,,,,50,20,5,15,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Inability to integrate findings into organization's decision-making process,Need to coordinate with IT",,,,,,,,Often,,,,,,,Often,,,,,,,,26-50% of projects,More internal than external,Business Department,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Company Developed Platform",,Git,,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Turkey,42,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Blogs,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Self-taught,20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Self-employed",TensorFlow,Deep learning,Python,Government website,"Arxiv,Blogs,Conferences,Friends network,Online courses,Podcasts",Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very 
useful,,Somewhat useful,,,,,,"FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,55,20,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Bayesian Techniques,A bachelor's degree,Internet-based,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,HMMs,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,Often,Most of the time,,,,Rarely,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems",Often,,Most of the time,,Sometimes,,,,,,,,,,,Often,,Often,,,Often,,,Often,,,,,,,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues",,,Rarely,,Often,,,,Sometimes,,,,,Most of the time,,Most of the time,Often,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Mercurial",Sometimes,95000,EUR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,63,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,"College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects",,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,70,20,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","R,SAS JMP,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,Often,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Often,Most of the time,Often,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,Sometimes,,Often,,,,80,4,3,8,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Most of the time,117000,AUD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,,,,,,,,,,,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Bayesian Techniques,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Male,Russia,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep 
learning,Python,Google Search,"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Neural Nets,SQL,"GitHub,Google Search","Blogs,Friends network,Kaggle,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",40,40,0,0,20,0,Outlier detection (e.g. Fraud detection),,A master's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Never,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Random Forests,Segmentation",,,,,,Sometimes,Often,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,,,Often,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,,Often,,,,,,,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Czech Republic,30,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by college or university,Google Cloud Compute,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Personal Projects",Somewhat useful,Very useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Predictive Modeler,Programmer",University courses,30,20,20,20,10,0,"Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Rarely,100GB,"Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"Cross-Validation,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,,,Most of the time,,,,,,,Rarely,,,Rarely,,Most of the time,Often,Most of the time,Sometimes,,,,Often,,,,,,,,,20,60,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,very low quality,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,360000,CZK,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Canada,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Conferences,Official documentation,Online courses,Podcasts,Stack Overflow Q&A",,,,,Somewhat useful,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher",Self-taught,40,5,25,25,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Technology,"1,000 to 4,999 employees",Stayed the same,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,100GB,Other,"Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,40,30,0,10,20,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,,,,Sometimes,Sometimes,,,,,Often,,,,,,10-25% of projects,More internal than external,IT Department,Private obfuscated customers data,Noisy data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,65000,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Cluster Analysis,Stata,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,Not Useful,Not Useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Statistician",University courses,25,10,25,25,15,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,Minitab,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,Sometimes,Sometimes,,,,Often,,,Most of the time,,,,30,20,5,15,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,,,,,,,,Often,Sometimes,,Often,,51-75% of projects,Entirely internal,Other,Census Bureau;U.S. Customs;Cass Transportation Index,Data scrubbing for ETL ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,75000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Brazil,69,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by non-profit or NGO,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,0,30,15,5,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data,Other",Never,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Time Series Analysis,Other",,,,,,Most of the time,Most of the 
time,,Often,,,Often,,,,Often,,,Often,,,,Often,,,,,,,Often,Often,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations in the state of the art in machine learning",Sometimes,,,,,,,,,,,Often,,,,,,,,,,,76-99% of projects,Entirely internal,Other,,Complex structure,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Never,"135,000",USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Philippines,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Self-employed",Python,Deep learning,Python,"Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,30,50,0,5,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,"Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Rarely,1MB,"Bayesian Techniques,Regression/Logistic Regression","Java,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the 
time,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization",,,Sometimes,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Sometimes,Sometimes,,,,Rarely,,,,,,,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,OpenStreet Map (OSM); Census;,Data quality;,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,715000,PHP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Proprietary Algorithms,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician,Other",Self-taught,50,25,25,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,SQL,Unix shell / awk",,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Association Rules,Data Visualization,Decision Trees,Lift Analysis,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Often,,,,,Often,Often,,,,,,,Often,,,,,,Often,,,,,Often,Often,,Rarely,Most of the time,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data 
science talent in the organization,Lack of significant domain expert input",Often,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,"External Credit Bureau e.g. CIBIL, CRIF HIGH MARK, EQUIFAX, Macro-econonic Indicators data from government web, Social Media data from FB etc","Poor data quality, inconsistency","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,"Data Stories Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Researcher",Work,NA,50,40,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Never,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Python,R,RapidMiner (free version),Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Most of the time,,,,,,Sometimes,Most of the time,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,,,Most of the time,,Often,,,Often,,Most of the time,,,,,,,Often,,,,30,30,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources",Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,76-99% of projects,More external than internal,Standalone Team,"UCI, HCUP, MIMIC",,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Most of the time,,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Australia,31,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,22,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Other,University courses,50,50,0,0,0,0,Outlier detection (e.g. Fraud detection),,A master's degree,Technology,500 to 999 employees,Decreased slightly,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,,"Cloudera,Hadoop/Hive/Pig,IBM Cognos,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),Tableau",,,,,Sometimes,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems,Text Analytics",,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,,,,,Sometimes,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,46,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,PhD,Yes,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,0,50,0,0,"Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,Kaggle competitions,20,30,0,10,40,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud 
service (AWS, Azure, GCE ...)",Relational data,Rarely,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Python,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,"A/B Testing,Time Series Analysis,Other",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,0,0,0,10,90,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,SAS,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,0,0,0,100,0,0,,Logistic Regression,High school,Manufacturing,"5,000 to 9,999 employees",Stayed the same,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,,"Angoss,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,SQL",,,Rarely,,,,,,,,,,,,,,,,,,,,Rarely,,Most of 
the time,,,,,,Rarely,,Rarely,,,Rarely,,Rarely,Rarely,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,30,25,5,40,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Sometimes,130000,,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,Very useful,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Other,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,<1MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,Python,SQL",,,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the 
time,,,90,9,1,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Sometimes,,,,,,,,,,,Often,,,Often,,Sometimes,,Less than 10% of projects,Do not know,IT Department,None,"Dirty, incomplete.",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Bitbucket,Git",Rarely,83000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Turkey,31,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Miner,Operations Research Practitioner,Predictive Modeler",Work,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Proprietary Algorithms,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Newsletters,Online courses,Textbook",,Somewhat useful,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,More than 10 years,"Data Analyst,Data Scientist,Other",University courses,40,5,50,5,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Internet-based,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,Laptop or Workstation and local IT supported servers,Other,Sometimes,10TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,Most of the time,,,,Most of the time,,Rarely,,,Sometimes,,,Often,,,,,,,,,,Rarely,,,,Often,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Natural Language Processing,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics",Often,,Rarely,,Sometimes,Often,Often,Sometimes,,,,,,,Rarely,Often,,,Sometimes,,,,Rarely,Sometimes,,Often,Rarely,,Sometimes,,,,,10,5,5,5,20,55,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Sometimes,,Often,,Most of the time,,,Sometimes,Often,,,,,,,,Sometimes,,,,Sometimes,,76-99% of projects,More internal than external,Central Insights Team,,extracting useful data from the data lake (/ data swamp),"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,,8,,,,,,,,,,,,,,,,,, +Female,Italy,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,,Not Useful,Somewhat useful,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",University courses,10,5,20,55,10,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting",High school,Technology,20 to 99 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,Spark / MLlib,Unix shell / awk",,Most of the time,,,,,,Sometimes,Sometimes,,,,,,,,Often,,Rarely,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Time Series Analysis",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,20,50,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Often,,,,,,,Sometimes,,,,,,,,,Sometimes,,10-25% of projects,Entirely internal,IT Department,,Not all the informationneeded is available and sometimes even logs aren't enough,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Bitbucket,Rarely,28000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,University courses,5,0,95,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Python,R,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Often,,,,,Often,,Sometimes,,,Sometimes,,Rarely,,Sometimes,,,,60,30,0,5,5,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,Often,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,University courses,35,20,0,15,0,30,Other (please specify; separate by semi-colon),"Decision Trees 
- Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,50,10,20,0,20,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Other,500 to 999 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Rarely,10TB,Random Forests,"Java,Python,Spark / MLlib",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,30,5,20,5,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Spain,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Other",University courses,0,30,30,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,Time Series Analysis,Other,Other",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,Sometimes,Sometimes,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Sometimes,,,10-25% of projects,Approximately half internal and half external,Standalone Team,AEMET,That the data has been correctly storage,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",Google Drive,Bitbucket,Rarely,,EUR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Other,Bayesian Methods,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,Conferences,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,More than 10 years,"Operations Research Practitioner,Researcher,Statistician",Self-taught,60,10,10,0,0,20,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks",A bachelor's degree,CRM/Marketing,100 to 499 employees,Decreased significantly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,Stan,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Sometimes,,,,,Rarely,,Most of the time,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Most of the time,Most of the time,,,Sometimes,,Most of the time,,Most of the time,,Sometimes,,Sometimes,Most of the time,,Most of the time,,,Most of the time,Sometimes,Sometimes,Often,Often,,,,30,15,5,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,,Often,Most of the time,,,,Often,,,,,,Most of the time,,,100% of projects,More internal than external,Central Insights Team,Census; Client data;sales data,Getting them,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,115000,AUD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",80,10,0,0,10,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Very important,Other,Other,,,,,"IBM Watson / Waton Analytics,Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,Often,,Sometimes,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Government website,Friends network,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Self-taught,25,25,0,0,0,50,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,500 to 999 
employees,Increased significantly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,10GB,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,20,40,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues",,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Bitbucket,,110000,USD,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Friends network,Kaggle,Official documentation,Online courses",Very useful,Very useful,,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",15,20,15,30,10,10,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)",Logistic Regression,I don't know/not sure,Technology,10 to 19 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Rarely,10MB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Python,R,Spark / MLlib,SQL,Stan",,Often,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Most of the time,Rarely,,,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Recommender Systems,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,Often,,,Often,,,,,Sometimes,,,,,Often,Sometimes,,,,30,35,15,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Often,,,,Often,,,,,,,,,,Sometimes,,Sometimes,Often,,,,,76-99% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"3,600,000",JPY,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Philippines,55,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,R,Social Network Analysis,R,Google Search,"Blogs,College/University,Online courses",,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Nice to have,,,Necessary,,,,,,,,Other,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,A social science,I don't write code to analyze data,Researcher,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,High school,Financial,20 to 99 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,,,"Java,KNIME (commercial version),KNIME (free version),Python,R,SQL",,,,,,,,,,,,,,,Rarely,,,Most of the time,Most of the time,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the 
time,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs",Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Sometimes,1TB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,,,,"A/B Testing,Association Rules,Data Visualization,kNN and Other Clustering,Neural Networks,Recommender Systems,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,36,37,5,20,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Sometimes,120000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,60,0,0,0,10,30,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,10MB,"CNNs,Neural Networks,Other","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Mathematica,Microsoft Excel Data Mining,Python,R,TensorFlow",Rarely,Often,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,RNNs",Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,Often,,Often,,Often,Often,,Sometimes,,Often,,,,,,,,,20,20,30,10,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,,,51-75% of projects,Approximately half internal and half external,IT Department,Crime Datasets; CDC; American Community Survey,Time and Understanding,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,180000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Mexico,34,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,NoSQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,Computer Vision,Decision Trees - Random Forests,High school,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Image data,Always,10MB,"Bayesian Techniques,Random Forests","Java,NoSQL,Python,Other",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,"Bayesian Techniques,Data Visualization,Ensemble Methods,Random Forests,Segmentation",,,Often,,,,Most of the time,,Often,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,,,,,Often,,,Often,Most of the time,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,kaggle,clean and filter the images,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,300000,MXN,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,University/Non-profit research group websites,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Canada,38,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Textbook,Other",Very useful,Very useful,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,Very useful,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,SQL,TensorFlow",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"CNNs,Data Visualization,Decision Trees,GANs,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,Sometimes,,,Often,Sometimes,,,Rarely,,,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,,Sometimes,,,,20,20,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Organization is small and cannot afford a data science 
team,Other",Rarely,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,Often,100% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Commercial Data Platform,,Git,Rarely,90000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,Jupyter notebooks,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Non-Kaggle online communities",Very useful,Very useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Other",Self-taught,0,0,20,10,70,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",A doctoral degree,Mix of fields,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Rarely,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Machine Learning,C/C++,Microsoft Excel Data Mining,Python",Sometimes,,,Rarely,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Segmentation",Rarely,,,Often,,Sometimes,Often,,Often,,,,,,,Sometimes,,,,Most of the time,,,Sometimes,,,Often,,,,,,,,55,40,0,5,NA,0,Enough to tune the parameters properly,"Company politics / Lack of 
management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Often,Often,Often,,,Often,,Often,Often,,Often,,,Often,,,Often,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,150000,,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,10,10,35,35,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Germany,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,C/C++/C#,"GitHub,University/Non-profit research group websites","Arxiv,Stack Overflow Q&A",Very useful,,,,,,,,,,,,,Very useful,,,,,,5-10 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Physics,6 to 10 years,I haven't started working yet,University courses,50,0,10,40,0,0,Time Series,"Bayesian Techniques,Evolutionary Approaches,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,Mexico,68,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Sometimes,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,50,0,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Explaining data science to others,Need to coordinate with IT,Unavailability of/difficult access to data",,Often,,,,Sometimes,,,,,,,,,Sometimes,,,,,,Most of the time,,10-25% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,Other,Python,I collect my own data (e.g. 
web-scraping),"Official documentation,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,Very useful,,Very useful,,,,,Somewhat useful,Very useful,"Linear Digressions Podcast,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,93,5,2,0,0,0,"Adversarial Learning,Reinforcement learning,Time Series","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,500 to 999 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Other,Rarely,1GB,"CNNs,GANs,HMMs,Neural Networks,RNNs","C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,Python,SQL,TensorFlow",,,,Often,,,,,,,,,Rarely,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,GANs,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,Sometimes,,,,,,,Often,,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,Often,,,,70,20,3,5,2,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,195000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,Retired,,,No,Yes,Other,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,25,0,50,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Mix of fields,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,,"Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,Sometimes,,,,Rarely,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,United States,50,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Programmer,University courses,0,0,0,100,0,0,,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Increased significantly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts",Somewhat useful,,,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX","Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,20,0,0,40,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,Tableau,Neural Nets,R,"GitHub,Google Search,Government website","College/University,Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,,University courses,0,25,0,75,0,0,"Supervised 
Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",Logistic Regression,High school,Government,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,20,0,0,10,70,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,Often,,,Most of the time,,,Often,Often,,,,Most of the time,,,Often,,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,112000,USD,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,GitHub,"Blogs,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Very useful,,Very useful,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,20,0,0,0,"Time Series,Unsupervised Learning",,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,NoSQL,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Self-taught,10,20,25,45,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Julia,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Recommender Systems",Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,,Most of the time,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go 
in with the available data",,,,,Most of the time,Often,,,Sometimes,,,,,Often,,,Most of the time,,Often,Often,,,100% of projects,More internal than external,Standalone Team,"data.gov, NDNC",Finding meaningful data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,,100000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,75,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Textbook,Trade book,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Somewhat useful,Somewhat useful,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Biology,1 to 2 years,Statistician,Self-taught,80,0,0,0,20,0,Unsupervised Learning,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,,,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,Somewhat useful,,Very useful,Very useful,"Jack's Import AI Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other",University courses,20,20,10,50,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",Sometimes,,,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,Sometimes,,Most of the time,,,Sometimes,,,,,,,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,Often,Most of the time,,Often,,Most of the time,Sometimes,,Often,,,Often,Often,,,,Sometimes,Most of the time,,,26-50% of projects,More external than internal,Business Department,,Poor documentation and bad practices of encoding null values,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Rarely,120000,USD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Other,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,More than 10 years,"Data Analyst,Researcher,Other",Work,70,5,5,0,0,20,"Outlier detection (e.g. Fraud detection),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",Manufacturing,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,<1MB,Other,"Microsoft Excel Data Mining,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Other",Most of the time,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,,,,,Most of the time,Less than 10% of projects,Entirely internal,Other,None,Can't get access to the data,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Never,60000,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Tableau,Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,,,"Data Stories Podcast,FlowingData Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other",Self-taught,50,10,30,10,0,0,"Natural Language Processing,Recommendation Engines,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",,,,"Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,Java,NoSQL,Perl,Python,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,Sometimes,Rarely,,,,,Often,,,,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,,,,Rarely,Sometimes,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Text Analytics",Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,50,0,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Always,"134,000",,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist",University courses,50,0,0,50,0,0,Survival Analysis,"Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks",High school,Academic,Fewer than 10 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,RNNs,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Bayesian Methods,Python,GitHub,"College/University,Online courses,Textbook,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,Very useful,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),Less 
than a year,,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Female,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Other,Python,"Google Search,Government website","Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,"Data Elixir Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,25,40,10,25,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10GB,Other,"Amazon Web services,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Most of the 
time,,,,Rarely,,Sometimes,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,10,0,0,0,0,90,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data",,Often,,,Often,,,,,,,,,,,,,,,,,,None,More internal than external,Standalone Team,youtube data,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,125000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Brazil,30,"Independent contractor, freelancer, or self-employed",,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SAS Base,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,,Often,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation",,,,,,,Often,Sometimes,,,,,,,,Most of the time,,,,,,Sometimes,,,,Often,,,,,,,,30,40,0,30,0,0,Enough to run the code / standard library,"Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,Often,,,,,,Often,Sometimes,,10-25% of projects,More internal than external,Standalone Team,NA,NA,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,1700000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,Jack's Import AI Newsletter,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,,University courses,30,30,0,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,,,Very useful,,Somewhat useful,,Very useful,,Very useful,"Data Stories Podcast,KDnuggets Blog",< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,70,20,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Germany,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Friends network,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,,Very useful,Very useful,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,25,0,50,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Other,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Physics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Software Developer/Software Engineer,Statistician",Self-taught,60,0,40,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - 
RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,50,20,20,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,Often,,,Sometimes,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted 
Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,Sometimes,Sometimes,Often,Often,Often,,,Often,,Sometimes,,Most of the time,,,Often,,Often,,,Often,,,,Often,Most of the time,Most of the time,,,,50,20,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Often,Often,,Sometimes,,,Often,Often,,Sometimes,,Sometimes,,Often,,,,Sometimes,,Often,,26-50% of projects,More internal than external,Central Insights Team,"IXI, credit bureau, Duns and Bradstreet, Market data (security prices)",Volume,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Sometimes,200000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Analyst,University courses,30,1,10,59,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Telecommunications,"5,000 to 9,999 employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL",,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,,,51-75% of projects,More internal than external,Business Department,weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,34000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,31,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Other",University courses,20,30,0,50,0,0,"Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Don't know,,,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Random Forests",,Sometimes,,,,,Most of the time,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,40,20,10,20,10,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,SEC EDGAR,Collect the large amount of data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,AUD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Statistician,Perfectly,Self-employed,R,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,,,,"DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Researcher,Software Developer/Software Engineer,Statistician",University courses,40,50,0,10,0,0,Computer Vision,Markov Logic Networks,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +"Non-binary, genderqueer, or gender 
non-conforming",United States,51,Employed full-time,,,Yes,,Other,Poorly,Employed by government,Other,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,,Kaggle competitions,20,0,0,0,80,0,,,"Some college/university study, no bachelor's degree",Government,500 to 999 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,10,10,0,40,40,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,Most of the time,51-75% of projects,More external than internal,Other,various GIS datasets,knowing where to find it,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,Other,Never,42000,USD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Canada,34,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,R,I collect my own data (e.g. 
web-scraping),"Conferences,Personal Projects",,,,,Very useful,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,90,0,0,10,0,0,Survival Analysis,"Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Don't know,10MB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,Rarely,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Often,Often,Often,Often,,,Often,,Sometimes,,Sometimes,,,,,Often,,Sometimes,,,,Most of the time,,,,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,,,,Sometimes,,,,,Sometimes,,Most of the time,,100% of projects,More internal than external,Other,none,dirty and varied sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Other,Never,185,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Bayesian Methods,Julia,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Predictive Modeler,Researcher",University courses,10,15,10,60,5,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Rarely,,Sometimes,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis,Other",,,,Rarely,,Most of the time,Most of the time,Sometimes,Often,Rarely,,,,Sometimes,,Sometimes,,Sometimes,,Most of the time,Often,,Sometimes,,,,,Sometimes,,Most of the time,Most of the time,,,10,40,10,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,Sometimes,Most of the time,Often,,Often,,,Sometimes,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,United States,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,50,50,0,0,0,0,"Time Series,Unsupervised Learning",Bayesian Techniques,,Mix of fields,"5,000 to 9,999 employees",Increased slightly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Online courses,Textbook,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,,,,Very useful,,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Work,30,30,30,10,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data,Other",Sometimes,1TB,"Gradient Boosted Machines,Random 
Forests","Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Random Forests",Rarely,,,,,Most of the time,Often,,Most of the time,,,Often,,,,,,,Often,,,,Most of the time,,,,,,,,,,,30,30,30,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,,,Sometimes,Often,,,,,,,Often,Most of the time,,,,Most of the time,,Less than 10% of projects,More internal than external,Other,,Lack of good labels.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Cloud,Git,Rarely,155000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Other,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,30,10,0,50,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Female,Canada,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Online courses",,,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Taiwan,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important +Male,Brazil,40,Employed full-time,,,No,Yes,Data Analyst,Perfectly,Employed by company that makes advanced analytic software,IBM Cognos,Regression,SQL,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",2,2,2,94,0,0,Survival Analysis,Logistic Regression,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,R,Bayesian Methods,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Official documentation,Online courses",,Somewhat useful,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,"FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Researcher",Work,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Google Cloud Compute,IBM SPSS Statistics,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,QlikView,R,SQL,Tableau",,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,Rarely,Sometimes,,,,,,,,,Often,,,Often,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,Sometimes,,Most of the time,Often,,,,,,Often,,Often,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,20,20,10,20,10,20,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data,Other",,,,,Often,,,,,,,,Sometimes,,,,,Sometimes,,,Often,Often,76-99% of projects,Approximately half internal and half external,Central Insights Team,"ABS Census datapacks, Twitter firehose/ API, Google Analytics, Facebook","Getting everything into the one environment, cleaning dirty data, restructuring system data into a form appropriate for analytics or data viz, optimising queries to be efficient on large data sets e,g 100,000,000+ lines. 
","Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Other","Microsoft Azure platform, Google Cloud platform, Pentaho","Generic cloud file sharing software (Dropbox/Box/etc.),Git,Other",Sometimes,180000,AUD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Somewhat useful,KDnuggets Blog,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +A different identity,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,R,Deep learning,Python,"GitHub,Google Search","Blogs,Podcasts,YouTube Videos",,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Biology,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,Decision Trees - Gradient Boosted Machines,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,22,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Matlab,GitHub,"College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,,< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A health science,1 
to 2 years,Researcher,University courses,50,0,0,50,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,70,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Other,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,,Self-taught,90,10,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,C/C++,Python",,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Simulation",,,Often,,,,,,,,,Often,,,,Often,,,,Often,,,Often,,,,Most of the time,,,,,,,30,30,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More external than internal,Standalone Team,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),"Commercial Data Platform,Company Developed Platform,Email",,"Bitbucket,Git,Other",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,R,Regression,R,Other,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Master's degree,A social science,I don't write code to analyze data,Other,Self-taught,50,0,0,50,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Other,47,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Computer Vision,Natural Language Processing","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,52,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,RapidMiner (free version),Genetic & Evolutionary Algorithms,R,Other,"Blogs,Conferences,Friends network,Kaggle,Official documentation,Stack Overflow Q&A",,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,"R Bloggers Blog Aggregator,The Data Skeptic 
Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,,Self-taught,80,0,10,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Other,Never,100GB,"Bayesian Techniques,Random Forests","MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Often,,Sometimes,,,,,,,,,Rarely,,,,,,Often,,,,"Bayesian Techniques,HMMs,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,Other,"Genbank Uniprot",Clean data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"180,000",AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,68,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Management information systems,More than 10 years,Other,University courses,0,40,0,60,0,0,"Natural Language Processing,Time Series",Logistic Regression,A doctoral degree,Academic,100 to 499 employees,Increased slightly,Less than one year,A general-purpose job board,Important,Other,Traditional Workstation,"Text data,Relational data",Rarely,,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling",,,,,,,Often,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,25,25,0,25,25,0,Enough to explain the algorithm to someone non-technical,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,28,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Computer Science,Less than a year,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,New Zealand,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,"Data Stories Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,Machine Learning Engineer,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Reinforcement learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research 
Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,50,0,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10MB,"CNNs,Decision Trees,Neural Networks,Random Forests,RNNs","Amazon Machine Learning,IBM Cognos,Microsoft Excel Data Mining,RapidMiner (free version),Tableau,TIBCO Spotfire",Rarely,,,,,,,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,GANs,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Time Series Analysis,R,Google Search,"Stack Overflow Q&A,Other",,,,,,,,,,,,,,Very useful,,,,,"DataTau News Aggregator,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Other,University courses,20,0,0,80,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - 
Random Forests,Logistic Regression",Primary/elementary school,Other,"5,000 to 9,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Never,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Most of the time,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Random Forests,Segmentation,Time Series Analysis",Sometimes,,,,Often,,Most of the time,Sometimes,Sometimes,,,,,,Often,,,,,,,,Sometimes,,,Often,,,,Most of the time,,,,60,10,0,20,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Privacy issues",,,,,Most of the time,,,,Sometimes,,,,,,,,Most of the time,,,,,,51-75% of projects,More internal than external,Other,DMP data,Dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,80000,CAD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,Bayesian Methods,,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,20,5,35,35,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests",Sometimes,,,,,,,Often,,,,Often,,,,Often,,,,,,,Rarely,,,,,,,,,,,45,5,35,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data 
science talent in the organization",Often,Often,Most of the time,Sometimes,Most of the time,Often,,,Rarely,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,Clearbit,Data not instrumented properly,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,120000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Hong Kong,46,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Textbook,Trade book,Other,Other",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,Not Useful,,Very useful,,,Somewhat useful,Very useful,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,3 to 5 years,"Business Analyst,Operations Research Practitioner,Other",Self-taught,60,10,0,0,0,30,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,,NA,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,60,10,0,0,30,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Relational data,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,,,,,,,,Often,,Most of the time,,,,,,,,,,,20,50,0,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,49,Employed 
full-time,,,Yes,,Programmer,Poorly,Self-employed,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,50,0,0,50,0,0,Other (please specify; separate by semi-colon),Bayesian Techniques,High school,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,Bayesian Techniques,"Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,Often,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"Bayesian Techniques,Cross-Validation,SVMs,Text Analytics",,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,60,10,30,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,Do not know,Standalone Team,,How to apply collected web data. ,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Bitbucket,,4000000,JPY,Has decreased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,,University courses,90,0,0,10,0,0,Survival Analysis,"Bayesian Techniques,Logistic Regression",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed part-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,10,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,38,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,Very useful,Very useful,,Somewhat useful,,Very useful,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,edX,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer",Self-taught,90,5,2,3,0,0,,,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information 
technology, networking, or system administration",6 to 10 years,Data Scientist,Work,50,10,30,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,10,60,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Chile,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Researcher",University 
courses,20,20,40,20,0,0,Recommendation Engines,Other (please specify; separate by semi-colon),A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,60,0,0,15,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Retail,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Orange,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,Often,,Most of the time,,,,,,,Sometimes,,,Often,,,,Most of the time,,,,40,20,5,15,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Most of the time,Sometimes,,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,e-commerce sites; government data,Dirty data... Keep carrying problem from legacy systems,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Subversion,Sometimes,800000,VEF,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Malaysia,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Social Network Analysis,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,Self-taught,80,0,5,0,0,15,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",,,"Decision Trees,Neural Networks,Regression/Logistic Regression","KNIME (free version),R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,10,70,5,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a 
clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,,,,,Often,,,,Often,,Often,,Often,,Often,Often,Often,,26-50% of projects,Do not know,Other,,,Other,I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Python,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,0,10,40,40,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Insurance,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Jupyter notebooks,Microsoft Azure Machine Learning,Python,QlikView,R,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Sometimes,Often,Most of the time,,,,,,Rarely,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,Rarely,,,Often,Often,Sometimes,Sometimes,,,,,,,Most of the time,,Rarely,,,Often,Sometimes,Often,,,Most of the time,Sometimes,,Sometimes,Most of the time,,,,45,15,5,15,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,,,,Often,,,,Sometimes,,Often,,,,,,Sometimes,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Rarely,85000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Colombia,22,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",50,45,0,5,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data",Don't know,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Often,,Most of the time,Most of the time,,,,,,,,,Often,,,,Most of the time,Often,,,,,,,Sometimes,,,,,,50,30,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented 
(e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,14000000,COP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Physics,,Computer Scientist,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,College/University,Conferences,Kaggle,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,Very useful,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Data Analyst,DBA/Database Engineer,Other",University courses,15,5,5,75,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text 
data,Relational data",Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,Often,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Often,,,,,Sometimes,Often,,,Sometimes,,,Rarely,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Often,Sometimes,,,,,,Sometimes,,Often,,,Often,,,,Rarely,,,,,,Often,Often,,,,35,15,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,Often,,Often,Most of the time,,,,Sometimes,,,,,,Often,,,Most of the time,,,Often,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,95000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,GitHub,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,Necessary,,Necessary,,Necessary,,,,,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,,50,50,0,0,0,0,Natural Language Processing,Neural Networks - RNNs,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Australia,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,"Not employed, but looking for work",,,,,,,,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,Somewhat useful,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,"DataTau News Aggregator,Jack's Import AI Newsletter,No Free Hunch Blog",3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,Sort of (Explain more),Master's degree,Other,More than 10 years,Other,Self-taught,30,40,30,0,0,0,Recommendation Engines,"Decision Trees - Random Forests,Hidden Markov Models HMMs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United States,51,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Official documentation,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,More than 10 years,Computer Scientist,University courses,20,0,70,10,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,GANs,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Orange,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,10,0,0,30,50,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More external than internal,Other,prefer not to say,"acquisition of data, privacy","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Taiwan,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Other,45,0,0,0,5,50,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,45,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Neural Nets,Python,"GitHub,University/Non-profit research group websites","College/University,Company internal community,Online courses,Personal Projects",,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,Less than a year,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Academic,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Often,Often,Sometimes,Often,,,,,Sometimes,,Often,,,,,Often,,Often,,,,Often,,,,,,,10,60,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,Often,,,Often,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Other,no,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,30000,RUB,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Brazil,34,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Conferences,Kaggle",,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,University courses,20,30,20,20,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Academic,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs",,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,50,20,0,10,20,0,,"Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,Most of the time,,Less than 10% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Republic of China,27,"Not employed, but looking for work",,,,,,,,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,,,,,,,,,,,,,,Udacity,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Yes,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,50,0,0,0,10,"Computer Vision,Reinforcement learning",Neural Networks - CNNs,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,"Government website,I collect my own data (e.g. web-scraping)","Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,Not Useful,,,Somewhat useful,,,,Very useful,Very useful,Not Useful,Very useful,,Somewhat useful,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Other,"10,000 or more employees",Stayed the same,Less than one year,Some other way,"N/A, I did not receive any formal education",Other,Laptop or Workstation and private datacenters,Relational data,Always,1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic 
Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,Sometimes,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,Most of the time,,,Sometimes,,,,Often,,,,,,Sometimes,Sometimes,,,,,,,Most of the time,,,,30,10,40,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Other",Often,,,,,Most of the time,,,Often,,,,Most of the time,,,,,,,,,Most of the time,51-75% of projects,Entirely internal,Standalone Team,We don't use external data,Our large company still has a problem setting up data pipelines. many organizations will not allow direct access to data. Few APIs exist. Data stewards do poor work maintaining accurate data. Most data is clean but not easy to get and not always accurate. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint,Other",Self built web applications,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Always,150000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed part-time,,,Yes,,Researcher,,Employed by college or university,Jupyter notebooks,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",University courses,40,0,0,40,0,20,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Other",Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,Spark / MLlib,Unix shell / awk,Other",,Rarely,,Rarely,,,,,Most of the time,,,,,,Most of the time,,,,,,Sometimes,,,,Rarely,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Most of the time,Most of the time,,,"Cross-Validation,Decision Trees,Ensemble Methods,Random Forests",,,,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,25,25,25,25,NA,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,Rarely,,,,Most of the time,,,Most of the time,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Other,,,Other,I don't typically share data,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,48,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by government,Microsoft Excel Data Mining,Factor Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher,Statistician",University courses,20,0,10,70,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Other",Sometimes,10MB,"Bayesian Techniques,Decision Trees","R,SAS Enterprise Miner,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,,,5,NA,10,80,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization",Most of the time,Often,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,More internal than external,Central Insights Team,Nil,Empty fields due to privacy ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,55000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information systems,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,Recommendation Engines,Bayesian Techniques,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Anomaly Detection,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Programmer",University courses,0,5,10,80,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,16-20,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,United States,42,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Monte Carlo Methods,Python,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Data Stories Podcast,Other (Separate different answers with semicolon)",< 1 year,,,Necessary,,,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Other,,2 - 10 hours,Kaggle Competitions,,Professional degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),High 
school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Male,India,54,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Business Analyst,Self-taught,30,20,30,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Academic,100 to 499 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,Relational data,Never,10MB,"Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,SAS Base",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Decision Trees,Lift Analysis,Logistic Regression,Prescriptive Modeling",,,,,,,,Most of the time,,,,,,,Often,Most of the time,,,,,,Sometimes,,,,,,,,,,,,10,30,0,0,60,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,Often,,,Often,,,,,,,,,,,,Most of the time,,10-25% of projects,,Business Department,none,none,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Never,1400000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"GitHub,Google Search","Arxiv,Blogs,College/University,Personal Projects,Textbook",Very useful,Very useful,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Talking Machines Podcast",3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,0,0,60,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,30,Employed full-time,,,No,Yes,Data Analyst,Fine,"Employed by college or university,Employed by government",R,Random Forests,R,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Online courses,Textbook,Tutoring/mentoring",,,,,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,,Very useful,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",5-10 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Operations Research Practitioner,Statistician",University courses,50,10,0,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Work,90,0,0,10,0,0,,,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,53,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college 
or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",College/University,,,Somewhat useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,Researcher,Self-taught,80,20,0,0,0,0,Natural Language Processing,Neural Networks - RNNs,A bachelor's degree,Academic,500 to 999 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,1TB,Neural Networks,"Hadoop/Hive/Pig,IBM SPSS Statistics,MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,Segmentation,SVMs,Text Analytics",,,,,,Often,,Often,,Often,,,Sometimes,Often,,,,Often,Often,Often,,,,,Often,,,Sometimes,Often,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,,,Python,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",,,,,,,Very useful,,Very useful,,,Very useful,,,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist",Work,60,0,20,10,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Insurance,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",,,,,,Sometimes,Most of the 
time,,,,,Sometimes,,,,Sometimes,,,,,,Often,Rarely,,,,Sometimes,,,Sometimes,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,Often,,,,,Often,Most of the time,,,,,,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,too many data issues to be aware of and clean,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,160000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),,A master's degree,Other,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, 
etc.)",20,50,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Random Forests,R,"Google Search,Government website,University/Non-profit research group websites","College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,,,,Somewhat useful,Not Useful,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,"Researcher,Other",University courses,45,15,20,20,0,0,,,A master's degree,Other,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,100,0,0,0,0,0,,"Dirty data,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,,Often,,,,,,,,None,Entirely internal,Standalone Team,,Dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,50000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,15,10,60,15,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,I don't know,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Non-Kaggle online communities,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,Very useful,,Very useful,,,,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer",University courses,10,40,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Rarely,1MB,"Decision Trees,HMMs,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Perl,Python,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,Sometimes,,,,Most of the time,,,,,,,,,Often,Often,,Often,,,,,,,,,,,,,,,Sometimes,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,Sometimes,,,,Often,,Often,Often,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,Often,,Most of the time,,,,20,70,0,10,0,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,26-50% of projects,Entirely internal,,UC Irvine Machine Learning Repository,none,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Git,Sometimes,100,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",,,,,,,,,Very useful,Very useful,Somewhat useful,,,,,,,Somewhat useful,Data Machina Newsletter,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Professional degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,23,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,University/Non-profit research group websites,"Blogs,College/University,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Stories Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software 
Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Decision Trees - Random Forests,A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Scientist,Other",University courses,30,20,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,KNIME (free version),Python,R,SQL,Tableau,Other",,Sometimes,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,Most of the time,,,"Association Rules,Collaborative Filtering,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,Sometimes,,,Sometimes,,,,,,Sometimes,,Sometimes,,,Sometimes,,,,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Amazon Machine Learning,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,Somewhat useful,,Very useful,Very useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,25,0,25,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Stayed the same,Less than one year,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,C/C++,MATLAB/Octave,Python",Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Rarely,Often,Rarely,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,Often,Sometimes,Often,Most of the time,,Most of the time,,,Most of the time,,Often,Often,Often,,,,50,10,20,15,5,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,,,,,,,,Often,Often,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,United States,49,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,1 to 2 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Recommendation Engines,Time Series",,A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,1TB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Logistic Regression,Text Analytics",,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,20,15,5,10,50,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,90,5,0,0,5,0,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10GB,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,40,25,0,15,20,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,,,,,,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Brazil,33,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,,,Somewhat useful,,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,20,10,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,,"R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",Rarely,,,,Sometimes,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,Sometimes,Sometimes,,Often,,,,Often,,,,30,10,5,15,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,,,,,Often,,,,Most of the time,,,,Most of the time,,,,Most of the time,,100% of projects,More internal than external,Other,News; websites;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,120000,BRL,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Australia,57,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,R,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,I never declared a major,6 to 10 years,Other,Self-taught,100,0,0,0,0,0,Time Series,Logistic Regression,I don't know/not sure,Other,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Other,Laptop or Workstation and private datacenters,Text data,Sometimes,10MB,"Bayesian Techniques,Regression/Logistic Regression","Julia,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Orange,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,Most of the time,,Rarely,,Rarely,,Sometimes,Sometimes,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,,Often,,Often,Most of the time,,,,20,20,5,30,25,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,,,,,Often,,,Sometimes,,Often,,,,,,,,,Often,,,100% of projects,Entirely internal,Business Department,,Cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,NA,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Survival Analysis,Java,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,I don't write code to analyze data,I haven't started working yet,University courses,50,10,5,35,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,70,10,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Technology,10 to 19 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100MB,"Ensemble Methods,Random Forests,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Ensemble Methods,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,,,,Most of the time,,,,,,,,,,,Often,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,Often,Often,,100% of projects,Entirely internal,IT Department,,Not having variety of different conditional inputs in data.,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Bitbucket,Rarely,200000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,3-5 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Other,University courses,25,0,10,50,15,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity,Other","GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Physics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic 
laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks","Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,Python,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,Sometimes,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,Often,,,,,,Often,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Simulation,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,Often,Often,Often,Often,,,,Often,,,,Often,,Sometimes,Often,Sometimes,,,Sometimes,,,,Often,,Often,Sometimes,,,,65,10,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,None,"Data pre-processing, cleansing, moving","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Git,Sometimes,"160,000",AUD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,30,0,0,70,0,0,Survival Analysis,"Evolutionary Approaches,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,42,Employed full-time,,,Yes,,Other,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Factor Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,,,A master's degree,Technology,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Rarely,,,,,,,Rarely,,,"Data Visualization,Time Series 
Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,5,5,0,2,3,85,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,10-25% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,105000,AUD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,FlowingData Blog,Linear Digressions Podcast",3-5 years,,,,,,,,,,,,,,"Coursera,DataCamp,edX","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Github Portfolio,No,Bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,,,,,,,,,,,,,,,, +Male,Australia,44,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Google Search,"Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,"KDnuggets Blog,Linear Digressions Podcast,Partially Derivative Podcast",15+ years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,,Traditional Workstation,2 - 10 hours,Other,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Work,10,10,70,10,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important +Male,United States,59,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Business Analyst,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Software Developer/Software Engineer",University courses,5,5,25,60,5,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression",A master's 
degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100MB,"Ensemble Methods,Regression/Logistic Regression","Microsoft Excel Data Mining,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,"A/B Testing,Ensemble Methods,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,30,20,20,10,20,0,Computer Vision,"Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,100 to 499 employees,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Rarely,10GB,"CNNs,Gradient Boosted Machines,SVMs","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,TensorFlow,Unix shell / awk",,Rarely,,Often,,,,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Rarely,,,,"CNNs,Cross-Validation,Data Visualization,Gradient 
Boosted Machines,kNN and Other Clustering,Segmentation,SVMs",,,,Often,,Often,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,Sometimes,,,,,,,Often,,Often,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,"140,000",SGD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Singapore,35,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,Python,Text Mining,SQL,Google Search,"Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,,,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Other,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,40,Employed full-time,,,Yes,,Business 
Analyst,Fine,Employed by a company that performs advanced analytics,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Conferences,Friends network,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,Somewhat useful,Not Useful,,,,,,,,Somewhat useful,Somewhat useful,,Not Useful,Not Useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",University courses,10,20,40,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Text data,Relational data",Rarely,1TB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,Often,Often,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Text Analytics,Time Series Analysis",Most of the time,,,,,Sometimes,Most of the time,Often,,,,Sometimes,,Sometimes,Rarely,Often,,,,,,Often,,,,,,,Often,Rarely,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,Often,,,,Sometimes,,,,,,,Most of the time,,,Often,,,51-75% of projects,More internal than external,Central Insights Team,comscore,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Rarely,2500000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Korea,34,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,FastML Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,Less than a year,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,0,30,Natural Language Processing,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,South Korea,62,Employed full-time,,,Yes,,Statistician,,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Personal Projects,Textbook,Tutoring/mentoring",,,,,,,,,,,,Very useful,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Not at all important,Other,Basic laptop (Macbook),Relational data,Never,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,Often,Sometimes,,Often,,,,Sometimes,Sometimes,,,,,,0,10,0,10,0,80,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,100% of projects,Entirely external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,100000,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Other,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,R,,"Blogs,College/University,Company internal community,Friends network,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,Very useful,,Very useful,,,,,,,,Very useful,,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,0,0,50,50,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Never,1GB,"Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Rarely,,,,Rarely,Sometimes,,Most of the time,,,,,,,Rarely,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Most of the 
time,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,Sometimes,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,23500,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Vietnam,23,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,39,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Conferences,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,,,,,,,,,Somewhat useful,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,20,10,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,500 to 999 employees,Increased slightly,6-10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,0,0,40,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,Sometimes,,Often,,,,,Often,Sometimes,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,150000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Self-employed",TensorFlow,Deep learning,Python,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,40,25,0,10,0,,,A bachelor's degree,Government,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,,Regression/Logistic Regression,"C/C++,Microsoft R Server (Formerly Revolution Analytics),R,RapidMiner (free version),SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,Rarely,,,,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,20,30,0,0,50,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",,,,,,,,Often,,,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,65000,BRL,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Insurance,I prefer not to answer,Stayed the same,Don't know,An external recruiter or headhunter,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,SQL,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Ensemble Methods,Logistic Regression,Random Forests,Segmentation,SVMs",,,,,,,Often,,Often,,,,,,,Most of the time,,,,,,,Most of the time,,,Most of the time,,Most of the time,,,,,,48,20,20,10,2,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Often,Sometimes,,,,,,,,,,Often,,,Often,Sometimes,,Often,,Less than 10% of projects,More internal than external,Business Department,,Integration of Huge Batch data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,91000,USD,Other,4,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Social Network Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Somewhat useful,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,40,30,0,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,Regression/Logistic Regression,"SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Segmentation",,,,,,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,,,,,,70,5,5,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,,Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,Self-taught,40,30,10,10,10,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Government,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Never,1MB,SVMs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,60,30,0,0,10,0,Enough to run the code / standard library,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Do not know,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Bitbucket,,,,,5,,,,,,,,,,,,,,,,,, +Female,Other,40,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Programmer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,45,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,TensorFlow,Deep learning,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Conferences,Official documentation,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,500 to 999 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Other","Text data,Relational data",Never,10MB,"Decision Trees,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,Often,,,,,Sometimes,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,SVMs",,Sometimes,,,,Most of the time,Most of the time,Often,,,,,,,,,,Often,,Most of the time,,Most of the time,Often,,,,,Often,,,,,,30,30,0,10,0,30,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Privacy issues,Other",,,,,Often,,,,,,Most of the time,,,,,,Sometimes,,,,,Most of the time,76-99% of projects,Entirely internal,Standalone Team,MNIST for benchmark,"feature selection; parameters optimization (i.e. grid search); class imbalance;","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,42000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,India,27,Employed part-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SAS Base,Regression,SAS,I collect my own data (e.g. web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,KDnuggets Blog,< 1 year,,Nice to have,,,,,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,I never declared a major,,Other,Self-taught,20,30,20,0,10,20,Reinforcement learning,Logistic Regression,High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,Somewhat important, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Self-employed,IBM Watson / Waton Analytics,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,Very useful,,Very useful,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,25,25,25,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - GANs",High school,Government,10 to 19 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,10GB,"Bayesian Techniques,GANs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",Often,Often,,,,,,Often,,,,,Sometimes,,,,,,,,Sometimes,Rarely,Rarely,,,,Rarely,,,,Often,,Most of the time,,,,,,,,,Often,,,Sometimes,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,GANs,Text Analytics",Sometimes,Often,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,75,20,2,1,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,,Often,,,Sometimes,,,Sometimes,Sometimes,Sometimes,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,65000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Other,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,40,10,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",,Sometimes,,,,,,Rarely,,,,Rarely,,,,,Most of the time,,,,Sometimes,Often,,Sometimes,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Often,,,Sometimes,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,Often,,,Often,,Often,,Often,,,,Often,Often,,Often,,,Often,Most of the time,,,Often,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties 
in deployment/scoring,Limitations of tools",Often,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Always,,,I am not currently employed,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Republic of China,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,,,"Arxiv,Conferences,Kaggle,Stack Overflow Q&A,Other",Somewhat useful,,,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Scientist,Researcher",Work,10,10,25,50,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Most of the time,10GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,Often,,Most of the time,Often,,Often,,,Sometimes,,,,,,,,Most of the time,Often,,Often,,,,,Often,,Often,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,,,,Most of the time,Sometimes,,,,Sometimes,,10-25% of projects,More internal than external,IT Department,"UCI, Physionet, Kaggle",Label is accurate enough,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,450000,CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Python,Google Search,"Official documentation,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A doctoral degree,Other,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,100MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,70,20,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Often,,,,Often,,Often,,,,,Often,Most of the time,,Often,,,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,SFTP,Git,Rarely,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,South Korea,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,30,70,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,Google Search,"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Other,I haven't started working yet",Self-taught,80,10,10,0,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Female,United States,28,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,30,10,0,0,,,A doctoral degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Python,Deep learning,,,"Conferences,Kaggle,Non-Kaggle online communities,Official documentation",,,,,Very useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,,Work,30,10,35,15,10,0,"Adversarial Learning,Computer Vision,Time Series","Neural Networks - CNNs,Neural Networks - RNNs",,Government,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Image data,,1TB,"CNNs,Neural Networks,RNNs","Amazon Web services,Java,KNIME (free version),Python,TensorFlow",,Sometimes,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,RNNs,Time Series Analysis",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Often,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the 
organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,100% of projects,Approximately half internal and half external,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,Git,Sometimes,"30,000,000",KRW,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,10,5,20,60,5,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Prescriptive Modeling,Segmentation",Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,25,25,0,0,50,0,Enough to tune the parameters properly,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,Standalone 
Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,R,Social Network Analysis,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Unsupervised Learning,"Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,33,33,14,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random 
Forests,Regression/Logistic Regression,SVMs","Amazon Web services,DataRobot,Python,R,Unix shell / awk",,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,Often,,Rarely,,,,,,,,,,,Often,,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,I prefer not to say,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,Often,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,CRM/Marketing,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Not very important,Other,Basic laptop (Macbook),"Text data,Relational data,Other",Don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,RapidMiner (commercial version),Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Conferences,Personal Projects,Tutoring/mentoring",,,,,Very useful,,,,,,,Very useful,,,,,Very useful,,KDnuggets Blog,1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Data Miner,Operations Research Practitioner",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important +Male,United States,24,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,Time Series,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,South Korea,36,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,Deep learning,R,Google Search,"Blogs,College/University,Kaggle,Newsletters,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,Not Useful,,,,,,,Not Useful,,,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",0 - 1 hour,Experience from work in a company related to ML,No,Doctoral degree,Computer Science,3 to 5 years,"Data 
Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Natural Language Processing,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression",I don't know/not sure,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,Mexico,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Company internal community,Online courses",,,,Very useful,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Necessary,,,,DataCamp,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Not important +Male,Philippines,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Textbook",Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,6 to 10 years,"Predictive Modeler,Researcher,Software Developer/Software Engineer,I haven't started working yet",Self-taught,40,20,10,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Academic,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Other,Laptop or Workstation and private datacenters,Other,Rarely,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,Most of the time,Rarely,Rarely,,,,,Often,,,,,,Rarely,Sometimes,,Rarely,,,,,,,Often,,,,10,30,10,30,20,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,,76-99% of projects,More external than internal,Other,Fluxnet,Metadata accessibility and maintenance,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",,,"Bitbucket,Git",Most of the time,30000,AUD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Ukraine,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Poorly,Self-employed,R,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,"Coursera,Udacity,Other",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,20,0,0,0,60,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Japan,27,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,,Self-taught,60,5,10,25,0,0,Time Series,Bayesian Techniques,,Academic,I don't know,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,,Text data,Never,<1MB,,"Jupyter notebooks,Python,R,Stan",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Bayesian Techniques,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,50,0,20,0,0,Enough to run the code / standard library,"Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,Often,,,Often,,,10-25% of projects,More internal than external,IT Department,,Estimation the effect of some cognitive function on daily negative affect with Bayesian hierarchical liner regression.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,120,JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Predictive Modeler,Fine,,Amazon Web services,Deep learning,C/C++/C#,I collect my own data (e.g. 
web-scraping),"Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,Very useful,Very useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Predictive Modeler,University courses,0,0,50,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10GB,"Gradient Boosted Machines,Random Forests,SVMs","C/C++,Jupyter notebooks,Python",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,,,,,,Rarely,,,,Rarely,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,80,10,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,,,,,,,,,,,,Most of the time,,,Sometimes,Most of the time,,76-99% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Rarely,200000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,0,0,0,30,"Adversarial Learning,Natural Language Processing,Speech Recognition,Time Series","Bayesian Techniques,Markov Logic Networks",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1GB,"Bayesian Techniques,Markov Logic Networks","R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques",Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,IBM Watson / Waton Analytics,Deep learning,R,"I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Researcher,Other",University courses,15,5,10,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,100 to 499 employees,Decreased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",Text data,Rarely,10MB,"Bayesian Techniques,Ensemble Methods,Random Forests,SVMs","Amazon Web services,Java,MATLAB/Octave,Python,R,SQL,Tableau,TensorFlow,Other,Other",,Most of the time,,,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Rarely,,,Most of the time,Rarely,,"Bayesian Techniques,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Often,,,,,,Often,,,,,Often,,,,Sometimes,Most of the time,,Sometimes,,Often,,,,,Most of the time,Most of the time,Often,,,,20,10,0,15,55,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science 
talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Sometimes,,,Most of the time,Rarely,,Sometimes,Often,,Rarely,,,Sometimes,Sometimes,Often,Sometimes,,Sometimes,Sometimes,Sometimes,,76-99% of projects,Approximately half internal and half external,Standalone Team,Census,"Cleaning, matching ids, name changes etc.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,72000,NZD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Cluster Analysis,C/C++/C#,GitHub,"Arxiv,Conferences",Very useful,,,,Very useful,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",5-10 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Machine Learning Engineer",University courses,10,10,0,70,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Cluster Analysis,Python,"GitHub,Google Search,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Somewhat useful,Very useful,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,45,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Neural Nets,,GitHub,Podcasts,,,,,,,,,,,,,Somewhat 
useful,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,Work,20,20,60,0,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks","Amazon Web services,Cloudera,Java,Microsoft R Server (Formerly Revolution Analytics),Python,QlikView,R,Tableau,TensorFlow",,Most of the time,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,Often,Most of the time,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Naive Bayes,Natural Language Processing,Random Forests",,Often,Often,,,,Often,,,,,,,,,,,Often,Often,,,,Often,,,,,,,,,,,20,30,10,20,20,0,Enough to run the code / standard library,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),"Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,200000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,15,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Text Mining,Python,"GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Newsletters,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,,,Very useful,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Programmer,Researcher",Self-taught,50,20,10,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Internet-based,"10,000 or more employees",Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Most of the time,100GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,Often,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation",Sometimes,,,Sometimes,,Often,Sometimes,,,,,,,,,Sometimes,,Sometimes,Often,Often,Often,,Sometimes,Sometimes,Often,Often,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Sometimes,,Often,,,,,,,,,Sometimes,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,Imagenet,how to get well-labeled data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"400,000",CNY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,NA,20,40,0,30,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Other,"Laptop or Workstation and private datacenters,Other",Text data,Sometimes,100MB,"Decision Trees,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Researcher",Self-taught,80,0,15,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Other,Traditional Workstation,"Image data,Video data,Text data",Never,100GB,"CNNs,Neural Networks,RNNs,SVMs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Natural Language Processing,Neural Networks,SVMs,Text Analytics",,,,Often,,Most of the 
time,,,,,,,,Sometimes,,,,,Rarely,Most of the time,,,,,,,,Sometimes,Sometimes,,,,,5,10,0,5,0,80,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Sometimes,,,,,Sometimes,,,Often,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by professional services/consulting firm,Python,Regression,Python,I collect my own data (e.g. web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,No Free Hunch Blog",< 1 year,,,,,Necessary,,Necessary,Necessary,,,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,10,1,0,19,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Australia,55,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Researcher,University courses,5,0,15,80,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,100MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,Very useful,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",11 - 39 hours,Github Portfolio,Yes,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,20,30,0,0,10,40,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,Japan,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,Very useful,Very useful,,Very useful,,Very useful,Somewhat useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,50,30,10,5,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,CRM/Marketing,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Random Forests,Recommender Systems,Segmentation",Often,Often,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,Often,Often,,Often,,,,,,,,30,20,10,20,20,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,Most of the time,,Sometimes,,,,,Most of the time,Often,Sometimes,Often,,Most of the time,,51-75% of projects,Entirely internal,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,5500000,JPY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,RapidMiner (free version),Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,10,30,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Retail,"1,000 to 4,999 employees",Increased slightly,6-10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Most of the time,1GB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,Spark / MLlib,SQL,Tableau,TensorFlow,Other",Sometimes,Sometimes,,,,,,,Often,,,,,,,,Often,,Often,,,Often,Often,Often,Most of the time,,,,,,,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Most of the time,Sometimes,,,,Most of the time,,"Data 
Visualization,Decision Trees,Logistic Regression,Recommender Systems",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,40,10,15,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,Sometimes,,,Often,,Rarely,,,,Often,,Sometimes,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,Programmer,Self-taught,50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,20 to 99 employees,Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Never,10MB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Julia,NoSQL,Python,R,SQL",,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Decision Trees,PCA and Dimensionality 
Reduction",Rarely,,,,,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,90,0,0,0,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,Google Search,"Blogs,Kaggle",,Very useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,1 to 2 years,Data Analyst,Self-taught,90,0,5,0,5,0,Speech Recognition,Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",CRM/Marketing,Fewer than 10 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Other,"Text data,Relational data",Sometimes,100MB,Other,"Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Python,Tableau,TensorFlow",,,,,,,,Often,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,Often,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,50,40,0,10,0,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data",,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Business Department,,,Key-value store (e.g. Redis/Riak),"Email,Share Drive/SharePoint",,,Sometimes,4000,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Computer Scientist,University courses,25,25,25,0,25,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"5,000 to 9,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,,Deep learning,Python,Government website,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Scientist,DBA/Database Engineer",Work,10,0,50,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A professional degree,Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Other,Laptop or Workstation and private datacenters,Text data,Sometimes,,,"Java,Jupyter notebooks,Python,Stan,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations in the state of the art in machine learning",,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Other,,,,Other,,"Git,Mercurial,Other",Rarely,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but 
looking for work",,,,,,,,Google Cloud Compute,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,,,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Other","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,50,0,0,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Colombia,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Decision Trees,Python,Other,"College/University,Online courses,YouTube Videos",,,Very useful,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Partially Derivative Podcast,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important +Female,United States,48,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Engineer,Self-taught,80,0,0,0,20,0,"Recommendation Engines,Time Series","Ensemble Methods,Gradient Boosting","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,Engineer,Self-taught,60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),,A doctoral degree,Technology,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,,,"Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Textbook",,,,,,,Very useful,Somewhat useful,,Very useful,Very useful,,Very useful,,Very useful,,,,"Partially Derivative Podcast,Other (Separate different answers with semicolon)",< 1 year,,,Necessary,Necessary,Necessary,,,Necessary,,Necessary,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,Self-taught,50,50,0,0,0,0,,,"Some college/university study, no bachelor's degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer",Self-taught,50,0,50,0,0,0,"Time Series,Other (please specify; separate by semi-colon)","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Internet-based,,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,Often,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,RNNs,SVMs",,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,0,50,0,30,20,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,25,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,Kaggle competitions,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",,More than 10 years,I visited the company's Web site and found a job listing there,Important,Other,Traditional Workstation,,Never,1GB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Python,R,Stan",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Sometimes,,,Often,Often,,,,,Rarely,,Rarely,,,,,,,Sometimes,,Sometimes,,,,Often,,,Often,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Rarely,,,,,,Rarely,,,,Often,Often,,,100% of projects,Do not know,Other,Climate and weather data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Taiwan,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Company internal community,Kaggle,YouTube Videos",Very useful,Very useful,,Very useful,,,Very useful,,,,,,,,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Sometimes,100MB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Often,,,,,Often,,,,,,,,30,40,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Never,1500000,TWD,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,Kaggle",Very useful,Very useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher",University courses,0,0,70,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,"5,000 to 9,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,1TB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Neural Networks,RNNs",,,,Most of the time,,Sometimes,,,Often,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,20,60,20,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,300000,CNY,,5,,,,,,,,,,,,,,,,,, +Female,Malaysia,24,Employed 
part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,70,0,0,15,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"Cloudera,Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,R,SAS Base,Spark / MLlib,SQL,Tableau",,,,,Rarely,,,,Rarely,,,,,,,,,,,,Rarely,,Rarely,,,,Rarely,,,,,,Rarely,,,,,Rarely,,,Rarely,Rarely,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Time Series Analysis",,Rarely,Rarely,,,,Rarely,Rarely,,,,,,,,Rarely,,Rarely,Rarely,,,,,,,,,,,Rarely,,,,0,0,100,0,0,0,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,None,Do not know,Central Insights Team,,,Other,Email,,Other,Never,1850000,INR,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Mathematica,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Online courses,Personal Projects,Podcasts,Other",,,,,,,,,,,Somewhat useful,Not Useful,Not Useful,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",15,70,15,0,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Evolutionary Approaches",A master's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Other,Traditional Workstation,"Text data,Relational data",Sometimes,,,"Java,NoSQL,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,40,60,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Privacy issues",Often,,,,,,,,,,,,,,,,Most of the time,,,,,,Less than 10% of projects,Entirely internal,IT Department,None,They are private ,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Mercurial,Never,24000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Scala,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",0,30,10,0,0,60,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,CRM/Marketing,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Neural Networks,Random Forests","Jupyter notebooks,Python,Spark / MLlib,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Often,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Often,,,,,Often,Often,Often,,Often,Sometimes,,Most of the time,,,,,,,,50,5,10,10,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data",Often,Sometimes,,,Most of the 
time,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,None,"Messy, text, and web logs",Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,"Bitbucket,Git",Sometimes,"10,000,000",JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,85,0,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important +Male,Other,23,Employed part-time,,,No,Yes,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,Speech Recognition,Ensemble Methods,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Male,Philippines,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Support Vector Machines (SVM),R,Google Search,"Blogs,Company internal community,Stack Overflow Q&A",,Somewhat useful,,Somewhat useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer",Work,50,30,20,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the 
company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Always,100MB,"Neural Networks,Other","Microsoft Azure Machine Learning,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering",,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,30,50,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,,Most of the time,,,,,Often,,,,,,Often,,,None,Entirely internal,Standalone Team,,Too many business logic,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"20,000",PHP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Stan,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,30,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Internet-based,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Stan,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,Sometimes,,Often,,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,,Sometimes,,,Often,Often,Often,Sometimes,,,,,,,Sometimes,,,,,,Often,Sometimes,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Never,"10,000,000",JPY,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Taiwan,31,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,40,0,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,23,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",5,85,0,10,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic 
Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,1GB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Necessary,,Unnecessary,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,A health science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,20,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Time Series Analysis,R,"Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",45,45,0,0,10,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,Very useful,Very useful,,,Not Useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,Other,Self-taught,50,0,20,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Non-profit,"5,000 to 9,999 employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10MB,"Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Sometimes,,,,Often,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,,,,Sometimes,Most of the time,Rarely,,,,,,Sometimes,,Often,,,Rarely,,Sometimes,,Rarely,,,,Sometimes,Often,Often,,,,,15,15,1,30,39,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using 
multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Rarely,Rarely,,Most of the time,,,Sometimes,,,,,,Sometimes,,,Often,,Sometimes,Sometimes,Most of the time,,100% of projects,Entirely internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,105000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,25,25,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,Python,Deep learning,R,Government website,"Blogs,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Other",University courses,50,0,49,1,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported 
servers,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,Sometimes,,,Most of the time,Most of the time,,,,,,,Sometimes,,,,,,,Most of the time,,,,10,5,25,5,5,50,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,Sometimes,,,,,,,Rarely,,Sometimes,,,,Most of the time,Sometimes,Often,,,,,,100% of projects,Approximately half internal and half external,IT Department,"Census, NOAA, American community survey",Getting departments to share or explain the data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Never,100000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Singapore,49,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,,C/C++/C#,University/Non-profit research group websites,"Conferences,Friends network,Kaggle,Online courses,Tutoring/mentoring",,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,Very useful,,,< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Other,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Computer Scientist,Data Miner,Programmer",University courses,0,0,0,100,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Taiwan,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important +Male,Russia,26,Employed full-time,,,No,Yes,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",R,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Speech Recognition,Bayesian Techniques,A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,,,Very Important,,,,,,,,Very Important,,,Somewhat important,Somewhat important +Male,India,21,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Cluster Analysis,R,Other,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Other,,Experience from work in a company related to ML,No,Bachelor's degree,Other,I don't write code to analyze data,I haven't started working yet,Other,20,30,50,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web 
site/job listing page,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Analyst",Work,25,0,50,25,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,Rarely,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,,,,Often,,Often,,,,,,,,Sometimes,Often,,,Sometimes,,Sometimes,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,Often,Most of the time,Often,,Often,Sometimes,,,,,Most of the time,Most of the time,,Rarely,Most of the time,Most of the time,Most of the time,,,51-75% of projects,More internal than external,Standalone Team,,Very dirty self-reported source data,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,115000,SPL,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,44,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Self-employed",R,,R,"University/Non-profit research group websites,Other","Online courses,Personal Projects,Textbook",,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,1 to 2 years,Researcher,Self-taught,80,20,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Other,Fewer than 10 employees,Decreased slightly,1-2 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Other,,10MB,"Regression/Logistic Regression,Other",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,Sometimes,Rarely,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Most of the time,,,,,,,100% of projects,Approximately half internal and half external,,Government public dataset; data from website of other organization,Difficulty to import data from pdf file to a data.frame in R.,Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,3 to 5 years,Other,University courses,15,15,20,40,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,15,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Other,I don't write code to analyze data,Other,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Julia,Deep learning,Julia,GitHub,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,11 - 39 hours,PhD,Yes,Bachelor's degree,Physics,,I haven't started working yet,Work,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,20+,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important +Male,Russia,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Very useful,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"DBA/Database Engineer,Programmer",Self-taught,70,0,0,0,30,0,Recommendation Engines,Decision Trees - Random Forests,A professional degree,Technology,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Always,1GB,Random Forests,"Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Random Forests",,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,30,60,10,0,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,30000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Social Network Analysis,Python,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,50,30,0,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,33,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,20,10,40,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Programmer,Researcher",University courses,30,0,20,50,0,0,Computer Vision,"Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Image data,Sometimes,100GB,Neural Networks,"Hadoop/Hive/Pig,NoSQL,Python,R,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,,,,Often,,,,,,,Most of the 
time,,,,,,Sometimes,Often,,,,,,,,,,,,,30,20,10,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,49,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A",,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Other,"Traditional Workstation,Other",Other,Most of the time,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","C/C++,Python,SQL",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Simulation,SVMs",,,,,,Rarely,Often,Sometimes,Sometimes,,,,,Often,,Sometimes,,Often,,Sometimes,,,Sometimes,,,,Often,Often,,,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team",,,,,Often,,,,,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,Other,None.,My time os shared between data science projects and other development activities,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Sometimes,67500,,Has decreased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,20,20,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,10GB,Regression/Logistic Regression,"Cloudera,Python,R,SQL,Unix shell / awk",,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,Simulation,Other",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,Often,,,25,5,40,10,20,NA,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",Often,Sometimes,,,Often,,,Most of the time,Sometimes,,,,,Most of the time,,,Often,,,,,,100% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint,Other",ftp,,Sometimes,85000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,20,20,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Not important,Not important +Male,Russia,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,R,University/Non-profit research group websites,"College/University,Kaggle",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop 
(Macbook),2 - 10 hours,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,50,0,0,50,0,0,Time Series,Neural Networks - RNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,23,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,0,50,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,I prefer not to answer,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM Cognos,Jupyter notebooks,Python,R",,,,,,,,,,Rarely,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,,,,,Very useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,Self-taught,30,30,0,0,40,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",Relational data,Always,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,Often,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Recommender Systems",Most of the time,Rarely,Rarely,,Rarely,Most of the time,Most of the time,Most of the time,Sometimes,,,Most of the time,,,,Sometimes,,Rarely,,,,,Sometimes,Most of the time,,,,,,,,,,30,20,20,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super 
efficient,"Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning",,,,,Often,Sometimes,,,,,,Sometimes,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Always,"310,000",CNY,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,41,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Neural Nets,R,Google Search,"Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,Some other way,Not very important,Build prototypes to explore 
applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",Often,,,,,Often,Most of the time,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,40,20,5,25,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,Most of the time,,Most of the time,Sometimes,,,Most of the time,,Often,,,,,Most of the time,,,,,,,76-99% of projects,Entirely internal,IT Department,Census,Incomplete; inaccurate data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Git,Other",Sometimes,120000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,Orange,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,Very useful,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",5-10 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,Udacity","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,30,0,20,50,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat 
important,Somewhat important,Very Important,Very Important +Male,Hong Kong,NA,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by non-profit or NGO,,,,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Non-profit,,,,"A friend, family member, or former colleague told me",Important,,Basic laptop (Macbook),Text data,,,Regression/Logistic Regression,"C/C++,Perl,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,,,,,2,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle",,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,"Linear Digressions Podcast,Partially Derivative Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Time Series,Other (please specify; separate by semi-colon),A professional degree,Other,"10,000 or more employees",,More than 10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Other,Other,,,Regression/Logistic Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,0,0,40,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",University courses,0,0,25,75,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Rarely,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Java,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,SQL,Unix shell / awk,Other",,Rarely,,Rarely,,,Rarely,Rarely,Sometimes,,,,,,Sometimes,,,,,,Rarely,,Sometimes,,,,Sometimes,,,Rarely,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,SVMs,Text Analytics,Time Series Analysis,Other",Sometimes,,Rarely,,,Sometimes,Most of the time,Sometimes,Often,,,,,Sometimes,Often,Often,,,,,,,Sometimes,,,,,Sometimes,Often,Sometimes,,,Most of the time,40,0,5,0,55,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in 
machine learning,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,R,"Google Search,Government website,I collect my own data (e.g. web-scraping)","Official documentation,Personal Projects,Podcasts,Textbook",,,,,,,,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Biology,,Data Analyst,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Other,25,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed 
full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",50,20,0,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,R,Government website,"Kaggle,Podcasts",,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,"O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,Time Series,Logistic Regression,A bachelor's degree,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation",Sometimes,,,,,,Often,,,,,,,,,Often,,,,,,Sometimes,,,,Sometimes,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate 
findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Often,,,,,,,Often,,,Often,,,,,Often,,,,,,,51-75% of projects,Entirely internal,Central Insights Team,,,Column-oriented relational (e.g. KDB/MariaDB),,,,,60000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,53,Employed full-time,,,No,Yes,Statistician,Fine,Employed by college or university,Orange,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook,Tutoring/mentoring",,,,,,,,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Biology,Less than a year,"Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,75,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,Other,Neural Nets,Python,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,,,,,,Very useful,,Very useful,Somewhat useful,,,,"DataTau News Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Programmer,Researcher,Statistician",Self-taught,80,0,0,NA,0,20,"Adversarial Learning,Machine Translation,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Always,100GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression,Other","C/C++,NoSQL,Python,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,Sometimes,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Text Analytics",,Often,Often,,Often,Sometimes,,Often,,,,,,Often,,Often,Sometimes,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,60,15,10,5,10,0,Enough to refine and innovate on the algorithm,"Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy 
issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Often,,Most of the time,,,Sometimes,,,Sometimes,,,Less than 10% of projects,Entirely internal,IT Department,"Analyzing unstructured text from the internet (HTML), forms, SQL databases, csv, XML, JSON",Data is in multiple languages but we solved that.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Other,Most of the time,110,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Tableau,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"DataCamp,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Time Series,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Very Important,Not important,,Somewhat important +Male,Australia,46,"Independent contractor, freelancer, or 
self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Other",,,,,,,,,,,,Very useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,60,40,0,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Technology,20 to 99 employees,Stayed the same,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,Bayesian Techniques,"Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,"A/B Testing,Naive Bayes,Text Analytics",Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,50,20,5,5,20,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Often,,,Often,,Most of the time,,26-50% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,82000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Online courses,Personal Projects,Textbook,Trade book",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Master's degree,Other,I don't write code to analyze data,"Software Developer/Software Engineer,Other",University courses,10,10,0,70,10,0,,Bayesian Techniques,"Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,SQL,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,50,10,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Never,10GB,Regression/Logistic Regression,"Java,MATLAB/Octave,Python,R,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,,Sometimes,,,,,,,,,,,Often,,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,20,70,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,Often,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,Less than 10% of projects,Approximately half internal and half external,IT Department,TREC DATASETS; OPEN CLASSIFICTAION DATASETS,,"Flat 
files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,80000,CNY,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,73,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Genetic & Evolutionary Algorithms,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,25,0,0,0,75,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Text Mining,R,GitHub,"College/University,Personal Projects,YouTube Videos",,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Self-taught,40,15,10,15,15,5,Reinforcement learning,Support Vector Machines (SVMs),A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +,,NA,Employed part-time,,,Yes,,Operations Research Practitioner,Fine,Employed by government,SQL,Bayesian Methods,SQL,Google Search,"Personal Projects,Textbook",,,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Operations Research Practitioner,Other",University courses,20,0,0,80,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,1MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,70,0,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,Sometimes,,,,,,,,,Sometimes,,,Often,Most of the time,,100% of projects,Entirely internal,Other,,Linking disparate data sources consistently,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Never,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,55,0,20,0,0,Unsupervised Learning,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1TB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,TensorFlow",,Rarely,,,Rarely,,,,Rarely,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,Most of the time,Most of the time,Rarely,,Often,,,,Rarely,,,,,,"Data Visualization,Lift Analysis,Logistic Regression,Neural Networks,Text Analytics",,,,,,,Often,,,,,,,,Sometimes,Sometimes,,,,Rarely,,,,,,,,,Sometimes,,,,,60,30,0,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,,,,,,Often,,,,,,Often,,,,,,,,76-99% of projects,More internal than external,Business Department,Australian Bureau of Statistics; public demographic data,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,84000,AUD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,,Very useful,,,,,,,Very useful,"DataTau News Aggregator,R Bloggers Blog Aggregator",3-5 years,,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,South Africa,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,Very useful,,,,"Siraj Raval YouTube Channel,Talking Machines Podcast,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",35,60,0,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,South Africa,36,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,26,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,Linear Digressions Podcast,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Chile,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Data Analyst,Engineer",Work,70,20,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Pakistan,30,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,Microsoft SQL Server Data Mining,,Matlab,I collect my own data (e.g. 
web-scraping),"Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,Very useful,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,I haven't started working yet",University courses,30,5,5,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Never,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks","Java,Julia,Mathematica,Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,Often,Rarely,,,,Most of the time,,,Most of the time,,,,,,,,Often,,,,,,,,,,,Often,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,Sometimes,Often,,,Often,Often,,,Most of the time,,,,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,Often,,,,,40,20,5,15,10,10,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,Most of the time,,,,,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,,,,,,26-50% of projects,More external than internal,IT Department,UCI,Believe on data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Other,Never,50000,PKR,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,20,30,30,10,0,Recommendation Engines,"Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,41,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",40+,Github Portfolio,Yes,Doctoral degree,Physics,,"Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,12,15,25,5,20,23,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook",Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,"Data Elixir Newsletter,FastML Blog,Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,6 to 10 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,20,20,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,10GB,Gradient Boosted Machines,"Amazon Web services,Java,Jupyter notebooks,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks",,,,,,,,,,,,Often,,,,Sometimes,,,Often,Often,,,,,,,,,,,,,,15,15,25,15,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Maintaining responsible expectations about the 
potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,Often,,Often,,,,,,,,Often,,,,Often,,Often,Often,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,155000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,"N/A, I did not receive any formal education",Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,1GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,GANs,HMMs,Neural Networks,RNNs,Other","Amazon Web services,Flume,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Stan,TensorFlow,Unix shell / awk",,Sometimes,,,,,Sometimes,,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Often,Most of the time,Often,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,GANs,HMMs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Simulation,Time Series Analysis",Sometimes,,Most of the time,Most of the time,,Most of the time,Most of the time,,Often,Often,Sometimes,,Sometimes,Rarely,,,,Rarely,,Most of the time,Often,,Sometimes,,Often,Often,Most of the time,,,Most of the time,,,,5,70,5,5,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,Most of the time,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,28,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,"Online courses,Personal Projects",,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,"Data Machina Newsletter,Siraj Raval YouTube 
Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",40+,PhD,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,70,10,0,20,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,India,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Self-employed,R,Survival Analysis,C/C++/C#,Google Search,"Conferences,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,Very useful,Somewhat useful,"FlowingData Blog,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",1,99,0,0,0,0,Survival Analysis,Support Vector Machines (SVMs),A master's 
degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",65,20,0,10,5,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Text data,Most of the time,100GB,"CNNs,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",Sometimes,Sometimes,,Often,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data 
Analyst,Perfectly,Employed by professional services/consulting firm,NoSQL,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,Self-taught,70,0,0,0,30,0,Survival Analysis,Logistic Regression,A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Don't know,100MB,Regression/Logistic Regression,"NoSQL,QlikView,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,,,,Most of the time,Sometimes,,,,50,20,20,8,2,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,Often,,51-75% of projects,More internal than external,Central Insights Team,,,,,,,,180000,INR,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler",Self-taught,75,5,NA,20,0,0,"Computer Vision,Outlier detection 
(e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Textbook",Somewhat useful,,,,,,Very useful,,,,,,,,Very useful,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Programmer,University courses,10,10,20,20,40,0,"Outlier detection (e.g. Fraud detection),Time Series","Ensemble Methods,Gradient Boosting",High school,Manufacturing,"10,000 or more employees",Increased slightly,Less than one year,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Other,Sometimes,1GB,Neural Networks,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,Often,,,,,,,,Often,Most of the time,,,,,,,,,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,Difficulties in deployment/scoring,,,,Most of the time,,,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,Git,Never,50000,,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Weka,Bayesian Methods,R,Google Search,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Statistician,Self-taught,0,0,0,100,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,Academic,500 to 999 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,RapidMiner (commercial version),RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM",,,,,,,,,,,Often,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Often,,Often,Sometimes,Sometimes,Often,,,,,,,,,,,,,,,,"CNNs,Decision Trees,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,Sometimes,,,,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,Often,Most of the time,,,,100,100,80,100,100,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Business Department,,No challenge foreseen,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Other,,1800000,INR,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Time Series Analysis,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Machine Learning Engineer,Predictive Modeler,Programmer",Self-taught,60,20,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Philippines,49,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,DataRobot,Proprietary Algorithms,Matlab,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,,,,,,,,Very useful,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,100,0,0,0,0,0,"Computer Vision,Machine Translation,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,Sometimes,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,Often,Often,,Often,Most of the time,Most of the time,Most of the time,,Most of the time,,Sometimes,Often,Often,,Often,,Often,,Most of the time,Most of the time,Often,Sometimes,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,60,10,5,15,5,5,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to 
integrate findings into organization's decision-making process,Privacy issues",Most of the time,,,,Most of the time,,,Often,,,,,,,,,Often,,,,,,10-25% of projects,More external than internal,IT Department,,"Dirty Data, not standard, manual input ","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,25000,USD,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,IBM Watson / Waton Analytics,Neural Nets,C/C++/C#,"I collect my own data (e.g. web-scraping),Other","Company internal community,Official documentation,Personal Projects,Tutoring/mentoring",,,,Somewhat useful,,,,,,Very useful,,Very useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,80,0,10,10,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Time Series",Decision Trees - Random Forests,High school,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Regression/Logistic 
Regression","C/C++,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Simulation,Text Analytics",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Most of the time,,,,,10,15,0,50,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,,,,,Sometimes,Rarely,,Rarely,,,,,,,76-99% of projects,Entirely internal,Business Department,MLS housing data,it's large size exceeding computer and software capacity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,USD,Other,8,,,,,,,,,,,,,,,,,, +Male,Australia,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,"Other,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"Data Machina Newsletter,FastML Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,29,"Not employed, but looking for work",,,,,,,,R,Deep learning,SQL,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Survival Analysis,Unsupervised Learning",Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very 
Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,25,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Somewhat useful,,Somewhat useful,Very useful,,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,90,0,9,0,1,0,Computer Vision,Decision Trees - Gradient Boosted Machines,A master's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10TB,Decision Trees,"Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,Often,,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,Often,,,Most of the time,,,,"Data Visualization,Decision Trees,Natural Language Processing",,,,,,,Often,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,60,20,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input",Most of the time,,,,Often,,,,,,Sometimes,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,drug details from fda; upc 
codes; insurance data,special characters that are supported in one environment but not supported in others.,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Git",Always,400000,INR,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,QlikView,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Friends network,Online courses,Textbook,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,"Data Analyst,Researcher",University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,R,Google Search,"Friends network,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,Very useful,,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,,,,,"KDnuggets Blog,R Bloggers 
Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,85,10,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Text data,Sometimes,1MB,SVMs,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,70,15,5,5,5,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,More internal than external,Other,None,Cleaning Data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,350000,INR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Russia,69,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Self-employed",Microsoft Azure Machine Learning,Deep learning,R,Other,"Blogs,Newsletters,Textbook",,Very useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning 
(Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Python,R",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Often,Often,Often,Sometimes,,,,,Most of the time,,Most of the time,,Most of the time,,Sometimes,Most of the time,,Sometimes,,,Most of the time,Often,Sometimes,Sometimes,Most of the time,,,,50,15,15,10,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,Often,Most of the time,,,,,Often,,,,Often,,,,,Most of the time,Sometimes,,10-25% of projects,Approximately half internal and half external,Standalone Team,???,???,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1800000,INR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Australia,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,TensorFlow,Other,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Machine Learning Engineer,Statistician",Self-taught,50,0,20,0,30,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Government,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Other,Rarely,10GB,"CNNs,Neural Networks,Random Forests,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,Sometimes,,Often,Most of the time,,Sometimes,,,,,Sometimes,,Sometimes,,,,Often,Often,,Sometimes,,,,Often,Sometimes,,,,,,30,30,0,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Share Drive/SharePoint,Other",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,DataRobot,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Blogs,Kaggle,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,,,,Somewhat useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher",University courses,20,30,0,40,10,0,"Computer Vision,Reinforcement learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Male,India,17,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,,,Very useful,Somewhat useful,,,,,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,Siraj Raval YouTube Channel",< 1 year,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Deep learning,R,Google Search,"Arxiv,College/University,Tutoring/mentoring",Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,,,,,Necessary,,Nice to have,,,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,70,20,NA,10,0,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Speech Recognition","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Java,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,10 to 19 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Google Cloud Compute,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL",Sometimes,,,,,,,Often,,,,,,,,,,,,,Often,,Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic 
Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",,Often,Often,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,Often,Often,Most of the time,Most of the time,Most of the time,Often,Often,Sometimes,Sometimes,Often,,,Often,Most of the time,,,,,50,30,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,I prefer not to say,Lack of funds to buy useful datasets from external sources,Privacy issues",Sometimes,,,Sometimes,,,Most of the time,,,Sometimes,,,,,,,Often,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"10,000",,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,,Self-employed,Microsoft Azure Machine Learning,Proprietary Algorithms,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,More than 10 years,Engineer,University courses,30,0,50,20,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Technology,,,,,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,,"Gradient Boosted Machines,Neural Networks","C/C++,IBM Watson / Waton Analytics,MATLAB/Octave,TensorFlow",,,,Sometimes,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Simulation",,,,,,,Often,,,,,,,,,,,,,Often,Most of the time,,,,,,Most of the time,,,,,,,20,40,20,20,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,Often,,,,,,Most of the time,,Often,,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),,"50,000",USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Taiwan,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by non-profit or NGO,,,,,"Blogs,Friends network",,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,Work,20,10,70,0,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Manufacturing,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,,Sometimes,Often,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,Often,,,Often,Often,Often,Often,Often,,,,,,,Often,,,Often,,Often,,Often,Often,,,,Often,Sometimes,Often,,,,50,30,0,10,10,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,,,,,8,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",,Technology,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,100GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,Sometimes,,,,Often,,,,Often,,Often,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics",,,,,,,Often,Sometimes,,,,,,Often,,Often,,Often,Often,,,Often,Often,Often,,Often,Often,,Often,,,,,20,50,10,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,Sometimes,,,Often,Often,,Sometimes,,Often,,,,,,,,,,,Often,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not at all important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,70,10,5,5,10,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,10-25% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,67000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,GitHub,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,Google Search,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,O'Reilly Data Newsletter,1-2 years,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Yes,Doctoral degree,Biology,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High 
school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Female,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,3 to 5 years,"Data Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",10,5,25,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Data Scientist,Self-taught,50,0,30,20,0,0,Time Series,Neural Networks - 
CNNs,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,41,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Personal Projects,Textbook,Tutoring/mentoring",,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,40,0,60,0,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,20 to 99 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Image data,Sometimes,10MB,"Neural Networks,RNNs","C/C++,Mathematica,Perl,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,,,,,Most of the time,Sometimes,,,,Most of the time,,Most of the time,,,Often,,,,5,10,0,5,10,70,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,Often,,,,Most of the time,,,Sometimes,Sometimes,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,7500000,JPY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Researcher,Self-taught,40,20,20,0,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Academic,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),,,,,Cloudera,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,30,30,10,30,0,0,Enough to run the code / standard 
library,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,None,Entirely internal,IT Department,,,,,,,,50000,INR,,,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,30,0,60,0,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data",Sometimes,1TB,"CNNs,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",Sometimes,Often,,Often,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text 
Analytics",,,Sometimes,Most of the time,,Most of the time,Most of the time,Often,Most of the time,,,Often,Sometimes,Often,,Often,Sometimes,,Often,Most of the time,Most of the time,,Often,,Often,Most of the time,,,Often,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,Often,,,Most of the time,Often,,,Sometimes,Most of the time,,Often,Often,Sometimes,,Most of the time,,Sometimes,,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Netherlands,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,Somewhat useful,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Decreased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Minitab,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,,,,,Often,,Often,,,,Sometimes,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,Often,,,Often,Most of the time,Sometimes,,,,Often,,Often,,Often,Often,Often,,,,Sometimes,Often,,,Often,,,,Sometimes,,,,60,15,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,Sometimes,,,Often,Often,,,Often,,,,Often,,,,,Sometimes,Often,,Often,,26-50% of projects,Approximately half internal and half external,Business Department,,Size ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,45000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Amazon Machine Learning,Support Vector Machines (SVM),Matlab,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,Software Developer/Software Engineer,Work,0,0,100,0,0,0,Computer Vision,Neural Networks - CNNs,Primary/elementary school,Military/Security,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Image data,Most of the time,1GB,"Bayesian Techniques,Neural Networks","C/C++,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction",,,Rarely,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,Sometimes,,,,,,,,,,,,,50,20,15,5,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,Do not know,IT Department,,obtaining 90 percentage accuracy ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,,Most of the time,"45,000",INR,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Other,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,48,0,0,50,2,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Decreased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Other,Laptop or Workstation and private datacenters,Relational data,,,Other,"Java,NoSQL,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools",Most of the time,Most of the time,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,,Often,,,,,,,,,,None,Do not know,Standalone Team,None,None,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,,,LKR,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,Pakistan,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,KNIME (free version),Decision Trees,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,60,10,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,100 to 499 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,Decision Trees,"Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,Often,,,,,Sometimes,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction",,Sometimes,,,,Sometimes,Often,Often,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,25,25,5,25,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,Sometimes,,,Often,,,,Often,,,,Often,,Sometimes,,Most of the time,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Never,4200000,PKR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos,Other",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",,2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,55,25,0,0,20,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,54,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,,,3-5 years,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Not important,Not important +Female,United States,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,,University courses,30,10,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Pharmaceutical,100 to 499 employees,Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Other,Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,32,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Tableau,Uplift Modeling,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,Somewhat useful,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,50,0,30,20,0,0,Unsupervised Learning,Neural Networks - CNNs,High school,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Sometimes,10GB,Neural Networks,"Microsoft Excel Data Mining,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,40,20,20,20,0,0,Enough to run the code / standard library,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,10-25% of projects,Entirely internal,IT Department,kaggle,unlabelled images and bad quality images,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Share Drive/SharePoint,,Bitbucket,Sometimes,55000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Python,Neural Nets,R,"GitHub,Google Search,University/Non-profit research group websites","Blogs,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,,,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Microsoft Azure Machine Learning,Python,R",Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Often,,,Often,Often,Often,Often,,,,,,Often,,Most of the time,,,,,,,Sometimes,Often,,Often,,,Sometimes,Often,,,,50,20,10,5,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by 
business decision makers,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,,,,Often,,Often,,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,2100000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Engineer,Fine,,Microsoft Azure Machine Learning,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,"Computer Vision,Natural Language Processing",Other (please specify; separate by semi-colon),A bachelor's degree,Other,"10,000 or more employees",Increased slightly,Don't know,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Never,100MB,"Bayesian Techniques,Other","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Decision Trees,Naive Bayes,Time Series Analysis",,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,50,15,5,25,5,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input",,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,Not sure,Clean data,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Mercurial,Other",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,R,Google Search,"Arxiv,Kaggle,Stack Overflow Q&A,Textbook",Very useful,,,,,,Very useful,,,,,,,Very useful,Very useful,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Other",University courses,20,0,50,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Always,100TB,"Bayesian Techniques,Neural Networks,SVMs","C/C++,Flume,Google Cloud Compute,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,30,5,30,5,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Sometimes,Sometimes,,,,,,,,,,,Often,Sometimes,,,Often,,10-25% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Never,300000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,26,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,IBM Watson / Waton Analytics,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Podcasts,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Physics,3 to 5 years,"Software Developer/Software Engineer,Other",Self-taught,35,35,0,30,0,0,,,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,India,34,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,56,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Blogs,Conferences,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, 
etc.)",0,60,0,0,40,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Pharmaceutical,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Text data,Relational data",Rarely,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Sometimes,,Often,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,,Most of the time,,,Often,Often,Often,,Most of the time,,Often,,,Sometimes,Sometimes,Most of the time,,,,40,30,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy 
issues,Unavailability of/difficult access to data",Sometimes,Often,,Most of the time,Most of the time,Often,,,Most of the time,,Often,,,Often,,,Most of the time,,,,Most of the time,,10-25% of projects,More internal than external,IT Department,"kaggle;UCI, MIT",Missing domain expertise,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Sometimes,170000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",Work,30,15,5,50,0,0,,,A master's degree,Retail,"10,000 or more employees",Decreased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,5,5,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,,,,,,,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,"Data Analyst,DBA/Database Engineer,Machine Learning Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,Somewhat important,,,,,,,,,,,,,, +Female,India,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,R,Text Mining,R,"Google Search,I collect my own data (e.g. web-scraping)",Conferences,,,,,Very useful,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,University courses,30,20,20,0,0,30,Other (please specify; separate by semi-colon),Support Vector Machines (SVMs),A bachelor's degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,,Laptop or Workstation and local IT supported servers,Text data,Rarely,10MB,SVMs,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Naive Bayes,Natural Language Processing,SVMs,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,Often,Often,,,,,,,,,Often,Often,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,51-75% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)",,,,,50000,RSD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,Other",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,Very useful,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Male,Other,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,"Blogs,College/University",,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's 
degree,Computer Science,6 to 10 years,"Data Scientist,Engineer,Software Developer/Software Engineer",University courses,10,20,40,30,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",CRM/Marketing,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,,,"Decision Trees,Neural Networks","KNIME (free version),Python,QlikView,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Lift Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,,,,,,Sometimes,,Often,,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,,,,,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Researcher,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Other,Self-taught,60,20,10,0,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Sometimes,,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Angoss,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,Tableau,TensorFlow",,,,Most of the time,Often,,Often,,Often,Often,Often,,,,,,Most of the time,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,Most of the time,Most of the time,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender 
Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10,30,20,30,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,Very useful,,< 1 year,,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Other,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,20,0,40,30,0,,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,India,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,Factor Analysis,Python,"Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,College/University,Friends network,Online courses,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,,,,,Very useful,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks",High school,Academic,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,Sometimes,,,,,,,,Often,Sometimes,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests,Simulation",,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,Often,Often,Often,,,,,,,,Often,,Often,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,450000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Pakistan,21,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Neural Nets,Python,I collect my own data (e.g. web-scraping),"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Reinforcement learning,Speech Recognition,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Poland,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep 
learning,Python,University/Non-profit research group websites,"Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,,,,edX,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,Supervised Machine Learning (Tabular Data),"Evolutionary Approaches,Logistic Regression",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important +Male,Pakistan,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data",Sometimes,10GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,MATLAB/Octave,Python,TensorFlow",,Rarely,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Logistic Regression,Neural Networks,Recommender Systems,Segmentation",,,Rarely,,,,,,,,,,,,,Often,,,,Often,,,,Rarely,,Sometimes,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Mathematics or statistics,Less than a year,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Australia,35,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,Very useful,Very useful,"Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,,Less than a year,Other,University courses,20,20,0,50,10,0,Time Series,Logistic Regression,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very 
Important,Very Important +Male,United States,24,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Google Search,University/Non-profit research group websites","Online courses,Personal Projects,Trade book,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,,Somewhat useful,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Other,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Rarely,,,Rarely,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Random Forests",,,,,,Sometimes,Often,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,10,30,40,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,Often,,,,,,,,,Rarely,,,26-50% of projects,Entirely internal,Other,None,HIPAA policies,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,I don't typically share data",,Other,Rarely,80000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",23,30,16,0,22,9,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,SVMs","Jupyter notebooks,MATLAB/Octave,Python,RapidMiner (free version),Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs",,,Sometimes,Often,,Often,,Often,,,,Often,,,,,,,,Often,,,Often,,,,,Often,,,,,,23,17,30,10,15,5,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of 
data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Often,,Sometimes,,,,,,,,,Often,,,Less than 10% of projects,More external than internal,Central Insights Team,news feeds;pdf docs; audio tapes;,data mining;feature selection;,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,,1700000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,DataRobot,Monte Carlo Methods,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Friends network,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,Not Useful,Very useful,,,Somewhat useful,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,1 to 2 years,Other,Self-taught,10,5,5,80,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,Rarely,Often,,Rarely,Rarely,,,,,,,Sometimes,,,,,,,,,Sometimes,,,Often,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Often,,,,,,,,,Sometimes,,,Sometimes,,,Sometimes,Rarely,,,Often,,,Often,Often,,,,60,15,5,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Sometimes,,,,Most of the 
time,,,,Sometimes,,,Most of the time,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,65000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,Microsoft Excel Data Mining,Deep learning,R,Other,"Personal Projects,YouTube Videos",,,,,,,,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Data Analyst,Work,80,15,5,0,0,0,Time Series,Logistic Regression,High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Other,Traditional Workstation,Other,Rarely,<1MB,Other,"IBM SPSS Statistics,R",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,10,10,0,70,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,Often,,Sometimes,,,,,,,,,,,Often,,,Sometimes,,,Less than 10% of projects,Do not know,Standalone Team,,to clearly understand the data required for modelling.,Other,Email,,Other,Sometimes,900000,INR,Has stayed about the same (has not increased 
or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Pakistan,20,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Official documentation,Online courses,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,Siraj Raval YouTube Channel,< 1 year,Necessary,,Nice to have,,Necessary,Necessary,Necessary,,,Necessary,,,,"DataCamp,Udacity",GPU accelerated Workstation,40+,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Other,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Very Important,,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +,,NA,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,R,GitHub,"College/University,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,,,,,,,,,,Very useful,,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,6 to 10 years,"Data Analyst,Data Scientist,Researcher",University courses,10,0,40,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased significantly,Don't know,Some other way,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,MATLAB/Octave,Perl,R,SQL,Tableau,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,Sometimes,,,Often,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",Most of the time,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,,Most of the time,,Often,,,,Sometimes,Sometimes,,Sometimes,,,,20,30,10,30,10,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,na,na,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"110,000",AUD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Other,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Decision Trees,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,,,Sometimes,,,,Sometimes,,,26-50% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,University/Non-profit research group websites,"Conferences,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring",,,,,Somewhat useful,Very useful,Very useful,,,,Very useful,Very useful,,,,,Very useful,,"KDnuggets Blog,Partially Derivative Podcast,Talking Machines Podcast",< 1 year,Necessary,,,,,,,Necessary,Nice to have,,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Physics,Less than a year,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,5,0,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Australia,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,,"GitHub,Google Search,Government website","Arxiv,Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow 
Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Somewhat useful,,,Very useful,Very useful,,< 1 year,,,,,Necessary,,,Necessary,Necessary,,,,,"Coursera,Udacity","Basic laptop (Macbook),Other",2 - 10 hours,Other,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Engineer,Self-taught,25,75,0,0,0,0,,,I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,South Korea,26,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,I haven't started working yet,University courses,5,15,0,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,"Not employed, but looking for work",,,,,,,,Python,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,55,0,0,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,Less than a year,,Self-taught,70,30,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Retail,"1,000 to 4,999 employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,,,"Java,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,30,10,303,0,Enough to run the code / standard library,Inability to integrate findings into organization's decision-making process,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,20,20,0,60,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,"Image data,Text data",Most of the time,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,Sometimes,,,Most of the time,Most of the time,,,,,,,,,,,,,,50,20,30,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Often,,,,Most of the time,,,,Often,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,Data Machina Newsletter,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,,Self-taught,20,20,10,0,0,50,"Adversarial Learning,Computer Vision,Machine Translation",,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that 
makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,Programmer,,20,10,50,10,10,0,"Computer Vision,Speech Recognition,Time Series",,,,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,"C/C++,Microsoft SQL Server Data Mining,Perl,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,Naive Bayes,Natural Language Processing",,,Most of the time,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,,,,40,30,10,15,5,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,,Often,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,1200,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Natural Language Processing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Computer 
Scientist,Fine,Employed by college or university,Jupyter notebooks,Time Series Analysis,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Programmer,University courses,0,80,0,10,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,,Academic,Fewer than 10 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Often,,,,Most of the time,,,,,,,,Often,,,,,,Often,Often,,Often,,,,,,,,,,,40,20,10,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Often,,,Often,,,,Often,Often,Often,Often,,Often,,,Often,,,,,Often,,Less than 10% of projects,More internal than external,IT Department,UCI dataset,parsing,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Git,Sometimes,12000,CAD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,"Data Elixir Newsletter,DataTau News Aggregator,FastML Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Miner,Researcher",Self-taught,50,10,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,100GB,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,RNNs,SVMs,Time Series Analysis",,,,,,Most of the time,Sometimes,,Rarely,Sometimes,,Often,,,,,,,,Often,,,,,Often,,,Rarely,,Most of the time,,,,20,70,5,5,0,0,Enough to explain the algorithm to someone 
non-technical,"Difficulties in deployment/scoring,Lack of funds to buy useful datasets from external sources,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,,,,,,Sometimes,,,,,,,Rarely,Often,Often,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,30000,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Female,United States,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,25,15,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,6 to 10 years,"Researcher,Other",Work,60,0,40,0,0,0,Time Series,Logistic Regression,A master's degree,Academic,500 to 999 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Always,,Regression/Logistic Regression,"IBM SPSS Statistics,Mathematica,Microsoft Excel Data Mining,R",,,,,,,,,,,,Rarely,,,,,,,,Often,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,30,40,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Retail,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,100MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,SVMs",,,,,,Most of the time,Most of the time,Often,,,,Often,,,,Often,,Often,,,,,Most of the time,,,,,Often,,,,,,10,20,10,30,30,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,Sometimes,,,Less than 10% of projects,More internal than external,Standalone Team,,Writing code to tune the parameters,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Don't know,200000,INR,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Africa,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Data Analyst,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,0,20,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Most of the time,,,Sometimes,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs",Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,Most of 
the time,,,,Most of the time,,,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,,,,,30,30,15,15,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,Often,,,,,,,,,,,,,,,Often,Often,,76-99% of projects,Entirely internal,Standalone Team,None,Data integrity,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,360000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Australia,31,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,27,Employed full-time,,,Yes,,Data Analyst,,,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,5,40,0,50,5,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10GB,,"IBM Cognos,KNIME (free version),Microsoft SQL Server Data Mining,NoSQL,Python,R,TensorFlow",,,,,,,,,,Most of the time,,,,,,,,,Rarely,,,,,,Sometimes,,Rarely,,,,Rarely,,Often,,,,,,,,,,,,,Rarely,,,,,,"Association Rules,Cross-Validation,Data Visualization,Recommender Systems,Text Analytics,Time Series Analysis",,Sometimes,,,,Often,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Often,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Sometimes,,,,Often,,,Often,Often,,,,,,Often,,,Sometimes,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. 
Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Sometimes,,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Germany,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,Self-taught,50,40,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,49,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,Business Analyst,Work,70,0,30,0,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition",Decision Trees - Gradient Boosted Machines,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Neural Nets,R,Google Search,"Arxiv,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Engineer,University 
courses,NA,0,50,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),A bachelor's degree,Telecommunications,100 to 499 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Decision Trees,Random Forests,SVMs","KNIME (free version),MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,Random Forests,Recommender Systems,Segmentation,SVMs",,Often,,,Sometimes,Often,Often,,,,,,,Often,,,,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,,Often,,,,,,25,25,15,15,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Business Department,,,,,,,Never,,,,6,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,R,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,,,,,,,Very useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",5-10 years,Necessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer",University courses,30,10,0,30,30,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important +Male,United States,42,Employed full-time,,,Yes,,Engineer,,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,Software Developer/Software Engineer,Self-taught,60,0,30,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,23,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Azure Machine Learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Management information systems,,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Machine Translation,Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - RNNs",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Textbook",,,,Very useful,Somewhat useful,,,,,,,,,,Somewhat useful,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,6 to 10 years,Programmer,Self-taught,80,0,10,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Never,10PB,Other,"Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Perl,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Often,,Often,,Often,,,,,Sometimes,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,Rarely,Sometimes,,,,Sometimes,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,Sometimes,,Sometimes,Often,,,,,,,Rarely,,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,Sometimes,,,,70,0,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Often,,,,Rarely,,,Often,Most of the time,,,,Most of the time,,Often,,,,26-50% of projects,Entirely internal,Other,Confidential,"Too large for the majority of data tools, most technologies struggle to scale past 
1-10PB data set size range.","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other","Company Developed Platform,I don't typically share data,Other",Hadoop/HDFS,Git,Rarely,"300,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Engineer,Operations Research Practitioner",Work,30,20,50,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,1TB,"Decision Trees,Regression/Logistic Regression","Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,Often,Often,,,,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,,Most of the 
time,,Often,,,,Sometimes,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,Other,Deep learning,R,Google Search,"Blogs,Textbook,YouTube Videos",,Very useful,,,,,,,,,,,,,Very useful,,,Very useful,"Partially Derivative Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler",Work,25,15,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","KNIME (free version),R,SQL",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Often,,,,,,,,,,,,,50,15,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,Often,,,,Most of the time,,,,,,,,Most of the time,Often,,,,,,,,26-50% of projects,Entirely internal,Business Department,,"data preparation, data cleaning, data management","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,,Never,40000,OMR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Time Series Analysis,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Arxiv,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,Not Useful,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,FastML Blog",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,PhD,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,0,20,0,80,0,0,"Computer Vision,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Programmer,Software Developer/Software 
Engineer",University courses,10,0,20,70,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Other",Relational data,Never,100MB,Neural Networks,"KNIME (free version),Python,R",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Naive Bayes,Neural Networks",,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,0,30,0,40,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,69,0,0,1,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science 
skills",,,,,,Spark / MLlib,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Workstation + Cloud service",,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Amazon Web services,Time Series Analysis,R,GitHub,"Podcasts,Textbook",,,,,,,,,,,,,Somewhat useful,,Very useful,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,Programmer,Self-taught,50,50,0,0,0,0,Time Series,Decision Trees - Random Forests,A master's degree,Internet-based,"10,000 or 
more employees",Increased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,<1MB,,Microsoft Azure Machine Learning,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Graph (e.g. GraphBase/Neo4j),Commercial Data Platform,,Bitbucket,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,,Very useful,,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",28,60,10,0,2,0,,,A master's degree,Academic,20 to 99 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,,"Amazon Web services,R,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,40,0,15,40,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Rarely,,Often,,Most of the time,Most of the time,,,Most of the time,,Often,,,Often,Sometimes,,,,,,,,51-75% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,Sometimes,35000,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Other,Other",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Other",Self-taught,50,0,50,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,100MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Often,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Naive Bayes",Sometimes,,,,,,Most of the time,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,85,0,0,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Unavailability of/difficult access to data,Other",Most of the time,Sometimes,,,Most of the time,Most of the time,,,,,,,,,,,,,,,Most of the time,Most of the time,76-99% of projects,Entirely internal,Other,Salesforce; Totango; Pendo,Tech leadership forbids me from reporting and/or correcting severe data quality problems.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other","Data sharing is not permitted, but we built custom tools to do it anyway :)",Git,Never,140000,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Pakistan,35,Employed full-time,,,No,Yes,Researcher,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Engineer,Operations Research Practitioner,Researcher",Work,40,0,40,20,0,0,"Recommendation Engines,Survival Analysis",,A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,26,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Python,Neural Nets,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Logistic Regression",,Internet-based,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,Most of the 
time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,20,10,0,50,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Scaling data science solution up to full database",,,,,Often,,,,,,,,,,,,Often,Often,,,,,100% of projects,Entirely external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,"12,000",USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Pakistan,42,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Julia,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",University courses,20,0,20,45,5,10,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,100MB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM SPSS Statistics,Java,KNIME (free version),Python,R,RapidMiner (free version),SQL,Statistica (Quest/Dell-formerly Statsoft),TensorFlow",,,,,,,,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Rarely,,Rarely,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,Sometimes,Sometimes,,,Most of the time,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,Most of the time,Sometimes,Most of the time,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Sometimes,Most of the time,,,,,30,5,40,5,15,5,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,NONE,No state of the art algorithm available,Other,I don't typically share data,,,Never,103500,PKR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Female,Germany,40,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Other,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Company internal community,Official documentation,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,,,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Statistician,University courses,20,0,10,70,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased significantly,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Don't know,10GB,Other,"Google Cloud Compute,Java,Jupyter notebooks,Python,R,SQL",,,,,,,,Most of the time,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Data Visualization,Natural Language Processing,Recommender Systems,SVMs",Sometimes,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Rarely,,,,,,10,40,40,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,Most of the time,Most of the time,,,Most of the 
time,,,Often,,,Most of the time,Most of the time,,Sometimes,Sometimes,,,,51-75% of projects,Entirely internal,IT Department,,Dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,I don't typically share data",,"Bitbucket,Git",Rarely,36000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Germany,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,Very useful,,Very useful,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","IBM Cognos,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow",,,,,,,,,,Often,,Rarely,,,,,Rarely,,Rarely,,,,,Often,,,,Rarely,,,Sometimes,,Often,,,,,,,,,Sometimes,,,,Rarely,,,,,,"A/B 
Testing,Association Rules,Collaborative Filtering,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Recommender Systems,Segmentation,Time Series Analysis",Often,Sometimes,,,Often,,,,,,,,,,Sometimes,Often,,,,Rarely,,Sometimes,,Often,,Often,,,,Sometimes,,,,20,5,5,10,60,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database",,,,,Often,,,,Often,,,,,,,,,Often,,,,,76-99% of projects,Entirely internal,Business Department,None,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,40000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Taiwan,34,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Work,20,0,80,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Conferences,Official documentation,Personal Projects,Textbook",Very useful,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Computer Scientist,Self-taught,50,20,30,0,0,0,Time Series,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Other,Never,1GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Python,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,Rarely,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,,Sometimes,,Most of the time,Most of the time,,Often,,,,,Most of the time,,Sometimes,,,,Often,Often,,,,Sometimes,Sometimes,,Sometimes,,Most of the time,,,,0,80,0,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Often,,,,Most of the time,,,,,,,,,,,,Often,,,,,,76-99% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Most of the time,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,RapidMiner (free version),Deep learning,R,"Google Search,Government website","College/University,Friends network,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,,,,,,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,55,20,0,25,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,,Very Important,Very Important,Very Important +Male,Other,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,Amazon Web services,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Somewhat useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Time Series,Bayesian Techniques,A master's degree,Non-profit,"10,000 or more employees",Increased significantly,1-2 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,Decision Trees,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Decision Trees",,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,Often,,,,,,,,,,,,Often,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,676000,THB,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,50,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Necessary,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Other,10,30,0,0,0,60,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Australia,56,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Google Search,Government 
website","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",25,50,20,0,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Decreased slightly,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Perl,Python,R,RapidMiner (free version),Spark / MLlib",,Rarely,,,,,,Rarely,,,,,,,Rarely,,Sometimes,,,Rarely,Sometimes,,Sometimes,,,,,,,Rarely,Most of the time,,Rarely,,Rarely,,,,,,Often,,,,,,,,,,,"Bayesian Techniques,kNN and Other Clustering,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Rarely,,Rarely,,,,40,30,10,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert 
input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Often,Often,Often,Often,Often,,,Often,,Often,,,Often,Often,,Often,Often,,,Often,,10-25% of projects,Entirely internal,Central Insights Team,,Handling huge data sets,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Rarely,3500000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Czech Republic,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,Somewhat useful,Not Useful,Very useful,Very useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,DataTau News Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer",Kaggle competitions,25,15,15,5,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel 
Data Mining,NoSQL,Python,R,SAS Enterprise Miner,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Often,,,,Sometimes,,,,Often,,Most of the time,,,,,,Rarely,,,Most of the time,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics",,,,,,Often,Most of the time,Often,Sometimes,,,Most of the time,,,Sometimes,,,,,Sometimes,,Often,Often,,,Often,Often,,Often,,,,,30,10,5,35,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",Often,,,,Sometimes,,,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,76-99% of projects,More internal than external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,India,30,Employed full-time,,,Yes,,Predictive Modeler,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler",Work,70,10,20,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Very useful,KDnuggets Blog,< 1 year,Necessary,,Necessary,,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,Argentina,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Social Network Analysis,C/C++/C#,Google Search,"Arxiv,Blogs,College/University,Conferences,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Data Scientist,Self-taught,100,0,0,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Evolutionary Approaches,Gradient Boosting,Logistic Regression",Primary/elementary school,Retail,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Image data,Most of the time,100GB,"Bayesian Techniques,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,IBM Watson / Waton Analytics,Julia,MATLAB/Octave,Microsoft Excel Data Mining,Minitab",,,,Most of the time,,,,,,,,,Rarely,,,Rarely,,,,,Most of the time,,Sometimes,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes",,,Often,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,20,60,10,10,0,0,Enough to run the code / standard library,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,,,,,Most of the time,,,Sometimes,Often,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,"unclear,dirty, unreal, disclasified","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,"70,000",AZN,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer,Programmer",Self-taught,50,20,10,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Financial,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines","Python,R,SAS Enterprise Miner,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,Sometimes,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Gradient Boosted Machines,Lift Analysis",Most of the time,,,,,Often,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,30,50,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,,,Most of the time,Often,,Sometimes,Most of the 
time,,,,,,,Less than 10% of projects,,Business Department,,small volumn,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,400000,CNY,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Data Analyst,,Employed by a company that performs advanced analytics,Google Cloud Compute,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,"FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,30,10,50,0,10,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Mathematica,MATLAB/Octave,Python,Other",,Sometimes,,Often,,,,Sometimes,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Often,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis,Other",Most of the time,Sometimes,,,,Most of the time,Most of the time,Most of the time,Most of the time,Sometimes,,Sometimes,Sometimes,Often,,Often,,Often,,,Most of the time,,Often,,,Often,Often,Often,,Often,Most of the time,,,25,15,10,25,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning",,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,100% of projects,Entirely internal,Business Department,,integrating firmwide datasets,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Bitbucket,Sometimes,2000000,USD,Other,9,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,SQL,,"Blogs,Friends network,Kaggle,Newsletters,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,Very useful,,,,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,Business Analyst,Self-taught,50,10,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,Often,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Often,,,,"A/B Testing,Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Text Analytics",Often,Often,,,Often,,Most of the time,Sometimes,,,,,,,Often,Sometimes,,,,Sometimes,,Sometimes,Sometimes,Sometimes,,Often,,,Sometimes,,,,,35,5,5,20,35,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data 
science team",Often,,,,,,,,,,,,,,,Often,,,,,,,100% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Email,,,Never,150000,AUD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,"No Free Hunch Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,40,15,0,5,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased slightly,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"A/B Testing,Cross-Validation,Logistic Regression,Natural Language Processing,Text Analytics",Most of the time,,,,,Sometimes,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,Most of the time,,,,,50,20,15,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a 
data science team,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,Sometimes,Often,,,,,,,,,Sometimes,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,N/A,Cleanliness,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"125,000",,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Other,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Newsletters,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,Not Useful,,,Somewhat useful,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Other",University courses,40,10,30,20,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Never,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Orange,R,SQL",,,,,,,,,,Rarely,Most of the time,Most of the time,,,,,Rarely,,,,,Rarely,Rarely,,Rarely,,,,Sometimes,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data 
Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Segmentation",Most of the time,Sometimes,Sometimes,,,,Most of the time,Often,,,,,,Most of the time,,Sometimes,,Sometimes,,,Often,,,,,Most of the time,,,,,,,,60,10,0,10,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,Most of the time,Most of the time,Often,,,Often,,Sometimes,,,,Often,,,,,Often,,,76-99% of projects,More internal than external,Business Department,,MDM,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,A humanities discipline,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,60,25,15,0,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,High school,Other,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Support Vector Machines (SVM),R,"Google Search,University/Non-profit research group websites","Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Very useful,,< 1 year,,Nice to have,Necessary,,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,Coursera,Traditional Workstation,0 - 1 hour,Kaggle Competitions,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer",Self-taught,50,50,0,0,0,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important +Male,United Kingdom,46,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,University/Non-profit research group websites,"Blogs,Official 
documentation,Online courses,Stack Overflow Q&A,Textbook",,Not Useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,<1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Often,,,Most of the time,Most of the time,,,Most of the time,,,,50,20,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Most of the time,Most of the 
time,,,,Most of the time,,Often,,,Sometimes,,,Most of the time,,,,,,,,,100% of projects,Approximately half internal and half external,Business Department,Quandl,"Irregular updates Ownership changes in the data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,100000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Japan,26,I prefer not to say,Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,Python,Google Search,"Arxiv,Blogs,Kaggle,Newsletters,Online courses",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,40,0,30,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Logistic 
Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data",,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Cloudera,IBM Watson / Waton Analytics,Python,TensorFlow",,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"Association Rules,CNNs,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,Most of the time,,,,Sometimes,,,,,,,,,,,,Sometimes,Often,,,,Most of the time,Often,,Often,Most of the time,Most of the time,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,,,Sometimes,,51-75% of projects,Approximately half internal and half external,Standalone Team,"MIT licence, bad licence data",Privacy issues,Other,Company Developed Platform,,Subversion,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,South Korea,40,Employed full-time,,,Yes,,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,More than 10 years,Business Analyst,University courses,90,NA,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,"Not employed, and not looking for work",Yes,"Yes, but 
data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects",Somewhat useful,,Somewhat useful,,,,Very useful,,,,,Very useful,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Female,United Kingdom,28,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,edX,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Master's degree,Biology,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Somewhat important +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Text Mining,Python,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Elixir Newsletter,Data Machina Newsletter,DataTau News Aggregator",< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,NA,NA,100,NA,NA,NA,Speech Recognition,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you 
directly,20+,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,,Very Important,Very Important,Somewhat important,Very Important +Male,People 's Republic of China,28,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A",Very useful,Very useful,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,,,,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,40+,PhD,Yes,Doctoral degree,A social science,,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Reinforcement learning,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important +Male,Singapore,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,80,20,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random 
Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Image data,Video data",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Other,I haven't started working yet",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,60,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,R,University/Non-profit research group websites,"Conferences,Textbook",,,,,Somewhat useful,,,,,,,,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Experience from work in a company related to ML,No,Doctoral degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,0,0,0,0,0,100,"Time Series,Other (please specify; separate by semi-colon)",Bayesian Techniques,A bachelor's 
degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Belarus,31,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,RapidMiner (free version),Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,20,10,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Often,,,,,Often,,,Sometimes,,,,,,,"Association Rules,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,Often,,Sometimes,,,,Sometimes,,,,20,5,5,20,20,30,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science 
results not used by business decision makers,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Sometimes,,,,Sometimes,,,,,,,,,,,,,,Often,,,76-99% of projects,Do not know,IT Department,-,finding insights,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,40000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,55,30,5,0,10,0,"Computer Vision,Speech Recognition,Other (please specify; separate by semi-colon)","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,,,,,Not at all important,Other,Other,Other,Rarely,100GB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,Random Forests,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,I don't write code to analyze 
data,Programmer,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,Work,20,15,40,10,15,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,500 to 999 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,Other,Other",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Most of the 
time,,,,,,Often,,,,Often,Often,,,,Most of the time,,,,,,Most of the time,Most of the time,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,Most of the time,,,100% of projects,More internal than external,IT Department,,Data is insufficient,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,"26,000",EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Norway,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Computer Scientist,Other",Self-taught,30,10,10,25,25,0,Reinforcement learning,Neural Networks - GANs,A master's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Other,Relational data,Rarely,10MB,"Neural Networks,Random Forests","C/C++,Jupyter notebooks,Python,R,Unix shell / awk",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,,,,,,Most of the time,,Sometimes,,Often,,,,,,,Often,,,,,,,Often,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,Often,,26-50% of projects,Do not know,Other,None,data from multiple sources and quality of data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Files shared in a cloud solution,Git,Sometimes,900000,NOK,Other,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,Not Useful,,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",70,30,0,0,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,100MB,"Decision Trees,Neural Networks","Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,"CNNs,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,,Rarely,,,,Most of the time,,,,,,,,Most of the time,,,Rarely,Rarely,Often,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,Most of the time,Often,,,,,,,Rarely,,,,,,,None,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Somewhat useful,,1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,20,0,0,30,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important +Male,India,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working 
yet,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,77,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Amazon Web services,Survival Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences",Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,15,5,20,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Other,Rarely,1GB,"CNNs,Other","C/C++,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,0,20,0,40,Enough to refine and innovate on the algorithm,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,,,,,,,,,Often,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,UCI;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,0,10,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",High school,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,"CNNs,Random Forests","MATLAB/Octave,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,CNNs,Decision Trees,Neural Networks",Most of the time,,,Often,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert 
input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Udacity,,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Weka,Random Forests,Python,I collect my own data (e.g. web-scraping),"Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,,Nice to have,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,Self-taught,30,30,0,0,30,10,Reinforcement learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Indonesia,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,Self-taught,5,20,65,5,5,0,Outlier detection (e.g. 
Fraud detection),Support Vector Machines (SVMs),High school,Technology,20 to 99 employees,Decreased significantly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Always,100MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Recommender Systems,Segmentation",Often,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,,,80,15,3,2,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos,Other",,,,,,,Not Useful,,,,,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,25,15,20,20,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Most of the time,100MB,Regression/Logistic Regression,"Perl,Python,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,Most of the time,,,Logistic Regression,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Other,Never,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,NoSQL,Monte Carlo Methods,Python,Google Search,"Arxiv,College/University,Friends network,Stack Overflow Q&A",Very useful,,Very useful,,,Very useful,,,,,,,,Somewhat useful,,,,,"Emergent/Future Newsletter (Algorithmia),KDnuggets Blog,Partially Derivative Podcast",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,0,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important,Not important,Not important +Male,Australia,39,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,R,,"Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Computer 
Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,0,0,10,0,0,"Time Series,Unsupervised Learning",Other (please specify; separate by semi-colon),A bachelor's degree,Other,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,100MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,50,10,0,10,30,0,Enough to refine and innovate on the algorithm,"Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,Sometimes,,,,,Often,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Sometimes,100000,AUD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,Amazon Web services,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,,,,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,Natural Language Processing,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,, +Male,Poland,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Operations Research Practitioner,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Stan,Deep learning,Julia,University/Non-profit research group websites,"Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,Very 
useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Not Useful,Somewhat useful,Very useful,,,Somewhat useful,"No Free Hunch Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,50,10,25,15,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Rarely,1TB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,HMMs,Regression/Logistic Regression,Other","Java,Julia,Jupyter notebooks,NoSQL,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,Sometimes,,,,Often,Most of the time,,,,,,Sometimes,Often,,Most of the time,,,,,Often,,,,,,Sometimes,,,Often,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Limitations in the state of the art in machine 
learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,,,,,Sometimes,,Often,,,Often,,,Sometimes,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Sometimes,,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"Data Stories Podcast,Partially Derivative Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Engineer,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by company that makes advanced analytic software,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,Nice to have,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",,Experience from work in a company related to ML,No,Doctoral degree,Management information systems,Less than a year,"Business Analyst,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Very Important +Male,United Kingdom,60,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by professional services/consulting firm,Julia,Other,SQL,"I collect my own data (e.g. web-scraping),Other","Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook,Other",,,,,Somewhat useful,,Very useful,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Business Analyst,Software Developer/Software Engineer",Work,50,0,50,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",High school,Government,10 to 19 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Rarely,10MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","SAS Base,SQL,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,Sometimes,Sometimes,,,"Logistic Regression,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,Rarely,Sometimes,,,,60,30,0,0,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Often,,Less than 10% of projects,More internal than external,Business Department,Government data sets,Poor quality of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Italy,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Neural Nets,Python,"GitHub,University/Non-profit research group websites,Other","Blogs,College/University,Official documentation,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Perl,Python,R,RapidMiner (free version),TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,Most of the time,,Most of the time,,Sometimes,,,,,,,,,,,Most of the time,,Rarely,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,Often,,,,,,,,Sometimes,,Often,,,,Often,Sometimes,,,,,,,Often,,,,,,20,40,30,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,10-25% of projects,Entirely internal,Other,,,,"Commercial Data Platform,Email",,,Most of the time,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Turkey,21,"Not employed, but looking for work",,,,,,,,DataRobot,Social Network Analysis,Java,GitHub,"Arxiv,Blogs,College/University,Friends network,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos,Other",Somewhat useful,Not Useful,Somewhat useful,,,Somewhat useful,,,,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Somewhat useful,"Data Machina Newsletter,Linear Digressions Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started 
working yet,University courses,10,20,30,40,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,26,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",DataRobot,Bayesian Methods,Haskell,Google Search,"Online courses,Tutoring/mentoring",,,,,,,,,,,Somewhat useful,,,,,,Very useful,,"KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,Coursera,"Traditional Workstation,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,Work,50,0,50,0,0,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Russia,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, 
udemy, edx, etc.)",10,90,0,0,0,0,,Logistic Regression,A professional degree,Mix of fields,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,,Traditional Workstation,Text data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,42,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer",University courses,25,50,0,25,0,0,Natural Language Processing,"Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs",A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important +Male,India,25,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Computer Vision,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,53,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Julia,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Conferences,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,Very useful,,,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Time Series,Bayesian Techniques,A professional degree,Academic,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",,<1MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Google Cloud Compute,IBM SPSS Statistics,NoSQL,R,SQL,Unix shell / awk",Sometimes,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Logistic Regression,Prescriptive Modeling",Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,50,10,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Rarely,,,,,,,,,,,,,,,,Often,,,Often,Often,,100% of projects,Do not know,Other,,Interpretation of the content of the data. What does data components means?,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,52,Employed full-time,,,No,Yes,Scientist/Researcher,,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,I collect my own data (e.g. 
web-scraping),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,I don't write code to analyze data,Data Miner,Work,10,0,90,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A professional degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,,, +Female,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,MATLAB/Octave,Time Series Analysis,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,University courses,30,5,20,35,10,0,Time Series,Evolutionary Approaches,High school,Technology,"5,000 to 9,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,Regression/Logistic Regression,"Java,Minitab,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Evolutionary Approaches,Simulation,Time Series Analysis",,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,20,30,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,,Often,,,Often,,,,,,,,,,,,Sometimes,,26-50% of 
projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Indonesia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",5,90,5,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Technology,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Rarely,10MB,Random Forests,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,90,5,0,5,0,0,Enough to 
run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Often,Often,Often,,,Often,,,Often,,Often,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Never,192000000,IDR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Text Mining,Python,I collect my own data (e.g. web-scraping),"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,Very useful,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,Data Analyst,Work,15,5,45,15,20,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Rarely,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,,Sometimes,,,,Most of the time,,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,Often,Often,,Often,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,,,,Rarely,Sometimes,,Most of the time,Often,,Most of the time,,,Often,Often,,,,20,15,5,15,45,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,Sometimes,,51-75% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. 
GraphBase/Neo4j)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,120000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,South Africa,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Other,Python,University/Non-profit research group websites,"Arxiv,College/University,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,Very useful,,,,,,,,,,,Very useful,,,,Very useful,"FastML Blog,FlowingData Blog,Talking Machines Podcast",1-2 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,,,,,GPU accelerated Workstation,11 - 39 hours,PhD,No,Bachelor's degree,Computer Science,,"Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Neural Networks - RNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Hungary,63,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,33,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,R,Other,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,3 to 5 years,Researcher,Work,33,33,34,0,0,0,Other (please specify; separate by semi-colon),,A doctoral degree,Internet-based,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Other,Basic laptop (Macbook),Other,,,,"IBM SPSS Statistics,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,PCA and Dimensionality Reduction,Segmentation",,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,35,0,0,25,40,0,Enough to run the code / standard library,"Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Most of the time,,100% of projects,Do not know,Other," APA datasets",,,I don't typically share data,,,Always,116000,ILS,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,Somewhat useful,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Kaggle competitions,10,40,5,0,45,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Other (please specify; separate by semi-colon),,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational 
data,Sometimes,100MB,,"SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,80,5,5,0,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,Anomaly Detection,Python,,"Arxiv,Company internal community,Official documentation,Personal Projects,Textbook",Somewhat useful,,,Somewhat useful,,,,,,Very useful,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,55,5,30,10,0,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,100 to 499 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",Image data,Most of the time,10GB,"Gradient Boosted Machines,Random Forests,Other","C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Data Visualization,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,,Sometimes,,Often,,,,,,,Sometimes,,,,Sometimes,,,Often,,Sometimes,,,Often,Most of the time,,,,,,,0,30,30,30,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT",,,,,Often,,,Sometimes,,,,Most of the time,,,Rarely,,,,,,,,100% of projects,More external than internal,Other,Client 
data,O(N^2) scaling (and worse) of many algorithms in specific domain limits dataset sizes,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Bitbucket,Git",Never,65000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,Russia,43,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),40+,Online Courses and Certifications,Yes,Doctoral degree,Biology,More than 10 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",27,70,0,0,3,0,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google 
Search,"Blogs,Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,47,Employed full-time,,,No,Yes,Operations Research Practitioner,Poorly,Employed by government,R,Anomaly Detection,R,GitHub,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,No Free Hunch Blog,5-10 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Operations Research Practitioner,Software Developer/Software Engineer,Statistician",University courses,5,15,20,60,0,0,Time Series,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you 
directly,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important +Male,Philippines,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Other",University courses,10,40,20,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,I prefer not to 
say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,Government website,"Blogs,Friends network,Other",,Somewhat useful,,,,Very useful,,,,,,,,,,,,,"Data Stories Podcast,DataTau News Aggregator,Partially Derivative Podcast",1-2 years,Necessary,Necessary,Necessary,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Master's degree,A health science,,Data Analyst,Work,NA,NA,NA,NA,NA,NA,Survival Analysis,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,0,0,70,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic 
laptop (Macbook),GPU accelerated Workstation",Image data,Most of the time,10GB,"CNNs,Neural Networks","C/C++,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Neural Networks,Segmentation",,,,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,Often,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,6 to 10 years,"Business Analyst,Data Analyst,Programmer",University courses,10,10,0,50,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,Regression/Logistic Regression,"KNIME (free version),Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Segmentation,Time Series Analysis",,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,50,10,10,20,10,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,I prefer not to say,No,"Yes, I'm focused on 
learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses",Very useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,Somewhat useful,,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Udacity,"Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Engineering (non-computer focused),,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,Poland,35,Employed full-time,,,Yes,,Data Scientist,Fine,,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,10,10,35,40,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Rarely,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,QlikView,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner",,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,Rarely,,,,Sometimes,Most of the time,,Rarely,,,Sometimes,Sometimes,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Often,,,,Often,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,Often,,Often,,,,,,,,,,,50,10,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not 
used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,Often,Sometimes,Sometimes,,,Often,,,,,,Often,,,,26-50% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,180000,PLN,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Belgium,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,,,,,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",40,10,20,30,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Academic,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Video data",Sometimes,1GB,"CNNs,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,Often,,,,"CNNs,Cross-Validation,Data Visualization,PCA and Dimensionality Reduction",,,,Rarely,,Often,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,10,20,20,40,10,0,"Enough to code it 
again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft Azure Machine Learning,Bayesian Methods,Python,University/Non-profit research group websites,"Blogs,Company internal community,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,,,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,20,40,20,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",Primary/elementary school,Technology,"10,000 or more employees",Increased significantly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10TB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Orange,Python,R,SQL,Tableau,TensorFlow",Rarely,,,,,,,Rarely,Sometimes,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,Most of the time,,,,Rarely,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,Often,Sometimes,,Often,,Most of the time,Sometimes,,Sometimes,Often,,,,48,12,8,12,20,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Inability to integrate findings into organization's decision-making process",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint,Other",PowerBI,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,235000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Germany,41,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,30,0,20,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,Norway,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Bayesian Methods,Python,Google Search,"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,0,10,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I prefer not to answer,Other,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,,,"Microsoft Azure Machine Learning,R,SQL,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,Most of the time,,,,Sometimes,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,0,0,50,30,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,,Sometimes,540000,NOK,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",R,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Conferences,Kaggle,Newsletters,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Very useful,Very useful,,,,Very useful,,,Very useful,,Very useful,Very useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",50,0,0,0,50,0,"Survival Analysis,Time Series",Logistic Regression,A professional degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,10MB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Prescriptive Modeling,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,Most of the time,,,,20,30,10,10,30,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,58000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Other,44,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Textbook",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Germany,25,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Trade book,Other,Other,Other",,,,,,,,,,,Very useful,,,,,Somewhat useful,,,R Bloggers Blog Aggregator,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,,3 to 5 years,"Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,15,25,15,25,15,5,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,,,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Russia,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer",Other,30,10,50,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Image data,Sometimes,10GB,"CNNs,GANs,Neural Networks,RNNs,SVMs","Google Cloud Compute,Java,NoSQL,Python,SQL,TensorFlow,Other,Other,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,Most of the time,Rarely,Most of the time,"CNNs,GANs,Neural Networks,Segmentation,SVMs",,,,Most of the time,,,,,,,Sometimes,,,,,,,,,Most of 
the time,,,,,,Most of the time,,Sometimes,,,,,,60,15,5,15,5,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Poland,33,Employed full-time,,,No,Yes,Other,Poorly,Employed by college or university,Python,Anomaly Detection,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,"FlowingData Blog,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis",Decision Trees - Gradient Boosted Machines,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,South Africa,33,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Personal Projects",,,,,,,Very useful,,,Very useful,,Very useful,,,,,,,No Free Hunch Blog,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Researcher",Kaggle competitions,80,0,0,20,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Evolutionary Approaches,Logistic Regression",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Male,Russia,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Data Scientist,Operations Research Practitioner",University courses,10,10,0,70,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Retail,500 to 999 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,54,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,R,Google Search,"Arxiv,Conferences,Online courses,Personal Projects,Textbook",Somewhat useful,,,,Somewhat useful,,,,,,Very useful,Very useful,,,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Programmer,Researcher",Self-taught,60,15,15,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; 
separate by semi-colon)","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,I don't know,,Don't know,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Rarely,,"HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods,HMMs,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,,,,Often,,Sometimes,,,,Sometimes,,,Sometimes,,,,,Often,,Sometimes,,,,Often,Often,,,,,,30,20,15,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,,,,,,,Sometimes,,10-25% of projects,Do not know,,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,EUR,,8,,,,,,,,,,,,,,,,,, +Male,Turkey,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",Spark / MLlib,Bayesian Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,20,10,0,30,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Java,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Sometimes,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,SVMs,Text Analytics",Sometimes,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,,Often,,Often,Often,,Most of the time,,Often,,,Sometimes,,Often,Often,,,,,40,10,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Often,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,wikipedia,"cleaning, parsing","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,,5,,,,,,,,,,,,,,,,,, +Male,Russia,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,Very useful,Very useful,,Very useful,,,Somewhat useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Mathematics or statistics,More than 10 years,Engineer,Self-taught,90,0,0,0,10,0,,,,Manufacturing,,,,,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,,"C/C++,Java,SQL",,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling",,,,,,,Often,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,0,0,0,0,0,0,,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Mercurial",Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Denmark,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,0,20,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Internet-based,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, 
etc.)",50,20,10,0,20,0,"Adversarial Learning,Recommendation Engines","Decision Trees - Random Forests,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,54,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Very useful,Very useful,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Physics,I don't write code to analyze data,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,,Logistic Regression,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,United States,22,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting 
firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,,Self-taught,70,10,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Most of the time,100MB,"CNNs,Regression/Logistic Regression,SVMs","R,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,,,,,,"CNNs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics",,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,,Most of the time,Sometimes,,,,,Sometimes,,,Most of the time,Most of the time,,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Romania,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed 
by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",5,85,0,10,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,edX,Udacity",GPU accelerated Workstation,11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Natural Language Processing","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very 
Important,Very Important,Very Important,Very Important +Male,France,63,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Anomaly Detection,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Decreased slightly,6-10 years,Some other way,Somewhat important,Other,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,Perl,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,Often,,,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests,SVMs,Time Series Analysis",,,,,,,Often,Sometimes,Often,,,,,,,,,,,,,,Sometimes,,,,,Often,,Most of the time,,,,20,30,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full 
database",,Often,,,,,,Most of the time,,,Sometimes,,Sometimes,Often,,,,Often,,,,,10-25% of projects,Approximately half internal and half external,Other,reverse phone lookup,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,,EUR,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Finland,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Conferences,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,20,5,30,45,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Regression/Logistic 
Regression","Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,Rarely,Sometimes,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,Sometimes,Often,Sometimes,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,Sometimes,,Sometimes,Often,Often,,,,Sometimes,,,Rarely,Sometimes,,,,25,10,5,25,10,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,,Most of the time,,Sometimes,Sometimes,,,,,Sometimes,,,Often,Sometimes,,Often,,,76-99% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,Subversion,Sometimes,"50,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +,India,29,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,,,,,Somewhat useful,Data Machina Newsletter,< 1 year,Nice to have,,,,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Traditional Workstation,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Engineer,Programmer",Self-taught,20,80,0,0,0,0,Recommendation Engines,Bayesian Techniques,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,,,,,,,,,,,Somewhat important,,,Very Important +A different identity,Other,23,Employed part-time,,,Yes,,Machine Learning Engineer,,Employed by college or university,Other,Neural Nets,Python,"GitHub,Google Search","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,40,40,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High 
school,Internet-based,500 to 999 employees,,,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",,,"Bayesian Techniques,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs",TensorFlow,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Time Series Analysis",,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Often,Sometimes,Often,,Sometimes,Sometimes,Often,,,Often,,Often,,,,30,40,10,10,10,0,Enough to run the code / standard library,"Dirty data,Other",,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Other",,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Engineer,University courses,15,15,0,70,0,0,Recommendation Engines,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Important,,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,10MB,Bayesian Techniques,"KNIME (free version),Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Rarely,,,,,,,"Bayesian Techniques,Naive Bayes,Time Series Analysis",,,Rarely,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,,,10,40,20,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,Microsoft SQL Server Data Mining,Factor Analysis,C/C++/C#,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Programmer,University courses,50,10,10,30,0,0,Computer Vision,Hidden Markov Models HMMs,High school,Manufacturing,,,,,Somewhat important,Build prototypes to explore applying 
machine learning to new areas,Traditional Workstation,Text data,Rarely,10MB,Decision Trees,C/C++,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,Enough to run the code / standard library,"Dirty data,Limitations of tools",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Graph (e.g. GraphBase/Neo4j),Email,,"Git,Subversion",,30000,KES,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,India,27,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Business Analyst,Predictive Modeler,Other",Self-taught,25,25,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Other,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Prescriptive Modeling,Random Forests,Segmentation",,,,,,Often,Often,Often,Often,,,,,,,Often,,,,,,Often,Often,,,Often,,,,,,,,30,20,10,10,0,30,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,10-25% of projects,Approximately half internal and half external,Business Department,",",",","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,Git,Sometimes,0,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Genetic & Evolutionary Algorithms,C/C++/C#,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Friends network,Newsletters,Personal Projects",Very useful,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",Work,60,20,20,0,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Sometimes,10GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Often,,,,Most of the time,,,,,,"CNNs,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs,Text Analytics,Time Series 
Analysis",,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,Most of the time,Most of the time,,,Sometimes,,Most of the time,,,Sometimes,Often,Rarely,,,,60,20,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Rarely,,Most of the time,Most of the time,,,Sometimes,Most of the time,,Most of the time,Sometimes,,,Most of the time,,,,Most of the time,Most of the time,Often,,76-99% of projects,Entirely internal,Standalone Team,,Its dirty and domain specific while we are not domain experts. ,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Bitbucket,Most of the time,110000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Spain,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,50,30,0,0,"Adversarial Learning,Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Cloudera,Time Series Analysis,Scala,"Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle",,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,Self-taught,80,10,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service",Text data,Sometimes,10GB,"Random Forests,SVMs","Cloudera,Python,R",,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Logistic Regression,Naive Bayes,Random Forests",,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,,,30,30,30,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations of tools,Privacy issues",Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",,75000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Google Search,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Nigeria,33,"Not employed, but looking for work",,,,,,,,Tableau,Rule Induction,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Elixir Newsletter,Data Machina Newsletter,The Data Skeptic Podcast",3-5 years,Nice to have,Unnecessary,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Doctoral degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Natural Language Processing,Neural Networks - RNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Not important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Singapore,55,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Company internal community,Conferences,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Republic of China,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,50,Employed 
full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Microsoft Azure Machine Learning,Neural Nets,SAS,University/Non-profit research group websites,"College/University,Conferences,Friends network",,,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,Data Machina Newsletter,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,Researcher,Work,0,10,30,60,0,0,Reinforcement learning,Neural Networks - CNNs,High school,Academic,100 to 499 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,Workstation + Cloud service,"Image data,Text data",Sometimes,1TB,Decision Trees,"C/C++,IBM Watson / Waton Analytics,Microsoft Azure Machine Learning,Tableau",,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,"Data Visualization,Prescriptive Modeling,Simulation",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,,,,,30,50,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,Often,Often,,,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,,lack of enough understanding and tools,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,20000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Researcher,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects",,Somewhat useful,,,Not Useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer,Other",Self-taught,10,60,20,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Other","NoSQL,Oracle Data Mining/ Oracle R Enterprise",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Random Forests",Rarely,,,,,Often,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,,60,10,1,20,9,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Often,Most of the time,,Most of the time,Most of the time,,Most of the time,,,,,,Sometimes,,,Often,Most of the time,,76-99% of 
projects,Entirely internal,Standalone Team,kaggle,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Rarely,3000000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,South Korea,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,Text Mining,R,University/Non-profit research group websites,Textbook,,,,,,,,,,,,,,,Very useful,,,,R Bloggers Blog Aggregator,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,0,0,70,20,0,Other (please specify; separate by semi-colon),Bayesian Techniques,A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Very Important +Male,Australia,67,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,43,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by government,Python,Deep learning,R,Google Search,"College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Other,,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Neural Nets,R,University/Non-profit research group websites,"Blogs,College/University,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Other,University courses,0,0,0,80,0,20,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Never,,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,25,30,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization",Rarely,Rarely,Rarely,,,,,,Often,,,,,,,,,,,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Australia,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",Spark / MLlib,Anomaly Detection,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Conferences,Official documentation,Online courses,Personal Projects",,Somewhat useful,,,Not Useful,,,,,Somewhat useful,Very useful,Very useful,,,,,,,"FastML Blog,FlowingData Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Miner,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,Government,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10TB,"Decision Trees,Random Forests,Regression/Logistic Regression","Cloudera,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk,Other",,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Often,,Rarely,,,,,,,,Sometimes,Often,,,,,,Most of the time,Often,,,"Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,,,Rarely,,Sometimes,Sometimes,,,Sometimes,Often,,,,,,Most of the time,Most of the time,,,,40,10,5,10,20,15,Enough to 
explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,,Often,,,,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,Rather not say,Governance,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Share Drive/SharePoint,Other",HDFS,Git,,"290,000",AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,Software Developer/Software Engineer,University courses,5,70,20,5,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,10GB,Other,"Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,Rarely,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Rarely,,Often,,,,"Logistic Regression,Neural Networks,SVMs,Time Series 
Analysis",,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,Rarely,,Rarely,,,,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,,Most of the time,,,,,,,,,Often,,,,100% of projects,Entirely external,IT Department,Twitter;Instagram,Rate limits,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,Private server,Git,Most of the time,75000,GBP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,30,20,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data,Text data",Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter 
notebooks,MATLAB/Octave,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"CNNs,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Text Analytics",,,,Most of the time,,,,,,,,,,Often,,,,,Often,Most of the time,Often,,,,Sometimes,,,,Sometimes,,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Sometimes,,,,Sometimes,,,,Most of the time,,,Often,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,nil,nil,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,I don't typically share data",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,450000,INR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,RapidMiner (free version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Friends network,,,,,,Very useful,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,75,10,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased slightly,1-2 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Python,R,Spark / MLlib,Tableau",,,,,Rarely,,,,Sometimes,,,,Sometimes,,,,,,Sometimes,,Often,,Often,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Text Analytics",,,,,,,Most of the time,Often,,,,,,,,Often,,,Most of the time,Often,,,Often,,,Often,,,Most of the time,,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,Most of the time,,,,,Most of the time,,,,,Often,,,,Often,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Australia,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,Amazon Web services,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Physics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher,Other",University courses,50,0,0,50,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Financial,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,Often,,Sometimes,,,,,,,,,,,,,Most of the time,,,Sometimes,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics,Time Series Analysis",Most of the time,,Sometimes,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,Often,Often,Sometimes,Most of the time,Most of the time,Most of the time,Often,,Most of the time,Most of the time,,Often,Most of the time,,,,35,30,10,15,10,0,,"Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,Often,,,,,,Often,Often,,,,Most of the time,,26-50% of projects,Entirely 
internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,git; git-annex,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos,Other",,,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,Very useful,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",Work,30,10,45,10,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SAS Base,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,Sometimes,Sometimes,Often,,,,,Often,,,,Often,,,Often,,Sometimes,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,,,Often,Often,Often,,,,,Often,,Often,,Sometimes,,,Often,,Sometimes,Sometimes,,Often,,,Rarely,Often,,,,10,20,10,10,50,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Often,,,,,Often,,,Often,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,Iran,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Engineer,Operations Research Practitioner,Other",Self-taught,35,45,20,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Academic,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,25,35,35,5,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Minitab,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,Sometimes,,,Often,,Often,,,,Rarely,,Sometimes,Most of the time,,Rarely,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Most of the time,,,Most of the time,Often,,Sometimes,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data 
Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Most of the time,Often,Often,,Most of the time,,Often,Sometimes,Often,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,Often,,,,45,15,0,15,10,15,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,Often,Most of the time,,,Sometimes,,Often,Often,Often,Most of the time,,,,Sometimes,Most of the time,Most of the time,,Most of the time,,26-50% of projects,More internal than external,IT Department,Google BigQuery;Stock market;Healthcare;Baketball;Crime;Facebook;Amazon,Cleaning the data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,60000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,55,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,Self-taught,60,20,5,15,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Academic,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Sometimes,10MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks","IBM SPSS Statistics,R,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,Most of the time,,,Often,,,,,,,,Often,,,,Often,,Often,Often,,,,,,,,,Sometimes,,,,20,50,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Programmer",Self-taught,50,30,20,0,0,0,Time Series,Support Vector Machines (SVMs),A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Python,QlikView,R",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Random Forests",,Most of the time,Often,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,40,40,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Scaling data science solution up to full database",Sometimes,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,54,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,Java,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Personal Projects",Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,Somewhat useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,Less than a year,"Researcher,Other",University courses,60,0,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Relational data",Most of the time,100MB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Evolutionary Approaches,Naive Bayes,Simulation,Time Series Analysis",,Sometimes,,,,Sometimes,Most of the time,,,Often,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,40,30,20,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools",Most of the time,Often,,,,,,,,Most of the time,,Often,Most of the time,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,"6,400,000",HUF,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Germany,51,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Management information systems,3 to 5 years,Computer Scientist,University courses,50,0,15,30,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,Don't know,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests","Jupyter notebooks,NoSQL,Orange,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,Rarely,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Text Analytics",Sometimes,,Often,,,Often,Most of the time,Often,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,20,40,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,,,,,,,Sometimes,Often,,,,,,,,,,,,,,26-50% of projects,Entirely internal,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Most of the time,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,South Africa,36,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Python,Bayesian Methods,R,University/Non-profit research group websites,"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Researcher,Statistician",Self-taught,100,0,0,0,0,0,Survival Analysis,Logistic Regression,No education,Academic,I don't know,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Very important,,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Minitab,R,SAS Base",,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,"Logistic Regression,Simulation",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,20,60,20,0,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,400000,ZAR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,Somewhat useful,,Very useful,Very useful,,,,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Analyst,Self-taught,30,30,20,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Spain,27,Employed full-time,,,No,Yes,Business Analyst,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Deep learning,R,Google Search,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",Work,50,5,30,15,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Regression,R,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Gradient Boosting",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Predictive Modeler",University courses,15,10,65,10,0,0,Time Series,,A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,29,"Not employed, but looking for work",,,,,,,,Mathematica,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,Not Useful,,Very useful,,,Very useful,Very useful,,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Other",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Natural Language Processing,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Not important,Somewhat important,,Not important,Very Important,,Somewhat important,Somewhat important,,Not important,Not important,Somewhat important,Not important,Very Important +Male,People 's Republic of China,26,Employed full-time,,,No,Yes,Programmer,Fine,Self-employed,C/C++,Support Vector Machines (SVM),C/C++/C#,GitHub,"Blogs,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,"Talking Machines Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer",Self-taught,80,10,10,0,0,0,,,"Some college/university 
study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,32,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,C/C++,Monte Carlo Methods,C/C++/C#,"GitHub,Google Search","College/University,Conferences,Online courses",,,Very useful,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,Work,40,40,NA,20,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Python",,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Decision Trees,HMMs,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,,,Often,,Most of the time,,Often,,,,,Sometimes,Sometimes,,,,,Most of the time,Most of the time,,,Sometimes,,Often,Sometimes,,Often,Sometimes,,,,,20,70,10,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Dirty data,Privacy issues",,Sometimes,,,Often,,,,,,,,,,,,Sometimes,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,Git,Never,48000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Perl,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Tutoring/mentoring,Other",,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,20 to 99 employees,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Perl,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,Often,,,,,Sometimes,,Sometimes,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,Sometimes,,Often,Most of the time,,,,,,Sometimes,Most of the time,,Most of the time,Sometimes,Often,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,,50,25,10,15,0,0,Enough to explain the algorithm to someone non-technical,"Did 
not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Most of the time,,,,,,Often,,,Often,,Most of the time,,Most of the time,Often,Most of the time,Often,,51-75% of projects,Approximately half internal and half external,Other,,Messy data and multiple sources with different encoding schemes.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,18800,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +A different identity,United Kingdom,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Other,Python,"GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,,,Very useful,,Very useful,,,,,"Data Elixir Newsletter,FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher",Self-taught,50,0,40,10,0,0,,,A doctoral degree,Academic,500 to 999 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Other,Other,Most of the time,10GB,"Decision 
Trees,HMMs,Neural Networks,Random Forests,SVMs","C/C++,Java,Jupyter notebooks,Perl,Python,R,Unix shell / awk",,,,Rarely,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Most of the time,,,,,,,,,Often,,Often,,,Often,,,,,,Rarely,,Sometimes,Sometimes,,,,40,5,5,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,Most of the time,Most of the time,,,Often,,,,Most of the time,Often,Sometimes,,,,Most of the time,,,,100% of projects,Entirely internal,Standalone Team,All sorts of publicly available biology datasets.,Too much data and not enough people on the team to deal with it.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",open source data platform,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,46000,GBP,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,France,29,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Github Portfolio,No,Master's degree,Other,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,"Data Analyst,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Very Important +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Employed by government,Python,Text Mining,Stata,"Google Search,Government 
website,University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Predictive Modeler",University courses,40,0,40,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Sometimes,1GB,Regression/Logistic Regression,"IBM SPSS Statistics,Microsoft Excel Data Mining,R",,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Logistic Regression,PCA and Dimensionality Reduction,Simulation",,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,Sometimes,,,,,,,30,15,30,15,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Most of the time,Often,,,Often,Often,,,Most of the time,Most of the time,,,,,,,,,,,,,76-99% of projects,More external than internal,Central Insights Team,"Administrative data, market data",To use as many statistical techniques as possible in order to find specific results that will help to formulate recommendations,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,6000000,XOF,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Japan,52,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Deep learning,R,University/Non-profit research group websites,"Conferences,Kaggle,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),I don't write code to analyze data,Other,Self-taught,90,0,10,0,0,0,"Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Stayed the same,Don't know,Some other way,Not very important,Other,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,SVMs","Jupyter notebooks,Orange,Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,Often,,Often,,Sometimes,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Often,Sometimes,,Most of the time,Most of the time,Often,Sometimes,,,,,Sometimes,,,,Sometimes,Often,Sometimes,Often,,Sometimes,,Sometimes,,Most of the time,Often,Often,Most of the time,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine 
learning,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,Often,,Often,,,,,,Often,Often,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Email,,Other,,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,30,0,0,30,40,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,25,"Not employed, but looking for work",,,,,,,,R,Text Mining,R,Government website,Friends network,,,,,,Very useful,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to 
have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,Less than a year,,Self-taught,30,30,10,30,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Poland,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),0 - 1 hour,Master's degree,Yes,Master's degree,Computer Science,,"Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important +Female,Sweden,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Company internal community,Friends network,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,Other",,,,Very useful,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Researcher",Self-taught,10,30,50,0,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Rarely,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","Google Cloud Compute,Jupyter notebooks,Python,TensorFlow,Other",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Often,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Time Series Analysis,Other",,,,,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,Most of the time,,,,,Sometimes,,,,,Often,Often,,,30,30,30,5,5,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Privacy issues",,,,,,,,,Most of the time,,,Often,,,,,Often,,,,,,Less than 10% of projects,Entirely internal,Other,,,Column-oriented relational (e.g. 
KDB/MariaDB),Commercial Data Platform,,Git,Sometimes,700000,SEK,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Miner,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,Business Analyst,Self-taught,90,10,0,0,0,0,Natural Language Processing,Bayesian Techniques,,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,,,"Bayesian Techniques,Neural Networks","Amazon Web services,C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R",,Most of the time,,Sometimes,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,Recommender Systems",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Kaggle,Podcasts,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,,,Somewhat useful,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Yes,I prefer not to answer,Computer Science,,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Belgium,26,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,IBM Watson / Waton Analytics,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,Very useful,,,,Very useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Physics,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,"Data 
Scientist,Researcher",University courses,40,20,20,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A bachelor's degree,Manufacturing,20 to 99 employees,Increased significantly,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Always,10MB,Ensemble Methods,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,50,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,Engineer,Self-taught,75,0,25,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",High school,Manufacturing,"5,000 to 9,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Image data,Text data",Never,10GB,"Bayesian Techniques,Decision Trees,Random Forests","C/C++,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a 
new tool/technology,Proprietary Algorithms,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,Software Developer/Software Engineer,Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A professional degree,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Rarely,1GB,"Evolutionary Approaches,Neural Networks,RNNs","C/C++,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"A/B Testing,Data Visualization,Logistic Regression",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,80,0,5,15,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Sometimes,90000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,France,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,0,0,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service",Image data,Rarely,1GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,RNNs",,,,Often,,Often,,,,,,,,Often,,Most of the time,,,,Often,,,Sometimes,,,,,,,,,,,0,70,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,Often,,,Most of the time,,,Often,,Often,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Sometimes,,,Other,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Time Series Analysis,Python,,"Blogs,College/University,Company internal community,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,,Very useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,30,20,20,0,0,,,High school,Technology,500 to 999 employees,Increased slightly,1-2 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Impala,R,SQL",,Sometimes,,,,,,,Most of the time,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Time Series Analysis",Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,65,0,0,20,15,0,Enough to run the code / standard library,"Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,,,,,,,,,Sometimes,Often,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,90000,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Other,Other,Government website,"Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,United Kingdom,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by professional services/consulting firm,Amazon Web services,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Non-Kaggle online communities,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,,Logistic Regression,A professional degree,Telecommunications,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,,"Amazon Web services,C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Perl,QlikView,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Sometimes,,Rarely,Sometimes,,Rarely,,Often,,,,,Sometimes,Sometimes,,,,,,,,,Rarely,,,Sometimes,,,Rarely,,Rarely,Sometimes,,,,,,,,Sometimes,Sometimes,,,Rarely,,,Sometimes,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,10,0,0,0,0,90,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,N/A,Git,Sometimes,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,32,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Singapore,100,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by college or university,Employed by government,Self-employed",,Proprietary Algorithms,Python,,"Arxiv,Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,1GB,"CNNs,Ensemble Methods,GANs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,Most of the time,Most of the time,,Most of the time,,Sometimes,Sometimes,,,,Most of the time,,,,Most of the time,Often,,,,,Often,,Most of the time,,,,,,30,40,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Most of the time,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,27,20,0,3,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Other",,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",Work,0,0,50,0,20,30,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Video data,Relational data",Don't know,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Rarely,Sometimes,,,Sometimes,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Often,,,,,,,Often,,,Sometimes,,,,,,,,60,30,0,10,0,0,Enough to explain the algorithm to someone non-technical,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,Kaggle; AnalyticsVidhya,Data Cleaning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Don't know,600000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Czech Republic,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Unix shell / awk,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Friends network,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Researcher",Self-taught,30,5,35,30,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data,Other",Most of the time,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,Other","C/C++,Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Rarely,Sometimes,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Often,Sometimes,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Text Analytics",Sometimes,,,Sometimes,Often,Rarely,Most of the time,Rarely,Sometimes,,,Sometimes,,,,Often,,,Rarely,Sometimes,Rarely,,,Often,Sometimes,,,,Rarely,,,,,20,40,5,20,10,5,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",Often,Often,,Most of the time,Often,Often,,,Rarely,,,Rarely,Sometimes,,Often,,,,,,,,76-99% of projects,More internal than external,IT Department,,"cleaning,understanding of generating process","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,server storage,Git,Sometimes,720000,CZK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Personal Projects,Other",,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Necessary,,,,Necessary,,Nice to have,,,Nice to have,,,,,Traditional Workstation,0 - 1 hour,Other,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,Julia,Deep learning,Python,GitHub,"Arxiv,Blogs,Official documentation,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,,,,Very useful,,,,Very useful,Very useful,,,,"FastML Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Self-taught,30,20,40,0,0,10,"Computer Vision,Reinforcement learning,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,A tech-specific job board,Not very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,100GB,Neural Networks,"Amazon Web services,Jupyter notebooks,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,HMMs,kNN and Other Clustering,Neural Networks,Segmentation",,,,Most of the time,,Most of the time,Most of the time,,,,Sometimes,,Often,Sometimes,,,,,,Most of the time,,,,,,Most 
of the time,,,,,,,,40,20,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,IT Department,We have our own partners who provide us the data,Get the data ready for the network is the biggest challenge we face,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Share Drive/SharePoint,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,650000,INR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Ireland,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Textbook",,,,,,,Somewhat useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Other",Self-taught,60,30,0,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Random Forests,SVMs","Amazon Web services,C/C++,IBM SPSS Statistics,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow",,Most of the time,,Rarely,,,,,,,,Rarely,,,Sometimes,,Most of the time,,,Rarely,Rarely,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Collaborative 
Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,Most of the time,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,Often,,Sometimes,,Often,Often,,,,,Often,Rarely,,,,40,40,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,30,0,20,0,Time Series,Support Vector Machines (SVMs),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,R,Government website,"Blogs,College/University,Conferences,Newsletters,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Very 
useful,,Somewhat useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Data Miner,Data Scientist,Software Developer/Software Engineer",Self-taught,50,10,30,10,0,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Government,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,1GB,"Bayesian Techniques,Decision Trees,SVMs","C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Mathematica,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Minitab,Perl,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,SQL,Stan,Unix shell / awk",,,,Rarely,,,,,Sometimes,,,,,,Sometimes,,Often,,Sometimes,Sometimes,,Sometimes,Sometimes,,,Rarely,,,,Sometimes,Often,,Most of the time,,Sometimes,,,Rarely,,,Often,Often,Most of the time,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Most of the time,,,Most of the time,Most of the time,Often,Sometimes,Sometimes,,,,Sometimes,,Often,,Often,,,,Often,Sometimes,Sometimes,,Often,Often,Sometimes,Often,Often,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the 
time,Sometimes,,,,,,,,,Often,,Most of the time,,,,Sometimes,,100% of projects,Entirely internal,Other,https://data.overheid.nl/,Privacy,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Most of the time,"100,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,36,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,18,5,25,50,2,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,A general-purpose job board,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"CNNs,Ensemble Methods,HMMs,Random Forests,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,NoSQL,Python,SQL,TensorFlow,Unix shell / 
awk",,,,Sometimes,,,,,Rarely,,,,Rarely,,Most of the time,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,Sometimes,,Often,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Sometimes,Often,Most of the time,,Often,Often,,,,Often,Often,,Often,,Often,Most of the time,Rarely,Often,,Often,Most of the time,Sometimes,,,Often,Often,Sometimes,,,,25,45,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",Most of the time,,,,Often,Sometimes,,,,Most of the time,,,,,,,Often,,,,Sometimes,,Less than 10% of projects,More internal than external,IT Department,BNC corpus and other text corpora,Size and dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Bitbucket,Git",Sometimes,"30,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Canada,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",Very useful,,,,Somewhat useful,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,25,25,0,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"GPU accelerated Workstation,Workstation + Cloud service",Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Association Rules,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Prescriptive Modeling",,Sometimes,,,,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,,,,25,25,25,5,20,0,Enough to refine and innovate on the algorithm,Difficulties in deployment/scoring,,,,Sometimes,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,weather observations; weather forecasts; energy prices; energy consumption,Data Quality,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,CAD,Has decreased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Survival Analysis,R,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Business Analyst,Work,70,0,30,0,0,0,Natural Language Processing,,A bachelor's degree,Technology,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,,,"Amazon Web services,Microsoft Excel Data Mining,Python,RapidMiner (free version),SQL",,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Often,,,,,,,,,,Cross-Validation,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,20,20,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Sometimes,890000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Greece,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,,< 1 year,,,,,Nice to have,,Necessary,,Necessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,45,10,35,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,"Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation",,,,,,,,Often,,,,Most of the time,,,,Rarely,,,,,,,Sometimes,,,Sometimes,,,,,,,,75,10,2,8,5,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,Often,,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Most of the time,360000,RUB,Has decreased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Australia,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"1,000 to 4,999 employees",Decreased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,54,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Other,Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,10 to 19 employees,,,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,,10GB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,40,0,10,40,10,0,,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,26-50% of projects,Do not 
know,Other,SAP,"Complexity, volumes",Other,I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,720000,RSD,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,,,Very useful,,,,,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,Business Analyst,Work,60,0,10,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Sometimes,Often,Sometimes,Rarely,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,20,20,10,10,40,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of 
significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,,,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,Often,Sometimes,Sometimes,,Most of the time,,51-75% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,90000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Turkey,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer",University courses,20,10,20,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs",A bachelor's degree,Telecommunications,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Most of the time,,,,,Often,Most of the time,Most of the time,Most of the time,,,Often,,Most of the time,Often,Most of the time,,,,,Most of the time,Often,Most of the time,,,Most of the time,,,Sometimes,Often,,,,50,20,20,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,Often,,,,Often,,,,,Often,,,Sometimes,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,"Social media data, open data initiatives data",Documentation about source system or help from someone who owns the system producing data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,USD,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,Very useful,Very useful,,,Very useful,Very useful,,,,,Very useful,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,5,0,5,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Internet-based,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Most of the time,1GB,"CNNs,GANs,Neural Networks","IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,Most of the time,,,,,,"CNNs,GANs,Natural Language Processing,Neural Networks,Text Analytics",,,,Most of the 
time,,,,,,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,Often,,,,,5,75,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,Often,,,,,,Sometimes,Often,,Less than 10% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Never,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",65,20,10,0,5,0,Natural Language Processing,Other (please specify; separate by semi-colon),A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10MB,,"Microsoft Excel Data Mining,Minitab,Python,SAP BusinessObjects Predictive Analytics,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,Rarely,,,,,,Often,,,,,,,,Sometimes,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,30,20,10,10,30,0,Enough to run the code / standard 
library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Rarely,,,,9,,,,,,,,,,,,,,,,,, +Male,South Africa,30,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Machine Translation,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching 
general-purpose job board,3-5,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Not important +Male,Poland,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Proprietary Algorithms,Python,Other,"Personal Projects,Tutoring/mentoring",,,,,,,,,,,,Very useful,,,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,70,10,10,5,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Ensemble Methods,A professional degree,Mix of fields,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,10GB,Ensemble Methods,"C/C++,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,,Most of the time,,Often,,,,,Sometimes,,,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,,,35,15,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,,,,,,,Often,,,,Sometimes,,,,,,Sometimes,,76-99% of projects,Entirely internal,Other,,time,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,PLN,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Official documentation,Textbook",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Programmer",University courses,25,0,40,20,15,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,"Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,Rarely,<1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","C/C++,Java,Microsoft Excel Data Mining,RapidMiner (free version),SQL",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Evolutionary 
Approaches,Neural Networks,Random Forests",,Often,Often,,,Most of the time,,Often,,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,30,50,5,5,10,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of data science talent in the organization",,,,,,Sometimes,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,UCI,Dimensionality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,31000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,55,5,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Financial,20 to 99 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Other,Laptop or Workstation and local IT supported servers,Relational data,Rarely,100GB,,"Amazon Web services,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Tableau",,Often,,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, 
etc.)",30,60,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,R,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",Textbook,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,35,0,0,65,0,0,,,A bachelor's degree,Mix of fields,100 to 499 employees,,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Traditional Workstation",Relational data,,,Neural Networks,Java,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,0,100,0,0,0,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Germany,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,Less than a year,Data Analyst,Self-taught,70,30,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs)",A doctoral degree,Retail,500 to 999 employees,Increased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,44,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,R,Social Network Analysis,Haskell,,"Arxiv,Textbook",Somewhat useful,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"Perl,Unix shell / 
awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,kNN and Other Clustering,Logistic Regression,Text Analytics,Time Series Analysis",Most of the time,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Often,Sometimes,,,,10,30,0,10,50,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Often,,Sometimes,,,,,,,,,Most of the time,,,51-75% of projects,More external than internal,Standalone Team,,,Other,I don't typically share data,,Other,Never,4800000,INR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,SQL,GitHub,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",Work,20,30,10,0,10,30,"Adversarial Learning,Reinforcement learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,0,30,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues",Most of the time,Rarely,,,,,,,Often,,,Sometimes,Most of the time,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,INR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Australia,59,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,"GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Stack Overflow Q&A,YouTube Videos",Not Useful,Very useful,Very useful,,,,,,,,,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Other,Self-taught,10,5,85,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Not important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Italy,24,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,20,10,10,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","KNIME (free version),Python,R,RapidMiner (free version),SQL,TensorFlow",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,Most of the time,,Rarely,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other 
Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,Sometimes,,Sometimes,,Rarely,Sometimes,,Sometimes,,Often,,,,,,,Often,,,,30,35,5,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,Most of the time,,,,,Sometimes,,Sometimes,,,,,Sometimes,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Fine,Self-employed,R,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,Very useful,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,Less than a year,"Data Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,5,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",,100MB,"Decision Trees,Random Forests,SVMs","Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Ensemble Methods,Random Forests,SVMs,Time Series Analysis",,,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,50,0,25,0,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Sometimes,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Sometimes,1200000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Jupyter notebooks,"Ensemble Methods (e.g. 
boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,10,0,0,40,,,A doctoral degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Workstation + Cloud service",Text data,,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Python,Unix shell / awk,Other",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Often,Most of the time,,,Cross-Validation,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,99,0,0,0,1,0,Enough to run the code / standard library,"Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,3500,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,19,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Somewhat important +Male,Israel,31,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,University/Non-profit research group websites,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Master's degree,Mathematics or statistics,6 to 10 years,Statistician,University courses,20,0,0,80,0,0,,,A bachelor's degree,Academic,,,,,Very important,Other,Basic laptop (Macbook),,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,35,Employed part-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,,,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Not Useful,,,,,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,Less than a year,Statistician,University courses,10,30,20,40,0,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,,,IBM SPSS Statistics,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Often,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Never,,,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Japan,31,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Researcher,Other",Work,10,0,90,0,0,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed part-time,,,Yes,,Scientist/Researcher,,,C/C++,Deep learning,,,Podcasts,,,,,,,,,,,,,Very useful,,,,,,"Data Stories Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Operations Research Practitioner,Researcher",University courses,10,20,20,50,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Academic,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Other",Rarely,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","C/C++,Python",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Logistic Regression,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,Most of the time,,,,,,,,,10,40,20,30,0,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Limitations of 
tools",,,,,,,,,,Often,,,Often,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,"Bitbucket,Git",,900,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,,,Very useful,Somewhat useful,"DataTau News Aggregator,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Researcher",University courses,10,10,50,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",,Technology,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,QlikView,R,Stan,Unix shell / awk",,,,,,,,,Often,,,,,Sometimes,,,Often,,,,,,,Most of the time,,,,,,,Often,Sometimes,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality 
Reduction,Recommender Systems,Segmentation,Time Series Analysis",Often,Sometimes,Sometimes,,,Often,Most of the time,Most of the time,,,,,,Most of the time,Sometimes,Often,,,,Often,Sometimes,,,Sometimes,,Most of the time,,,,Most of the time,,,,70,20,5,4,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,Sometimes,Most of the time,,,Most of the time,,,,,Often,,Sometimes,,,,,Most of the time,,,51-75% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git",Rarely,69000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Switzerland,58,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Newsletters,Online courses,Other",Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,,,Very useful,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Researcher,Other",University courses,10,10,10,70,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Google Cloud Compute,KNIME (commercial version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,RapidMiner (commercial version),Spark / MLlib,SQL,Tableau,TensorFlow",Sometimes,,,,,,,Sometimes,,,,,,,,,,Often,,,,Sometimes,Often,,,,Sometimes,,,Often,Sometimes,,Often,Often,,,,,,,Sometimes,Often,,,Often,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Often,Often,Often,,Often,,Often,,,,Sometimes,,Often,,Often,Sometimes,Often,Sometimes,Sometimes,Often,Often,Often,Often,,Often,,Sometimes,Often,Often,,,,70,15,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy 
issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,,Sometimes,,,Most of the time,,Most of the time,,,Often,Sometimes,,Often,,,Most of the time,Often,,10-25% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Most of the time,75000,CHF,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Amazon Web services,,R,Google Search,"Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,6 to 10 years,"Data Analyst,Researcher,Statistician",Work,20,10,50,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Military/Security,I prefer not to answer,Stayed the same,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional 
Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Java,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,Sometimes,,,,,,,,Often,Often,Often,,,Rarely,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,Sometimes,Rarely,,,,Often,,Often,Rarely,,,,40,15,15,15,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,Often,,Often,,,Often,Often,,,,Often,,,,,,Often,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,132000,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Support Vector Machines (SVM),Python,Other,"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)",,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Germany,28,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),,Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,Germany,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Kaggle,Textbook",Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),,"Data Analyst,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Nigeria,40,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Anomaly Detection,Python,"Google Search,Government website",Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Logistic Regression,No education,Academic,20 to 99 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),Other,,,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,0,0,0,0,0,0,,Organization is small and cannot afford a data science 
team,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,900000,NGN,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Chile,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,,,Very useful,Very useful,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Not important +Male,United States,57,"Not employed, but looking for work",,,,,,,,Minitab,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Other",Self-taught,50,0,30,0,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Survival Analysis",Bayesian Techniques,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,25,5,70,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,50,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Regression,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Online courses,Podcasts,Trade book,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Very useful,,,Somewhat useful,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Pharmaceutical,500 to 999 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods","IBM Cognos,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression",Often,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Often,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,100% of projects,Approximately half internal and half external,IT Department,Pharmaceutical market sales data,Segmenting and data cleanising,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,650000,SEK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,Self-taught,60,20,10,0,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,55,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,,Python,"Government 
website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Company internal community,Conferences,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,Very useful,"FastML Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Flume,Hadoop/Hive/Pig,Impala,Julia,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Minitab,NoSQL,Perl,Python,R,RapidMiner (free version),SAS JMP,Spark / MLlib,SQL,Tableau,TensorFlow,Other,Other",Sometimes,Often,,Rarely,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,Most of the time,,Sometimes,,,Often,Sometimes,,,Sometimes,Sometimes,,,Rarely,Most of the time,,Most of the time,,Sometimes,,,,,Sometimes,Often,Often,,,Often,Often,,,Sometimes,Sometimes,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Most of the time,,Often,Often,Most of the time,Most of the time,Often,,Often,Most of the time,Often,Most of the time,Often,Most of the time,Often,Most of the time,Most of the time,Often,Often,Often,Most of the time,Often,Often,Often,Most of the time,Often,Most of the time,Most of the time,,,,30,30,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to 
data",,,,,Often,,,,,,,,,Often,,,,,,,Often,,100% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Hungary,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Online courses",Very useful,,,,Somewhat useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Programmer,Self-taught,60,30,0,10,0,0,Computer Vision,"Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Most of the time,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,SVMs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Often,Often,,Often,Rarely,,Often,,,,,Rarely,,,,Sometimes,,Rarely,Rarely,,,,,,,Sometimes,,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,Often,,,,,Often,,,Sometimes,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,DiaretDB; Drive; 
ISIC,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,Sometimes,6000000,HUF,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Researcher,Other",Work,30,30,30,5,5,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Perl,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,Often,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,Often,Often,,,,Sometimes,,Sometimes,Often,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",Sometimes,,Sometimes,Sometimes,Often,Most of the time,Most of the time,Most of the time,Often,,,Sometimes,,Most of the time,Most of the time,Most of the time,,Sometimes,Often,Often,Most of the time,,Most of the time,Often,,,,Often,Often,,,,,30,10,25,5,10,20,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data,Other",,,,,Most of the time,Most of the time,,Often,,,,,,Often,Often,,Most of the time,,,,Most of the time,Often,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Spain,48,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","GPU accelerated Workstation,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Survival Analysis,Python,Google Search,"Blogs,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, 
etc.)",0,15,0,80,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,Rarely,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,Often,Rarely,,,,Sometimes,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,30,20,30,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,,,,,,,Often,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,Standalone Team,bing maps,disparate data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,80000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,GitHub,"Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Very useful,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",Self-taught,60,20,20,0,0,0,Unsupervised Learning,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,10TB,Neural Networks,"Microsoft Excel Data Mining,Python,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Sometimes,Often,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,40,20,20,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT",,,,,Sometimes,,,,Most of the time,,,,Often,,Most of the time,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. 
GraphBase/Neo4j)",Share Drive/SharePoint,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Philippines,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,DataCamp,Traditional Workstation,0 - 1 hour,Github Portfolio,Yes,I did not complete any formal education past high school,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Germany,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,CRM/Marketing,20 to 99 employees,Stayed the same,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Rarely,,Other,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,90,5,5,0,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,dirty data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Other,Rarely,68000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,R,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,KDnuggets Blog,< 1 year,,,,,Necessary,Necessary,Necessary,Nice to have,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,80,10,0,0,10,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Very Important,Very Important,Very Important,,,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Portugal,37,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Java,Deep learning,C/C++/C#,University/Non-profit research group websites,"Arxiv,College/University,Conferences,Kaggle,Official documentation",Somewhat useful,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,Engineer,University courses,40,25,10,15,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Evolutionary Approaches,Support Vector Machines (SVMs)",Primary/elementary school,Academic,I don't know,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data,Other",Sometimes,100MB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks","C/C++,Mathematica,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Evolutionary Approaches,Simulation,Time Series Analysis",,,Sometimes,,,,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,Often,,,,10,40,15,15,20,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,Sometimes,,,Often,,Sometimes,,,,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,12000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,Not Useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,10,60,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Government,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service","Image data,Video data,Text data,Relational data,Other",Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Azure Machine Learning,Python,QlikView,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,Often,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Most of the time,,Sometimes,Most of the time,,,Most of the time,,,,Often,,Rarely,Most of the time,,,,Sometimes,,,,,,Most of the time,Most of the time,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,76-99% of projects,More internal than external,Central Insights Team,Govern OpenData;IBGE;Portal da Transparencia;,We are a team that has 20 years of experience in data 
integration and we have not encountered difficulties in this area.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,400000,BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,21,Employed part-time,,,No,Yes,Statistician,Fine,Self-employed,C/C++,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,Very useful,Very useful,,,,,,,Very useful,Very useful,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Statistician,Self-taught,60,0,0,40,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Very Important +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,,,,Nice to have,,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Poland,26,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,KDnuggets Blog,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Master's degree,Mathematics or statistics,Less than a year,Other,Other,30,25,0,5,10,30,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,Other,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,I collect my own data (e.g. 
web-scraping),"College/University,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Very useful,,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,30,10,35,0,10,,,A doctoral degree,Mix of fields,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,,"Microsoft Azure Machine Learning,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Association Rules,Data Visualization,Time Series Analysis",,Often,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,30,15,30,10,15,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Often,,,,,,,Sometimes,,,Sometimes,Often,,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by college or university,Weka,,Java,,"Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,10,80,10,0,0,0,Computer Vision,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A professional degree,Academic,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,,,Image data,Sometimes,10MB,"Markov Logic Networks,SVMs","C/C++,Mathematica,MATLAB/Octave,NoSQL,Python,R,RapidMiner (free version),SQL,Statistica (Quest/Dell-formerly Statsoft),Other",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Rarely,,,,Often,,Most of the time,,Often,,,,,,,Often,,Often,,,,,Most of the time,,,"Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,30,30,10,20,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Limitations of tools",,Often,,,,,,,,,,,Often,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,48000000,UGX,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,60,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Other,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,"Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher",Self-taught,60,20,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Relational data",,1GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Naive Bayes,Neural Networks,Time Series Analysis",,,Often,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,Often,,,,20,40,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Rarely,,INR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,20,10,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,<1MB,Decision Trees,"Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,Python,R,Unix shell / awk",,Sometimes,,Often,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches",,,,,,Often,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,50,20,10,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,"Bitbucket,Git,Subversion",,105000,GBP,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Canada,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",25,25,10,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,,SQL,"I collect my own data (e.g. 
web-scraping),Other","Personal Projects,Podcasts",,,,,,,,,,,,Very useful,Very useful,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,Programmer,Self-taught,100,0,0,0,0,0,,,"Some college/university study, no bachelor's degree",Internet-based,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Text data,Relational data",Always,100GB,,"Google Cloud Compute,Hadoop/Hive/Pig,Microsoft Excel Data Mining,NoSQL,SQL",,,,,,,,Rarely,Rarely,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Text Analytics",,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,85,10,4,1,0,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,Sometimes,,,,,Often,,,,,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,"Primarily analyse the stock market for value investing, namely Benjamin Graham's intrinsic value formula.",Time.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"I don't typically share data,Other",Google Sheets,Other,Always,250000,INR,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,32,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,57,Retired,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Python,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Engineer,Self-taught,90,0,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Support Vector Machines (SVMs)",No education,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Official documentation,Personal Projects",Very useful,,Very useful,,Very useful,,Very useful,,,Somewhat useful,,Very useful,,,,,,,"FlowingData Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,University courses,30,0,20,50,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,Orange,Perl,Python,R,RapidMiner (commercial version),RapidMiner (free version),TensorFlow,Unix shell / awk",Rarely,Rarely,,Sometimes,,,,,,,,,,,Rarely,,Most of the time,,Sometimes,,Often,,,,,,,,Rarely,Sometimes,Most of the time,,Often,Rarely,Rarely,,,,,,,,,,,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,Often,,Most of the time,Most of the time,Often,Sometimes,,Sometimes,Sometimes,,Sometimes,,Often,,Often,,Sometimes,Often,,Sometimes,,,Often,,Most of the time,Often,Sometimes,,,,25,40,10,20,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,,,,Sometimes,,,Often,Sometimes,,,,,,Often,,Sometimes,,51-75% of projects,Entirely external,Other,Collaborative and kaggle datasets,Understanding the data and the scientific question,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,150,BRL,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,20,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Image data,,10GB,"CNNs,Ensemble Methods,Markov Logic Networks,Neural Networks,RNNs","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs",,,,Most of the time,,Often,Often,,,,,,,Sometimes,Sometimes,Often,,,,,Sometimes,,Often,Sometimes,Often,,,,,,,,,30,50,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input",,,,,Often,,,,,,Often,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Sometimes,,,I am not currently employed,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,France,49,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Jupyter notebooks,Perl,Python,R,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Often,Often,,Often,,,,,,,,,,,,,Rarely,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient 
Boosted Machines,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,Sometimes,Sometimes,,,Often,Sometimes,Often,Often,,,Sometimes,,,,,,Sometimes,,Sometimes,Often,,Often,,,,Sometimes,Sometimes,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,Sometimes,,,,,,,,Often,,Often,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,55000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website",Newsletters,,,,,,,,Somewhat useful,,,,,,,,,,,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Other",Kaggle competitions,5,10,40,5,40,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,10TB,Gradient Boosted Machines,"Amazon Web services,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Sometimes,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Gradient Boosted Machines,Segmentation,Time Series Analysis",Often,,,,,Often,Most of the time,,,,,Often,,,,,,,,,,,,,,Often,,,,Often,,,,5,30,30,35,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,Often,,,,,,Often,Most of the time,,Often,,,,Sometimes,,100% of projects,Entirely internal,Standalone Team,,the size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Rarely,110000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,40,20,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Don't know,100MB,"Bayesian Techniques,Other","Java,Python,R,SQL",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees",,Often,Often,,,Sometimes,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,20,30,30,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",I don't plan on learning a new tool/technology,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Company internal community,Official documentation,Online courses,Personal Projects,Podcasts",,Very useful,,Somewhat useful,,,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,6 to 10 years,"Researcher,Other",Self-taught,80,10,0,10,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Other,20 to 99 employees,Increased significantly,3-5 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Most of the time,100GB,"CNNs,Random Forests","Amazon Web services,Google Cloud Compute,NoSQL,Python,TensorFlow,Unix shell / awk,Other",,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,Most of the time,,,"CNNs,Random Forests",,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,70,5,0,5,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Rarely,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,dirty data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,"Cloud Storage, AWS S3, Google Drive, Dropbox",Git,Rarely,180000,USD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"Data Machina Newsletter,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,35,10,5,40,5,5,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Don't know,A career fair or on-campus recruiting event,Important,Other,Traditional Workstation,Other,,,,"C/C++,MATLAB/Octave,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Evolutionary Approaches,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,,,Often,,Often,Often,,Sometimes,Rarely,,,Most of the time,,Often,,Often,,Most of the time,Most of the time,,,Most of the time,Most of the time,Sometimes,,Often,Sometimes,Often,,,,10,40,20,15,15,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,None,Entirely internal,Other,,,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,"FastML Blog,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,Work,30,5,40,5,20,0,Natural Language Processing,Logistic Regression,I prefer not to answer,Other,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,SVMs","NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Text Analytics",,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,35,15,25,20,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input",,,,,Often,Sometimes,,,,,Most of the time,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,I don't typically share data",,Git,Sometimes,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Canada,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,C/C++,Deep learning,Python,,"Arxiv,Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,,Self-taught,80,0,0,10,10,0,,,High school,Academic,"5,000 to 9,999 employees",,Don't know,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Don't know,10GB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Python",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Simulation",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,40,10,10,40,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,,,,,Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,,,,Very useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,6 to 10 years,"Software Developer/Software Engineer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Web services,Deep learning,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler,Other",University courses,10,10,50,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,Most of the time,,,,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"Decision Trees,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,Often,,,,25,25,25,0,25,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,Sometimes,,,Often,,,,Often,Sometimes,,,,,,Sometimes,Sometimes,,Less than 10% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +A different identity,Other,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Technology,20 to 99 employees,Increased slightly,More than 10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,Regression/Logistic Regression,"IBM SPSS Statistics,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,Often,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,,,,,,Often,,,,Often,,,Rarely,Sometimes,,,,10,30,10,10,40,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Unavailability of/difficult access to data",,,,,,Often,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,10-25% of projects,More internal than external,IT Department,,"Acquiring raw data from public sources (NSI, Ministries etc.)","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,30000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Text Mining,R,Google Search,"College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book",,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Online Courses and Certifications,No,Master's degree,,1 to 2 years,,University courses,10,15,0,75,0,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Psychology,1 to 2 years,I haven't started working yet,Self-taught,45,25,0,0,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,26,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Other",Self-taught,60,10,20,10,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,,"Bayesian Techniques,Ensemble Methods,Random Forests","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Often,,,,"Bayesian Techniques,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests",,,Often,,,,,,Often,,,,,,,Often,,Often,,,,,Often,,,,,,,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,Often,Sometimes,,,Rarely,,,,,,,,,,,,,,26-50% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,,Always,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,R,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,40,15,15,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,67,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,Tableau,Genetic & Evolutionary Algorithms,C/C++/C#,I collect my own data (e.g. 
web-scraping),"College/University,Conferences,Friends network,Personal Projects,Tutoring/mentoring",,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Data Miner,Engineer,Predictive Modeler,Researcher,Other",University courses,60,0,20,10,0,10,"Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",High school,Academic,10 to 19 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,<1MB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,IBM SPSS Statistics,Java,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,SQL",,,,Often,,,,,Rarely,,,Rarely,,,Sometimes,,,,,Sometimes,Often,,Often,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Text Analytics,Time Series Analysis",,,Sometimes,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,Often,Sometimes,,,,50,10,10,10,20,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,Most of the time,,,Often,Often,Most of the time,,Most of the time,,,,,,Most of the time,,Most of the time,,100% of projects,Entirely external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,68000,USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Denmark,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Friends network,Online courses,Podcasts,Stack Overflow Q&A",,Very useful,Somewhat useful,,,Not Useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Other,University courses,30,10,10,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,1GB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,Sometimes,,,,,,,,,,Often,,Often,Often,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,10,30,50,5,5,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Sometimes,,,,,,Often,,,,,,Most of the time,Sometimes,,Sometimes,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Other",S3,Git,Rarely,500000,DKK,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Switzerland,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),I don't plan on learning a new ML/DS method,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Official documentation,Personal Projects,Textbook",,Very useful,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,University courses,40,0,30,30,0,0,,,A doctoral degree,Internet-based,20 to 99 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,10GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,R,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests,Recommender Systems",,,,,,,Most of the time,Rarely,Rarely,,,,,,,,,,,,,,Rarely,Rarely,,,,,,,,,,40,0,0,30,30,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,Most of the time,,,,,,Sometimes,,,,,,Rarely,,,Sometimes,Often,,100% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,8400,CHF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Canada,53,"Not employed, but looking for work",,,,,,,,Python,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Kaggle,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,5-10 years,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,Necessary,,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,0,50,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Male,United States,50,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,"Ensemble Methods 
(e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,,"FastML Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,25,50,0,5,20,0,"Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,Pakistan,32,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,R,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",5,30,15,10,20,20,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,1GB,Regression/Logistic Regression,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,20,10,10,40,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Often,,,,,,,Often,,,,,,,,,,,,Often,,76-99% of projects,Approximately half internal and half external,IT Department,nothing,According to User Needs which are different,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,600000,PKR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,,University courses,30,10,20,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites,Other","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Not Useful,Very useful,,Very useful,,,,Very useful,,< 1 year,,,,,,,,,,,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Management information systems,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,30,0,10,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,Very useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",University courses,90,5,0,5,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,Employed full-time,,,Yes,,Data Scientist,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician",University courses,30,10,0,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,NA,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,51,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,Dataset 
aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Business Analyst,Other",Self-taught,40,40,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Other,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Other,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,Most of the time,Sometimes,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics",,,Sometimes,,,Often,,Most of the time,,,,,,Most of the time,,Most of the time,,Sometimes,Most of the time,,,,,,,,,,Most of the time,,,,,60,10,20,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the 
time,,,,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,10-25% of projects,More internal than external,Business Department,None,Access,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,250000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Physics,6 to 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A bachelor's degree,Hospitality/Entertainment/Sports,,,,,Important,,GPU accelerated Workstation,Image data,Never,,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,CNNs,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,26,"Not employed, but looking for 
work",,,,,,,,Hadoop/Hive/Pig,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Personal Projects",,Somewhat useful,Very useful,,,,,,,,,Very useful,,,,,,,,3-5 years,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Laptop or Workstation and local IT supported servers,40+,Master's degree,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,0,0,0,100,0,0,Unsupervised Learning,"Bayesian Techniques,Ensemble Methods,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,46,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,Very useful,Very useful,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Manufacturing,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,Sometimes,,,,,Most of the time,Most of the time,,,,,,,,Often,,Rarely,Sometimes,,Sometimes,,Often,,,Sometimes,,,Sometimes,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,Often,,,,Sometimes,,,51-75% of projects,Entirely internal,Business Department,Census,ERP system is old and not formatted conveniently ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,120000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Researcher",University courses,10,20,50,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",CRM/Marketing,500 to 999 employees,Decreased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,Often,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Decision Trees,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Random Forests,Text Analytics",Sometimes,,,,,Sometimes,,Sometimes,,,,Often,,,,Often,,,Often,,,,Often,,,,,,Often,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,,Python,"GitHub,Google Search,University/Non-profit research group websites","College/University,Conferences,Online courses,Personal Projects,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,30,15,30,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs",No education,Government,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very 
important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Hadoop/Hive/Pig,Java,Python",,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Neural Networks,Simulation",,,,,,,Often,,,Often,,,,,,,,,,Often,,,,,,,Often,,,,,,,30,25,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,Sometimes,,,,,,,Often,Sometimes,,,,,,,,,,,Often,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,750000,IQD,Has decreased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Canada,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Business Analyst,University courses,10,0,0,90,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",High school,Retail,"5,000 to 9,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer 
Scientist,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed part-time,,,No,Yes,Engineer,Fine,Employed by college or university,TensorFlow,Other,Python,"Government website,University/Non-profit research group websites","College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,,,,Somewhat useful,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity,Other","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"DBA/Database Engineer,Engineer,Researcher",University courses,30,30,3,30,7,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,Portugal,29,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Researcher",University courses,10,20,30,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,Greece,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",5,75,10,10,0,0,"Recommendation Engines,Unsupervised Learning","Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,Fewer than 10 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,30,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,40+,Online Courses and Certifications,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Operations Research Practitioner,Predictive Modeler,Other","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Not important +Male,United States,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,Somewhat useful,,Not Useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Workstation + Cloud service,0 - 1 hour,,No,Bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,,,A bachelor's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,40,Employed full-time,,,Yes,,Researcher,Poorly,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Statistician",Work,25,25,35,5,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",,,"CNNs,Decision 
Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Stan,TensorFlow",Rarely,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,Most of the time,Most of the time,Most of the time,,Often,Sometimes,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,Rarely,,,Often,Most of the time,Sometimes,,,,,Sometimes,,,Often,,,,Sometimes,Often,,Rarely,,,Most of the time,Often,,,,,,,5,15,5,15,20,40,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,DataRobot,Deep learning,R,GitHub,Conferences,,,,,Very useful,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,"Some college/university study, no bachelor's degree",Financial,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Text data,Relational data",Most of the time,1GB,Bayesian Techniques,Amazon Machine Learning,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Neural Networks",,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,20,20,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision 
makers",Often,Often,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Bitbucket,Most of the time,80000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Proprietary Algorithms,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,50,0,40,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Telecommunications,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Impala,Java,Jupyter notebooks,Python,R",,,,,Sometimes,,,,Often,Rarely,Rarely,Rarely,,Sometimes,Often,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Prescriptive Modeling,Random Forests,Simulation",,,,,,,Most of the 
time,Often,Often,Often,,Often,,,,,,,,Often,,Often,Often,,,,Often,,,,,,,75,5,13,2,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Most of the time,Sometimes,,Most of the time,Rarely,Sometimes,Rarely,,,,,Rarely,,Sometimes,Sometimes,Most of the time,Often,,100% of projects,Approximately half internal and half external,Standalone Team,"weather, census",Getting recent data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Sometimes,140000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,61,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,"Computer Scientist,Operations Research Practitioner,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist",University courses,20,5,50,20,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,27,"Not employed, but looking for work",,,,,,,,Python,Cluster Analysis,Matlab,University/Non-profit research group websites,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,Less than a year,Other,University courses,10,20,0,70,0,0,"Computer Vision,Unsupervised Learning","Hidden Markov Models HMMs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",Self-taught,40,20,20,20,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Tableau,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Company internal community,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Trade book",,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,Very useful,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",15,35,30,20,0,0,Time Series,Logistic Regression,A master's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased significantly,1-2 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,Regression/Logistic Regression,"Impala,Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Association Rules,Data Visualization,Lift Analysis,Logistic Regression,Recommender Systems,Segmentation,Time Series 
Analysis",,Sometimes,,,,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,Sometimes,,Sometimes,,,,Often,,,,40,30,5,15,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",,,,,Most of the time,,,,,,Often,,,,,,,Often,,Often,,Most of the time,76-99% of projects,More external than internal,IT Department,Axciom; Census; FRED,"That issues with data quality are known, tolerated and considered 'tribal knowledge'. We're often outside the 'tribe'.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Bitbucket,Never,110000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Russia,30,Employed part-time,,,No,Yes,Data Miner,,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,,Sort of (Explain more),Professional degree,,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",30,40,0,0,30,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,,,,,,,,,,,,,,,, +Male,Israel,39,Employed 
full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,3 to 5 years,,Work,20,30,30,0,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Telecommunications,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,Random Forests,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,Spark / MLlib,SQL",,Sometimes,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",,,,,,Most of the time,Most of the time,Most of the time,,,,Sometimes,,,,,,,Rarely,,Rarely,,Most of the time,Sometimes,,,,,Sometimes,,,,,25,15,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,Sometimes,Often,Sometimes,,,,,,Often,,,,,,,,Most of the time,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,250000,ILS,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,10,10,10,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data",Sometimes,10GB,"Neural Networks,SVMs","Amazon Web services,C/C++,MATLAB/Octave,NoSQL,Python,TensorFlow",,Rarely,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Rarely,,Often,Most of the time,,,,,,,Sometimes,,,,,Rarely,Rarely,Sometimes,,,,,,,Sometimes,,,,,,20,10,20,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,76-99% of projects,Entirely internal,Other,,lack of a central easily accessible database,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,105040,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Non-Kaggle online communities,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,,,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,Don't know,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Modeler,IBM Watson / Waton Analytics,Python,R",,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,SVMs,Text Analytics",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,Most of the time,Often,,Most of the time,Most of the time,,Sometimes,Often,,,,,,Sometimes,Most of the time,,,,,75,25,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Never,1400000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Government,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Cloudera,Bayesian Methods,Python,GitHub,"Blogs,Company internal community,Kaggle,Online courses,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Not Useful,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",University courses,10,30,50,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Technology,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Never,10TB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic 
Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,NoSQL,Oracle Data Mining/ Oracle R Enterprise,QlikView,R,Spark / MLlib,Tableau,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,Most of the time,Most of the time,,,,,,,,Often,,,,Sometimes,,,Most of the time,,,,"Association Rules,Data Visualization,Decision Trees,Naive Bayes,SVMs",,Sometimes,,,,,Most of the time,Sometimes,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,10,20,20,20,30,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,95000,BRL,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,65,Retired,,,Yes,,Data Miner,Fine,Self-employed,TensorFlow,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,Business Analyst,Self-taught,100,0,0,0,0,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Trade book,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Bachelor's degree,Other,Less than a year,I haven't started working yet,Self-taught,70,20,0,0,10,0,Recommendation Engines,"Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,DataRobot,Deep learning,Scala,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Online courses",,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Machine Learning Engineer,University courses,10,0,30,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,I don't know,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Python,R,SQL",,,,Most of the time,,,,,,,,,,,Most 
of the time,,,,,,Often,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Neural Networks,SVMs",,Often,,,,Often,Often,Often,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,40,30,20,10,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,26-50% of projects,More external than internal,IT Department,,,Graph (e.g. GraphBase/Neo4j),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Jupyter notebooks,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses",,,,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher",Work,5,10,70,5,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Internet-based,20 to 99 employees,Stayed the same,3-5 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10TB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,Often,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics",Most of the time,,,,Sometimes,Sometimes,Sometimes,Often,Rarely,,,Often,,,Often,Most of the time,,,Sometimes,,Sometimes,,Often,Sometimes,,,,,Sometimes,,,,,30,20,25,5,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Unavailability of/difficult access to data",,,,Sometimes,,Often,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Other,"HDFS, S3",Git,Sometimes,90000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Turkey,37,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,Python,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Very useful,,Somewhat useful,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Other",Self-taught,40,5,20,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A master's degree,Academic,100 to 499 employees,,,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Google Cloud Compute,R,SQL,Other",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality 
Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation",Often,Sometimes,,,Often,Most of the time,,Most of the time,Often,Most of the time,,,,Most of the time,Often,Most of the time,,Sometimes,,,Often,Often,Often,Often,,Often,Most of the time,,,,,,,30,20,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,Often,Most of the time,,Often,Most of the time,Sometimes,Sometimes,Sometimes,,Sometimes,,Most of the time,Most of the time,,Sometimes,Most of the time,Sometimes,,Less than 10% of projects,Do not know,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"250,000",TRY,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,Neural Nets,Python,"Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Unsupervised Learning,Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important 
+Male,Canada,66,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Business Analyst,Self-taught,40,40,20,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,50,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Proprietary Algorithms,Python,University/Non-profit research group websites,"Arxiv,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,0,20,20,0,0,Adversarial Learning,"Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,,Other,,,"CNNs,GANs,Neural Networks","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Neural Networks,Simulation,Time 
Series Analysis",,,,Most of the time,,Most of the time,Often,,,,Often,,,,,,,,,Most of the time,,,,,,,Sometimes,,,Rarely,,,,30,60,0,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,Often,,,,,,,,,Sometimes,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Sometimes,,CAD,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Other,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,Very useful,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,25,15,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector 
Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Rarely,,,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs,Text Analytics",Sometimes,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,Sometimes,,,,,60,15,0,5,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,,Most of the time,,,,,,,,,,,,Rarely,,,Often,Often,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,,10MB,,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,,,,,,,,,,Often,Rarely,,Often,Often,,Sometimes,,,,,Sometimes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Weka,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,Data Stories Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Researcher,Software Developer/Software Engineer",Self-taught,60,20,10,10,0,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Technology,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Most of the time,,Sometimes,,,,,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,Often,Often,,,Sometimes,Often,,,Sometimes,,,Often,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Sometimes,,,,,,,,,,,,,,,,,,Often,,,,76-99% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Sometimes,,EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,South Africa,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Random Forests,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Very useful,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,A humanities discipline,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Natural Language Processing,,High school,Insurance,"5,000 to 9,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,100MB,,"Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,Often,,,,,,,,,Most of the time,,,,,,,,,,"Natural Language Processing,Text Analytics",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources",Sometimes,,,,Most of the time,Sometimes,,,,Most of the time,,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,No,Yes,Other,Poorly,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",,Somewhat useful,,,,,Very useful,,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,,,Nice to have,,Nice to have,,,Necessary,,Nice to have,,,,,GPU accelerated Workstation,11 - 39 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,80,0,0,0,20,0,,,Primary/elementary school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Text Mining,Python,"Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,Somewhat useful,,,,,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,40,50,0,10,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Ukraine,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Social Network Analysis,R,Other,"Company internal 
community,Conferences,Kaggle,Newsletters,Personal Projects",,,,Very useful,Very useful,,Very useful,Very useful,,,,Very useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler",University courses,20,0,0,80,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","R,SAS Enterprise Miner,Spark / MLlib,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Often,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Often,Most of the time,,,,,,,,,,,Often,,,Most of the time,,,Most of the time,,Often,Sometimes,Sometimes,,,,65,20,5,5,5,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Most of the time,"160,000",USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,51,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Stack Overflow Q&A,Trade book,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,Very useful,Very useful,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United Kingdom,46,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,University courses,0,0,75,25,0,0,"Supervised Machine Learning (Tabular Data),Time 
Series","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Deep learning,R,I collect my own data (e.g. web-scraping),"Blogs,Official documentation,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,Very useful,,,,Somewhat useful,Very useful,,,,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Physics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Not important +Male,Other,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,KNIME (free version),Social Network Analysis,,"Government website,I collect my own data (e.g. 
web-scraping)","Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,Other,Self-taught,80,0,0,0,0,20,Time Series,Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,,"IBM SPSS Statistics,Microsoft Excel Data Mining,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Simulation",Sometimes,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,10,10,10,15,40,15,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,,,,Often,,,,,Sometimes,Often,,,76-99% of projects,More external than internal,Business Department,Macroeconomic indicators and databases; census results; trade tada,Format; age of data ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,I am not currently employed,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Ireland,34,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,TensorFlow,Anomaly Detection,R,"Google Search,University/Non-profit research group websites","Company internal community,Online courses,Personal Projects,Stack Overflow Q&A",,,,Not Useful,,,,,,,Somewhat useful,Somewhat useful,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Scientist",Self-taught,10,50,10,20,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,1TB,Other,"Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Often,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of 
the time,,,50,0,0,0,50,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Other",,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,None,Entirely internal,Other,None,"Lack of knowledge on the structure and the meaning of the data. Very often the people involved with the original data capture no longer work with the company, so information is lost.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Never,75000,EUR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,43,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United 
States,26,Employed part-time,,,Yes,,Statistician,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,10,0,60,0,0,,Logistic Regression,A bachelor's degree,Academic,I don't know,,,,,,,,,,,"Minitab,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,30,0,30,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,United States,55,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Researcher,Kaggle competitions,0,0,30,0,70,0,Time Series,Bayesian Techniques,,Other,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,,"Microsoft Azure Machine Learning,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,Rarely,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Time Series Analysis",,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important +Male,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Textbook",Very useful,,,Not Useful,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,25,25,25,0,25,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Always,10GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Python,SQL,TensorFlow,Unix shell / awk,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Rarely,Most of the time,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs",Sometimes,,,,Rarely,Most of the time,,Rarely,Rarely,,,,,,,Most of the time,,,Most of the time,Often,Often,,Rarely,Sometimes,Rarely,,,,,,,,,25,50,25,0,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science 
projects,Need to coordinate with IT",Most of the time,,,,Sometimes,,,,,,,,,Rarely,Rarely,,,,,,,,None,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,AWS S3,Git,Sometimes,"137,000",USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",Python,"GitHub,University/Non-profit research group websites","Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,Very useful,,,Very useful,,,,Somewhat useful,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100MB,Random Forests,"Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"Cross-Validation,Random Forests",,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,75,20,5,0,0,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Explaining data science to others",,,,Most of the time,,Often,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Rarely,,,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Portugal,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,10,10,60,10,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Trade book,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,,Very useful,Very useful,Very useful,,Very useful,,Very useful,"DataTau News Aggregator,FastML Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important +Male,Iran,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,SQL,Google Search,"Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Data Miner,DBA/Database 
Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,30,20,20,0,10,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",Primary/elementary school,Financial,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Orange,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Random Forests,Recommender Systems,Time Series Analysis",,Sometimes,Sometimes,,,Sometimes,Most of the time,Often,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,50,15,10,20,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,Sometimes,,Often,,,,,,,Often,,Most of the time,,Often,Often,Often,,76-99% of projects,More 
internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"400,000,000",IRR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,France,27,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Other,Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,,Necessary,,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,A health science,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat 
important,Somewhat important,Very Important +Female,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,SAS Base,Support Vector Machines (SVM),R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,"FlowingData Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Researcher,Other",University courses,40,5,25,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAP BusinessObjects Predictive Analytics,SQL",,,,,,,,,Rarely,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Sometimes,,Most of the time,,,,Most of the time,,,,,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,,,Sometimes,,Most of the time,,,,,Often,Often,Sometimes,Sometimes,,,Sometimes,,,Often,,,,20,40,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert 
input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,Often,,,,,Often,,,Often,,,,,,Most of the time,Sometimes,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,66000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Online courses",,Somewhat useful,,,,,Very useful,,Somewhat useful,,Very useful,,,,,,,,"Data Elixir Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,60,40,0,0,0,0,,,High school,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Rarely,1GB,CNNs,"Google Cloud Compute,Python",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,30,10,30,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Often,,Sometimes,,,,,,,,Often,Sometimes,Sometimes,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Engineer,Predictive Modeler",University courses,30,30,20,10,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Other,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,R,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Often,Most of the time,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,Segmentation,Simulation,Time Series Analysis",Rarely,,,,,,Most of the time,Sometimes,,,,Often,,,,Often,Sometimes,Sometimes,,,,,,,,Often,Often,,,Often,,,,40,20,20,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Researcher,Other",Self-taught,30,30,20,10,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,100GB,,"Amazon Web services,C/C++,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,Sometimes,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Often,Most of the time,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,20,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,,Company Developed Platform,,Git,Rarely,120000,USD,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Neural Nets,R,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Partially Derivative 
Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,20,50,0,0,0,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Neural Networks,RNNs,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Random Forests,RNNs,SVMs,Time Series Analysis",,,,,,Often,Often,,,,,,,Often,,Often,Often,,,,,,Often,,Often,,,Often,,Sometimes,,,,20,40,20,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Less than 10% of projects,More external than internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,35000000,KRW,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Deep learning,R,"GitHub,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,3 to 5 years,"Data Scientist,Researcher",University courses,40,40,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,500 to 999 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"NoSQL,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,Often,,,,,Often,,,Often,Sometimes,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,26400,PLN,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Government website,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Other,I haven't started working yet",Kaggle competitions,30,10,20,10,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Tableau,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,Not Useful,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,Brazil,22,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",30,50,0,10,0,10,Recommendation Engines,Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",< 1 year,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,,I haven't started working yet,Kaggle competitions,NA,NA,NA,NA,NA,NA,"Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,Russia,22,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,Python,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Not Useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",60,25,10,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),Primary/elementary school,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Other,Traditional Workstation,"Text data,Relational data",Never,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Text Analytics",,,,,,Rarely,Sometimes,Often,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,65,15,10,5,5,0,Enough to tune the parameters properly,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,IT Department,none,Inventing features,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,33000,RUB,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,Very useful,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"5,000 to 9,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Mathematica,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,Often,,Rarely,,Sometimes,,,,,Often,Most of the time,,Sometimes,,,Sometimes,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,Most of the time,,,Sometimes,Rarely,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,Most of the time,Sometimes,Most of the time,Sometimes,,,,,,,Often,,,,Sometimes,,,Most of the time,,,,,Often,Often,,,,,60,10,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external 
sources,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",Often,Often,,,Most of the time,,,,,Often,,,,Often,,,,,,,Most of the time,,10-25% of projects,More external than internal,IT Department,,"It's not clean, not well organized, need to spend a lot of time understanding what it means","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Subversion",Sometimes,146000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,,University courses,40,10,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,,University 
courses,30,20,0,0,50,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, and not looking for work",,,,,,,,I don't plan on learning a new tool/technology,,,,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,Time Series,"Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,40,5,50,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Logistic Regression",A master's degree,Technology,10 to 19 employees,Decreased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,10GB,"Bayesian Techniques,Regression/Logistic Regression","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",,,Sometimes,,,Often,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,15,10,5,30,40,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Often,76-99% of projects,Entirely internal,Standalone Team,radar weather 
data,Lack of validation data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Other,Google Drive,Git,Rarely,"118,000",USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Canada,27,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,40,25,5,NA,30,0,Supervised Machine Learning (Tabular Data),Gradient Boosting,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,30,10,30,29,1,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,KNIME (free version),NoSQL,Python,R,Spark / MLlib,Tableau",,Rarely,,,,,,,,,Often,Often,,,,,Often,,Most of the time,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Often,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Often,,,Sometimes,Sometimes,,,,50,15,5,20,10,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Often,,,Often,,,,,Often,,,,Often,Often,,Often,,Often,Often,Often,,76-99% of projects,More 
internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,60000,EUR,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belarus,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,,"Blogs,Online courses,Stack Overflow Q&A,Other",,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,,Logistic Regression,,CRM/Marketing,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,10GB,"Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Often,,,Rarely,,,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the 
organization,Other",,,Often,,,,,,Sometimes,,,,,,,,,,,,,Often,51-75% of projects,Entirely internal,Other,,Self learning,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,28000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by professional services/consulting firm,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Psychology,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,100GB,"Decision Trees,Random Forests,Other","IBM SPSS Modeler,Python,R,SQL,Unix shell / awk",,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Often,Sometimes,Sometimes,,,,,,,,,,,,,Sometimes,,Often,,,Sometimes,,,,,,,,50,20,20,5,5,0,Enough to tune the parameters properly,Lack of data science talent in the 
organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",,400000,CNY,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,37,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Tableau,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Podcasts,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,More than 10 years,"Data Analyst,Other",University courses,40,10,0,50,0,0,,,A doctoral degree,Government,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,,"Microsoft SQL Server Data Mining,R,SAP BusinessObjects Predictive Analytics,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,Sometimes,,,,,,,"Data Visualization,Logistic Regression,Segmentation",,,,,,,Often,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,20,20,20,30,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,,,,Often,,,,Sometimes,,,,Often,Sometimes,,,Sometimes,Often,,Sometimes,,,51-75% of projects,More internal than external,IT Department,None,Accuracy,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,146000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Brazil,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM Watson / Waton Analytics,Association Rules,C/C++/C#,Government website,"Blogs,College/University,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Basic laptop (Macbook),Traditional Workstation,Other",0 - 1 hour,Master's degree,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",Work,15,20,60,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Always,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,KNIME (free version),Python,R,SQL",Sometimes,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation",Often,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,Often,,,,,,,,30,10,5,25,30,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,Often,Often,,,,Often,,Sometimes,,,Rarely,,,,Often,,,,,76-99% of projects,More internal than external,IT Department,Credit bureaus; scraped bank account transaction data; internal bank' datasets (we are in consulting),"Big Data, scaling algorithms for huge datasets; Expert advice","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Telegram,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,24000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Hungary,35,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Podcasts,YouTube Videos",,,,,,,Very useful,,,,,,Somewhat useful,,,,,Not Useful,Becoming a Data Scientist Podcast,1-2 years,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,43,Employed full-time,,,Yes,,Data Miner,,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Employed by government",Cloudera,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer",Self-taught,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,Other,"5,000 to 9,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,,Sometimes,,Often,,,,,,,,,,,40,30,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues",,,,,Most of the time,Sometimes,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Never,,,,7,,,,,,,,,,,,,,,,,, +Female,Portugal,52,Employed full-time,,,Yes,,Other,,Employed by college or university,Hadoop/Hive/Pig,Deep learning,R,University/Non-profit research group websites,Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Other,Self-taught,30,10,0,60,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",Primary/elementary school,Academic,100 to 499 employees,Decreased slightly,Less than one year,Some other way,Very important,Other,Basic laptop (Macbook),"Text data,Relational data",Never,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,Microsoft SQL Server Data Mining,R",,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Segmentation,SVMs",,Often,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,Often,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Often,,Often,,Most of the time,,,,,,50,10,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,,,,,Often,,,,,Often,,,,,,,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Never,2000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Brazil,37,"Not employed, and not looking for work",,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Time Series Analysis,R,"I collect my own data (e.g. web-scraping),Other",Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Doctoral degree,Physics,,Researcher,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,43,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Personal Projects,Podcasts",,,,,,,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX,Udacity,Other",Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,43,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Enterprise Miner,,,,Friends network,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Miner,Programmer",Self-taught,NA,100,0,0,0,0,Time Series,Other (please specify; separate by semi-colon),A master's degree,Internet-based,10 to 19 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational 
data,Rarely,<1MB,Other,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,0,0,0,0,0,0,,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,20,20,15,45,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Mathematica,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,Rarely,,,Most of the time,,Often,,,,,,Rarely,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Simulation,Time Series 
Analysis",,,,,,Often,Sometimes,Sometimes,Sometimes,,,,,Often,,Often,,,Rarely,Rarely,,,Sometimes,,,,Rarely,,,Sometimes,,,,60,10,10,10,10,NA,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Sometimes,Often,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,Often,,10-25% of projects,More internal than external,Standalone Team,ISO;SNL;US Department of Labor,It is coded inconsistently in different business units.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,,Never,30000,USD,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Udacity,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,50,10,40,0,0,0,"Adversarial 
Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SQL",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,Time Series Analysis",Often,,,Often,,,Sometimes,Often,Often,,,Often,,Sometimes,Sometimes,Sometimes,,,Often,Often,,,,,,,,,,Sometimes,,,,70,10,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Programmer,Software Developer/Software Engineer",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,67,Retired,,,Yes,,Software Developer/Software 
Engineer,Poorly,Self-employed,TensorFlow,Genetic & Evolutionary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,Software Developer/Software Engineer,Self-taught,90,10,0,0,0,0,Reinforcement learning,"Logistic Regression,Neural Networks - GANs",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,SAS Enterprise Miner,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects",,,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer,Other",University courses,20,40,0,30,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Important,Other,"Basic laptop (Macbook),Traditional Workstation","Text data,Other",Rarely,,,"Hadoop/Hive/Pig,Java,KNIME (free version),MATLAB/Octave,NoSQL,Python,R,SAS Base,Spark / MLlib",,,,,,,,,Rarely,,,,,,Most of the time,,,,Most of the time,,Rarely,,,,,,Often,,,,Sometimes,,Most of the time,,,,,Rarely,,,Rarely,,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series 
Analysis",Sometimes,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,,,Often,,,Often,,Often,,Often,,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Git,Other",,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Predictive Modeler,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Business Analyst,Predictive Modeler",Self-taught,50,10,40,0,0,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,"Blogs,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,,,Very useful,KDnuggets Blog,< 1 year,,,Necessary,,,,,,,,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,Very Important,,Very Important,,,,,Very Important,,,Very Important, +Male,Japan,24,"Not employed, and 
not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,R,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","College/University,Company internal community,Online courses,Trade book,YouTube Videos",,,Somewhat useful,Not Useful,,,,,,,Very useful,,,,,Very useful,,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,,"Data Analyst,Data Miner,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,Employed full-time,,,Yes,,Data Miner,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Random Forests,Scala,GitHub,"Blogs,College/University,Kaggle",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Statistician",Work,40,40,20,0,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Mix of fields,"10,000 or more employees",,1-2 years,A 
career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Most of the time,1PB,"Bayesian Techniques,Decision Trees,Random Forests","Java,Microsoft Excel Data Mining,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,"Decision Trees,Naive Bayes,Random Forests,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,,,Often,,,,,Most of the time,,,,,,,Most of the time,,,,70,10,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,Most of the time,,,,,,Sometimes,,Sometimes,,,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Git,Rarely,,,Other,8,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,,1-2 years,,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,25,25,25,25,0,0,Speech Recognition,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,C/C++,Bayesian Methods,Python,Google Search,"Arxiv,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,"Predictive Modeler,Researcher",Self-taught,80,0,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Neural Networks - CNNs",A bachelor's degree,Military/Security,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague 
told me",Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Evolutionary Approaches,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Other",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,"CNNs,Cross-Validation,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks",,,,Most of the time,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,70,25,0,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Most of the time,,,,,,,,,,,,,,100% of projects,More external than internal,Business Department,Imadenet; mscoco; pascal; ,Labelling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","I don't typically share data,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,96000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,51,"Not employed, but looking for work",,,,,,,,R,Factor Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Online courses,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",15+ years,,,,,,,,,,,,,,,Other,11 - 39 hours,Master's degree,Yes,Master's degree,A social science,More than 10 years,Other,Work,30,0,40,30,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,3 to 5 years,Researcher,Self-taught,70,5,15,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Academic,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,100TB,"Bayesian 
Techniques,Ensemble Methods,Random Forests,SVMs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,RapidMiner (commercial version),Bayesian Methods,SQL,University/Non-profit research group websites,"Conferences,Newsletters,Online courses,Textbook",,,,,Very useful,,,Very useful,,,Very useful,,,,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Business Analyst,Researcher,Other",University courses,10,10,50,30,0,0,,Logistic Regression,A doctoral degree,Academic,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Relational data,Rarely,1TB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,RapidMiner (commercial version),SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,Often,,,,,,,"Decision Trees,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,,Rarely,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,Sometimes,,,,65,5,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",Most of the time,Most of the time,,Often,Most of the time,Most of the time,,Most of the time,,,,,,Often,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,Integrated Postsecondary Educational Data System,Data that are collected are not the right data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"83,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,50,5,15,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,Fewer than 10 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and private datacenters,Image data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic 
Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Mathematica,Deep learning,R,"GitHub,Other","Arxiv,Textbook,Trade book",Somewhat useful,,,,,,,,,,,,,,Very useful,Very useful,,,"FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Other",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Logistic Regression",A professional degree,Insurance,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression",,,Often,,,Most of the time,Most of the time,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Sometimes,,,,,,,,Sometimes,,100% of projects,More internal than external,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Share Drive/SharePoint,,"Bitbucket,Git",,,,,9,,,,,,,,,,,,,,,,,, +Male,Colombia,31,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. 
boosting, bagging)",R,Google Search,"Blogs,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Statistician",University courses,10,10,0,80,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Telecommunications,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,Sometimes,,,,,Often,Often,,,,,,,,Often,,,,,Most of the time,,Most of the time,,,Most of the time,,,Often,,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,Often,,,,,,,Most of the time,,,,,,Most of the time,,76-99% of projects,More internal than external,Business Department,,access,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Never,54000000,COP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Python,,Python,"Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,YouTube Videos",,Very useful,,,,,,,,,Very useful,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,"Coursera,DataCamp",Other,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Reinforcement learning,,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,,Somewhat important,Somewhat important +Male,United States,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,Google Search,"College/University,Personal Projects",,,Somewhat useful,,,,,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,,Necessary,,,Nice to have,,,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Machine Learning Engineer,Self-taught,25,20,35,20,0,0,Natural Language Processing,"Decision Trees - Random 
Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Company internal community,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,6 to 10 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,Often,,Often,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,Often,Rarely,Sometimes,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Often,Often,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,Sometimes,Sometimes,Often,Often,,Sometimes,Most of the time,,,,40,15,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial 
support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,Sometimes,Rarely,Sometimes,,Sometimes,Sometimes,,,Rarely,Sometimes,Sometimes,Often,,,Rarely,Often,Sometimes,,,100% of projects,More internal than external,Standalone Team,Eventful; RWunderground;,Ensuring accuracy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,115000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,Mexico,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Microsoft Excel Data Mining,Monte Carlo Methods,R,"Google Search,Government website","Conferences,Kaggle,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Researcher,Work,10,20,10,40,20,0,"Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A bachelor's degree,Academic,Fewer than 10 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Image data,Most of the time,10MB,"Bayesian Techniques,Markov Logic Networks","C/C++,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,R,SAS Base",,,,Sometimes,,,,,,,,,,,,Most of the time,Sometimes,,,Sometimes,Often,,Most of the time,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Simulation,Time Series Analysis",,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,15,45,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,Often,,,,,Sometimes,,,,Often,,,Sometimes,,,,,Most of the time,,51-75% of projects,Do 
not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,47800,MXN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Germany,NA,Employed full-time,,,No,Yes,Predictive Modeler,Poorly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Cluster Analysis,R,Other,"College/University,Company internal community,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,,Very useful,Very useful,KDnuggets Blog,3-5 years,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,Predictive Modeler,Software Developer/Software Engineer",University courses,20,0,40,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Iran,30,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 
years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher",Self-taught,20,40,40,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Other,10 to 19 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,< 1 year,,,,,,,,,,,,,,"Coursera,edX","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,Canada,53,"Not employed, but looking for work",,,,,,,,Other,I don't plan on learning a new ML/DS method,Python,Google Search,"Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow 
Q&A",,,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Researcher,Other",Self-taught,40,20,40,0,0,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,Brazil,42,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,Statistician,Work,50,10,40,0,0,0,Other (please specify; separate by semi-colon),Bayesian Techniques,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Software Developer/Software Engineer",University courses,40,30,20,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business 
decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,,"Jupyter notebooks,Python,SQL,Tableau,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Often,,,,Often,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,35,20,15,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Privacy issues,Unavailability of/difficult access to data",Often,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,51-75% of projects,Do not know,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Always,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Friends network,Kaggle,Personal Projects,YouTube Videos",Very useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,Very useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,20,20,0,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Business Analyst,University courses,0,0,0,100,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Amazon Web services,Python",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Other",Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,70,0,30,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",60,20,10,8,2,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Anomaly 
Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Stack Overflow Q&A",Somewhat useful,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,Less than a year,Other,University courses,0,20,30,50,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,1MB,,"Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,Often,,Often,,Rarely,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,Often,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,80,0,0,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,Often,,,,Most of the time,,Sometimes,Most of the time,,,,Sometimes,Often,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,20000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Web services,Time Series Analysis,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Time Series,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,France,45,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by college or university,Employed by non-profit or NGO,Employed by government",NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Company internal community,Conferences,Official documentation,Personal Projects",,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer",Work,25,0,75,0,0,0,"Natural Language Processing,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,100GB,,"C/C++,Minitab,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Python,QlikView,R,SAS Enterprise Miner,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,Rarely,Often,,Rarely,Sometimes,Sometimes,,,,,,Rarely,,,Often,,,,,,Sometimes,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Prescriptive Modeling,Simulation,Time Series Analysis",,Often,,,,,Often,Sometimes,Often,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,Sometimes,,,,30,30,10,30,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,,Often,,,Most of the time,,,Often,,,,,Often,,,,Most of the 
time,,,Sometimes,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,"FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,0,20,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,25,Employed part-time,,,No,Yes,Computer Scientist,Fine,Employed by non-profit or NGO,Amazon Web services,Deep learning,Python,I collect my own data (e.g. web-scraping),"Official documentation,Personal Projects,Stack Overflow Q&A",,,,,,,,,,Somewhat useful,,Very useful,,Very useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,22,15,10,53,0,0,,,A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,,,,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,,Work,35,5,60,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Stayed the same,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,IBM SPSS Statistics,Java,MATLAB/Octave,Minitab,NoSQL,R,SQL,Other",,Sometimes,,Sometimes,,,,,,,,Often,,,Sometimes,,,,,,Sometimes,,,,,Rarely,Rarely,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,Sometimes,,,Most of the time,Most of the time,Often,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,Rarely,Rarely,Most of the time,,Rarely,,,,,Sometimes,Most of the time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Sometimes,,,Rarely,,Often,,,,,,Often,,,Often,,Often,,100% of projects,More internal than external,Central Insights Team,Market Psych; IQfeed,Preparation of data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Other,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Spark / MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,,,,,,,,,,Very Important,,,,, +Female,Other,27,"Not employed, but looking for work",,,,,,,,RapidMiner (free version),Association Rules,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Podcasts,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,United States,22,"Not employed, but looking for work",,,,,,,,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,3 to 5 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Other,25,10,0,0,25,40,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's 
degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,70,20,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R,SQL,Tableau,Unix shell / awk",Rarely,Often,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,Most of the time,,,Sometimes,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",Most of the time,,,,,,Most of the time,Sometimes,,,,,,,Often,Often,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,5,10,10,25,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),,,,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Israel,31,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,,,,Very useful,Very useful,Somewhat useful,Very useful,,,Very useful,Very useful,"No Free Hunch Blog,Talking Machines Podcast",5-10 years,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Yes,Master's degree,Electrical Engineering,,"Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former 
colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important +Female,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Kaggle competitions,30,30,0,20,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,42,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,Coursera,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",40+,Master's degree,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,38,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,SQL,Other,"Kaggle,Personal Projects,Other",,,,,,,Very useful,,,,,Very useful,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,DBA/Database Engineer,Other",Self-taught,100,0,0,0,0,0,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),No education,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Brazil,45,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle,Tutoring/mentoring",,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Data Analyst,Statistician",University courses,0,0,70,30,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Insurance,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Decision Trees,Regression/Logistic Regression","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression",,,,,,Often,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,10,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools",Sometimes,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"100,000",BRL,Has decreased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,Mexico,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Anomaly Detection,C/C++/C#,"GitHub,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,Self-taught,20,0,40,40,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,Random Forests,"C/C++,Java,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,"Decision Trees,Logistic Regression",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,65,5,10,15,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,,Often,,,,Most of the time,,,,Often,,10-25% of projects,Do not know,Business Department,,"determine how complex the retrieved data will will need to be analyzed, finally how should I expose it","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Subversion,,380000,MXN,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Brazil,24,Employed part-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,15,25,10,10,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Hospitality/Entertainment/Sports,10 to 19 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,,10GB,"CNNs,Neural Networks","C/C++,MATLAB/Octave,Python,Other",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Segmentation",,,,Most of the time,,Sometimes,Often,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,30,30,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,Often,,Most of the time,,,,,Often,Sometimes,,,76-99% of projects,More external than internal,IT Department,MPII;MSCOCO Keypoints,"Data transfer and processing, due to low specs hardware.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,18000,BRL,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,Somewhat useful,Not Useful,,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",5-10 years,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,"Coursera,edX,Udacity",GPU accelerated Workstation,2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer",University courses,30,30,0,30,10,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle",Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,,Necessary,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Conferences,Friends network,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Very useful,Very useful,Very useful,Very useful,,,Very useful,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Miner,Researcher",University courses,10,10,0,70,0,10,Time Series,"Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational 
data",Sometimes,,"Bayesian Techniques,HMMs,Random Forests,Regression/Logistic Regression","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,Often,Sometimes,,,,35,5,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",Often,Most of the time,,,Often,,,Often,,,,,,Often,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,Moody's,Getting fields into the right format.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,70000,USD,,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Programmer,University courses,30,20,0,20,30,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Pharmaceutical,I don't know,Stayed the same,Don't know,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Kaggle,Non-Kaggle online communities",,,,,Very useful,,Very useful,,Very useful,,,,,,,,,,,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Bayesian Methods,Python,"GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Conferences,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,Most of the time,,,,Often,,,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",Sometimes,Sometimes,,Sometimes,Often,Most of the time,Most of the time,Sometimes,Often,,,Often,,Often,Sometimes,Sometimes,,Sometimes,Often,Sometimes,Often,,Often,Often,Sometimes,,,Often,Often,,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,,,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,150000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Norway,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,More than 10 years,Software Developer/Software Engineer,Self-taught,80,0,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,36,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",20,35,5,30,10,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,"1,000 to 4,999 employees",Decreased slightly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Sometimes,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Orange,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,Rarely,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,Rarely,,,,Sometimes,Most of the time,Often,,,,,,Sometimes,,Often,,Sometimes,,Sometimes,,,Often,,,,,,,Most of the time,,,,40,10,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Organization is small and 
cannot afford a data science team",Often,Sometimes,,,Most of the time,Often,,,Most of the time,,,,,,Most of the time,Sometimes,,,,,,,76-99% of projects,More internal than external,Standalone Team,"INEGI, COFETEL",incomplete data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,300000,MXN,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,IBM SPSS Statistics,Time Series Analysis,Python,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,Somewhat useful,"Data Stories Podcast,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,3 to 5 years,"Data Scientist,Researcher",University courses,25,50,15,10,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics,Time Series Analysis",,Sometimes,Often,,,,Often,Sometimes,,,,,,,,Sometimes,,Rarely,Often,,,,,,,,,,Most of the time,Often,,,,70,10,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,Most of the time,,100% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"300,000",USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,R,"Google Search,University/Non-profit research group websites","College/University,Official documentation,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Business Analyst,Self-taught,20,10,40,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,University/Non-profit research group websites,"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,50,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,60,0,40,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,32,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,1 to 2 years,Other,Other,NA,10,0,0,0,90,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Most of the time,100MB,Regression/Logistic Regression,"Amazon Web services,Python,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,75,0,10,0,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Rarely,"100,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Other",Work,10,30,30,30,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,I prefer not to answer,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,NoSQL,Perl,Python,Spark / MLlib,SQL,Unix shell / awk",,Often,,,,,,,Often,,,,,,Often,,Most of the time,,Most of the time,,,,Most of the time,,,,Most of the time,,,Often,Most of the time,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text 
Analytics",Often,Rarely,Sometimes,,Sometimes,Often,Often,Sometimes,Often,,,Sometimes,,Sometimes,,Often,,Sometimes,Sometimes,Rarely,Sometimes,Rarely,Rarely,,,,,Rarely,Often,,,,,35,25,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Poland,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist",Work,60,18,10,10,2,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",,Manufacturing,100 to 499 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,TensorFlow",,,,,,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,,,,,Sometimes,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis,Other",Sometimes,Often,Sometimes,Sometimes,Sometimes,Most of the time,Often,Sometimes,Most of the time,,,Often,,Often,Sometimes,Often,,,,Sometimes,Often,,Often,,,Sometimes,,,,Sometimes,Sometimes,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,,I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Computer Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,50,0,40,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Most of the time,1GB,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk,Other",,Often,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,Often,,,,,Sometimes,,Most of the time,Most of the time,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks,Recommender Systems,Simulation,SVMs",Sometimes,,Sometimes,,Often,,,,Sometimes,Most of the time,,,,,,Sometimes,,,,Rarely,,,,Often,,,Most of the time,Sometimes,,,,,,30,40,20,10,0,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,Sometimes,,,Sometimes,,Often,,,,,,,,Sometimes,,,,26-50% of projects,More internal 
than external,Other,,One-off logic/relationships,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Always,220000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by non-profit or NGO,Python,Deep learning,R,I collect my own data (e.g. web-scraping),"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,1 to 2 years,Other,Self-taught,70,0,10,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Non-profit,"1,000 to 4,999 employees",Stayed the same,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Random Forests,Other",,,Often,,,Often,,,,,,,,Often,,,,,,,,,Often,,,,,,,,Often,,,30,10,10,30,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack 
of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Most of the time,,,,Sometimes,,Sometimes,,Most of the time,,Often,,Sometimes,,51-75% of projects,More internal than external,Other,regional data,privacy issues,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,55000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Text Mining,Python,"Google Search,Government website","Blogs,College/University,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Business Analyst,University courses,40,40,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,10 to 19 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,100MB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Natural Language Processing,Neural 
Networks,RNNs",,,,Often,,,,,,,,,,,,,,,Often,Often,,,,,Often,,,,,,,,,20,10,60,0,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Most of the time,,,,Often,,,,,Often,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Never,650000,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,KNIME (free version),Deep learning,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A",,Very useful,Very useful,,,,Very useful,,,Very useful,,Very useful,Very useful,Very useful,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Most of the time,1GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","IBM Watson / Waton Analytics,KNIME (free version),Microsoft Azure Machine Learning,R,TensorFlow",,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,,,Often,,Often,Often,,Often,,,,,Often,,,,,Often,Often,Often,Often,,Often,Often,Often,Often,,Often,Often,,,,0,50,30,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Need to coordinate with IT",Often,,,,Most of the time,,,,,,Sometimes,,,,Often,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Subversion",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,,"Blogs,Kaggle,Stack Overflow Q&A,Trade book,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,Very useful,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer",Self-taught,70,30,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",A doctoral degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,Python,R,Tableau",,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests",,,,,,Sometimes,Often,,Often,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,85,5,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,Cleaning,"Column-oriented 
relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)","Email,Share Drive/SharePoint",,,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A social science,6 to 10 years,"Data Analyst,Predictive Modeler,Researcher,Statistician",University courses,50,20,20,10,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects",Very useful,Somewhat useful,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,80,5,0,10,5,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,10 to 19 employees,Decreased slightly,,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,,"C/C++,NoSQL,Python,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Natural Language Processing,Recommender Systems,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,Often,,,,,Often,,,,,40,35,5,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Often,,,,,,,Most of the time,,,,,,,None,Entirely external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,50000,BDT,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,51,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by government",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,Very useful,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Government,500 to 999 employees,Decreased significantly,More than 10 years,Some other way,Very important,Other,Traditional Workstation,"Image data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,30,20,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business 
decision makers,Explaining data science to others",Most of the time,Often,,,,Often,,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Other,research originated datasets,getting the data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other","pendrive, Google Drive, Dropbox","Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,200000,BRL,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Denmark,28,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Kaggle,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,Business Analyst,University courses,0,40,0,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Genetic & 
Evolutionary Algorithms,Python,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Not Useful,,Not Useful,,Not Useful,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Sometimes,Often,,,Rarely,,,,Rarely,,,,,Rarely,,,Sometimes,,,,Rarely,,,,,,Most of the time,,,,Often,,,,,,,,,,Often,Often,,,Rarely,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,,,,,Most of the time,,Rarely,,,,,,,,Often,,,,,Often,,Sometimes,Rarely,,,,Sometimes,Sometimes,Sometimes,,,,40,10,10,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Dirty 
data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,mongodb nosql data store,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,125000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",SQL,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Official documentation,Stack Overflow Q&A",Very useful,Very useful,,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,,,,"DataTau News Aggregator,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,20,20,10,40,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Other",Sometimes,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,SVMs","C/C++,Google Cloud 
Compute,Python,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,SVMs,Time Series Analysis",,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,Often,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,,Often,Often,,,26-50% of projects,More internal than external,IT Department,Sentinel2 satellite imagery; Landsat satellite imagery; NexRad weather data;,"data scraping, maintaining databases of millions of rows, complicated SQL queries","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,140000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Amazon Machine Learning,Time Series Analysis,R,GitHub,Podcasts,,,,,,,,,,,,,Very useful,,,,,,FlowingData Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Predictive Modeler,Researcher",Kaggle competitions,10,30,20,10,20,10,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Decision Trees,Neural Networks","Hadoop/Hive/Pig,IBM Cognos,R,RapidMiner (commercial version)",,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests",,Often,Often,,,Often,,Often,,,,,,Often,,,,Often,,Often,,,Sometimes,,,,,,,,,,,30,20,10,20,20,0,Enough to tune the parameters properly,"Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database",,,,,,Sometimes,,,Often,,,,,,Most of the time,,,Most of the time,,,,,51-75% of projects,More internal than external,Business Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),Company Developed Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,480000,INR,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Time Series Analysis,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,Very useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,15,15,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data",Always,10GB,"CNNs,HMMs,Neural Networks,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",,,Most of the time,Most of the time,,Most of the time,Most of the 
time,,,,,,Sometimes,Often,,Sometimes,,,,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Sometimes,,,,,,25,20,20,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,Often,,Often,,,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,30000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Newsletters,Official documentation",Somewhat useful,,,,Somewhat useful,,Very useful,Very useful,,Somewhat useful,,,,,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,40,10,40,0,5,5,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Academic,20 to 99 employees,,,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Rarely,1GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Ensemble 
Methods,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",Sometimes,,,,Most of the time,,Often,,Sometimes,,,,,,,,,,Most of the time,Often,Most of the time,,,Most of the time,,,,,,,,,,35,15,10,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Privacy issues",,,,Often,Often,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Always,35000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,18,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,80,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Software 
Developer/Software Engineer,Statistician",Work,0,0,100,0,0,0,,,A master's degree,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,,,,Other,C/C++,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,20,50,0,0,30,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,,,,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,"Google Search,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Newsletters,Non-Kaggle online communities,Online courses,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,"DataCamp,edX,Udacity,Other",Other,2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,,I don't write code to analyze data,"Business Analyst,Researcher,Other",Self-taught,50,50,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,23,Employed full-time,,,Yes,,Engineer,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",40,50,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,,"Data Elixir Newsletter,FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Programmer,Software Developer/Software Engineer,Other",University courses,0,10,10,60,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Government,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,Often,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,Often,,,,Most of the time,,,Often,,,Often,,Rarely,,,,,Often,Sometimes,,,Often,,,,,,Often,,,,,25,50,5,10,10,0,"Enough to code it again from scratch, albeit it may run 
slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,,,,Often,Most of the time,,,,,Sometimes,,,Sometimes,,Most of the time,Often,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,77500,USD,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,Mexico,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Proprietary Algorithms,C/C++/C#,University/Non-profit research group websites,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Other,Work,30,5,60,4,1,0,Other (please specify; separate by semi-colon),Logistic Regression,Primary/elementary school,Technology,"5,000 to 9,999 employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Always,1TB,Regression/Logistic Regression,C/C++,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,20,30,20,10,20,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,"R dasites, ML datsites",Format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Other,Sometimes,75000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,TensorFlow,,Python,Google Search,"Online courses,Personal Projects,Textbook",,,,,,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,< 1 year,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Anomaly Detection,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,0,20,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A professional degree,Government,"10,000 or more employees",Decreased slightly,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Other","Relational data,Other",Sometimes,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Python,R,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,Most of the time,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks,Random Forests,RNNs,SVMs",,,,,,Most of the time,,Most of the time,Often,Most of the time,,,,,,Often,,,,Often,,,Most of the time,,Often,,,Often,,,,,,45,25,5,10,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,26-50% of projects,More external than internal,Other,"Climate data, general geographical data","Complexity problems with few data by class, moreover imprecise and incomplete records","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other",Other,,Git,Sometimes,13200,BRL,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Survival Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,,,,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",20,40,0,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,R,SQL,Tableau",,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,Most of the time,,,Rarely,Sometimes,,,,,,Most of the time,Sometimes,,,,30,20,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,,,Often,,,,,,Sometimes,,,Often,,,Often,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,158000,CHF,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Online courses",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,I prefer not to answer,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very 
Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Egypt,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Researcher",Work,40,40,10,0,10,0,"Computer Vision,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Stayed the same,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Ensemble Methods,Random Forests,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Random Forests,Segmentation,SVMs",,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,Most of the time,,Sometimes,,,,,,20,20,20,10,30,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Never,70000,EGP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Canada,46,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed part-time,,,No,Yes,Researcher,Fine,Employed by professional services/consulting firm,Python,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Very useful,,,,,Very useful,,,,,,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,Less than a year,"Business Analyst,Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",35,45,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,Less than a year,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,70,20,10,0,0,0,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Engineer,Researcher",Self-taught,80,0,20,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning",,High school,Technology,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,<1MB,,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,30,30,0,20,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,Microsoft Azure Machine Learning,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Podcasts,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,,,,Somewhat useful,,,,,Somewhat useful,"Data Elixir Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Data Miner,Engineer","Online courses (coursera, udemy, edx, etc.)",10,25,5,50,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,100 to 499 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Decision Trees,Random Forests","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,70,10,4,10,6,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Privacy issues",Sometimes,,Often,,,,,,,,,,,,,,Sometimes,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Most of the time,35000,EUR,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,,NA,Employed part-time,,,Yes,,Data Analyst,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,Data Analyst,Work,20,20,20,20,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,10 to 19 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Text data,Other",Sometimes,1GB,"Bayesian Techniques,CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,MATLAB/Octave,R",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"HMMs,Naive Bayes,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Often,,Sometimes,,Most of the time,Most of the time,,Most of the time,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software 
Engineer,Self-taught,40,5,55,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,Google Search,"Friends network,Kaggle,Online courses,Personal Projects",,,,,,Very useful,Very useful,,,,Very useful,Very useful,,,,,,,"No Free Hunch Blog,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,30,30,5,25,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,France,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Statistician",University courses,40,0,0,20,40,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted 
Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A master's degree,Government,Fewer than 10 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Other,Laptop or Workstation and private datacenters,,Never,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression","C/C++,Python,R,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,,Sometimes,,,Often,Often,Often,Rarely,,,Often,Sometimes,Sometimes,,Most of the time,,,,,Often,,Rarely,,,,Often,,,Rarely,,,,30,40,0,20,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by college or university,IBM Watson / Waton Analytics,Monte Carlo Methods,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer",University courses,25,30,30,10,5,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,10 to 19 employees,Stayed the same,6-10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Most of the time,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",Sometimes,,Often,,,Most of the time,Most of the time,Often,,,,,,,,,,Often,,,Often,,Sometimes,,,Most of the time,Often,,,,,,,15,50,10,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","I prefer not to say,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,71,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Not Useful,,,Very useful,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,45,0,0,5,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,"Not employed, but looking for work",,,,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Other,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Other",,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,"Data Analyst,DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,10,15,5,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,100 to 499 employees,Increased significantly,1-2 years,A tech-specific job board,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Rarely,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Random Forests,Segmentation,SVMs,Text Analytics",Often,,,,,Sometimes,Most of the 
time,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",shared folders,Git,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,United Kingdom,32,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Master's degree,A humanities discipline,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Egypt,49,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Spark / MLlib,Deep learning,R,Google Search,"Conferences,Kaggle,Online courses,Personal Projects,Textbook",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,Machine Learning Engineer,University courses,50,0,20,30,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Academic,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop 
or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Java,MATLAB/Octave,Python,R,Spark / MLlib",,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,,Often,Often,Most of the time,,,,,,,,,,Often,Often,,Often,,,,,Often,,Most of the time,,,,20,30,20,20,10,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,,,,,,Often,,,Often,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Never,6000,EGP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Russia,50,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Computer Science,More than 10 years,DBA/Database Engineer,Work,60,20,20,0,0,0,,,A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,90,5,1,4,0,0,Enough to code it from scratch and it will run 
blazingly fast and be super efficient,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Most of the time,,76-99% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Git,Most of the time,,,,6,,,,,,,,,,,,,,,,,, +Female,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,I collect my own data (e.g. web-scraping),"Company internal community,Friends network,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,20,40,20,0,10,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Relational data,Other",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,Python,SQL",,,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Random Forests,Time Series Analysis",,,Sometimes,,,Often,Often,Often,Most of the time,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,Most of the time,,,,40,30,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,Sometimes,,Often,,Most of the time,Often,,,100% of projects,More internal than external,IT Department,,"the size of data, quality control and storage","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Company Developed Platform,,"Git,Subversion",Rarely,120000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Researcher",Self-taught,30,20,30,0,20,0,Computer Vision,"Bayesian Techniques,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by college or university,TensorFlow,Deep learning,R,Google Search,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,25,25,0,25,25,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,Primary/elementary school,Non-profit,,,,,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Other,Sometimes,10GB,Random Forests,"Java,Jupyter notebooks,Python,R,Unix shell / awk",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the time,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,,,,,70,10,0,10,10,0,Enough to explain the 
algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations of tools",Often,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,University; public,cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Bitbucket,Sometimes,,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,31,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,Very useful,,,Very useful,,,Very useful,,"KDnuggets Blog,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Statistician",University courses,20,0,40,30,10,0,"Outlier detection (e.g. 
Fraud detection),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Russia,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,No,Master's degree,A social science,3 to 5 years,Business Analyst,Self-taught,40,20,0,40,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important 
+Female,United States,43,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,NA,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,20,20,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10GB,"Decision Trees,Ensemble Methods,Neural 
Networks,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Often,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests",Often,,,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,50,10,5,5,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,125000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Other",Self-taught,30,30,30,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Relational data",Most of the time,10TB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data 
Mining,Python,R,SQL,Tableau,TensorFlow",Rarely,,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,Sometimes,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,Random Forests,Segmentation",Sometimes,Sometimes,,,,Most of the time,Most of the time,,Often,,,,,Most of the time,Often,Most of the time,,,,Often,,,Most of the time,,,Most of the time,,,,,,,,60,20,10,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Most of the time,,,,,,,,,,Often,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,65,Retired,,,Yes,,Scientist/Researcher,Fine,Self-employed,Stan,Bayesian Methods,R,I collect my own data (e.g. 
web-scraping),"Arxiv,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,,,,,,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,,"No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Other,University courses,90,10,0,0,0,0,,"Decision Trees - Random Forests,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,30,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Not Useful,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Researcher,Software Developer/Software Engineer",University courses,50,0,50,0,0,0,"Computer Vision,Machine Translation,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,,10GB,RNNs,"C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Often,,,,"A/B Testing,Natural Language Processing,Neural Networks,RNNs",Rarely,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,Most of the time,,,,,,,,,0,34,66,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Subversion,Sometimes,,EUR,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,70,20,0,10,0,Time Series,,A bachelor's degree,CRM/Marketing,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Don't know,10GB,Other,"Spark / MLlib,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Association Rules,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,100,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,None,Do not know,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Microsoft Excel Data Mining,,Python,Other,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,< 1 year,,,Necessary,,,,,,,,,,,,Basic laptop (Macbook),,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,,Programmer,Work,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Outlier detection (e.g. 
Fraud detection)","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,43,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Other",Other,10,0,0,0,0,90,Reinforcement learning,Decision Trees - Random Forests,I prefer not to answer,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important +Male,Australia,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Stan,Bayesian Methods,,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Friends network,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Programmer",Work,20,20,60,0,0,0,Time Series,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100TB,"Decision Trees,Ensemble Methods,Random Forests","Cloudera,Jupyter notebooks,NoSQL,Python,R,SAS Base,Spark / MLlib,SQL,Unix shell / awk",,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,Rarely,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,Often,,,,Most of the time,Most of the time,Most of the time,,Sometimes,,,,,,Most of the time,,,Sometimes,,Sometimes,,Most of the time,,,Rarely,Often,,,Often,,,,50,5,20,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,Sometimes,,Often,,,,,,,,,,,,,,,Often,Most of the time,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Rarely,250000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Textbook,YouTube Videos",,Very useful,,,,,,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,,Nice to have,Necessary,,Nice to have,Nice to have,,Nice to have,Nice to have,,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,Very useful,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,"Linear Digressions Podcast,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,6 to 10 years,"Computer Scientist,Data Analyst,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,50,20,20,10,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Other,Most of the time,100GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression,Other","C/C++,MATLAB/Octave,NoSQL,Python,R,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics,Time Series Analysis",Most of the time,,Often,,,Most of the time,Most of the time,,,,,,Sometimes,,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,,,,,Sometimes,Most of the time,,,,40,30,5,15,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,Sometimes,,,,,,,,,,Often,Most of the time,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Rarely,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,40,30,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Other,500 to 999 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests","Julia,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Stan",,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Often,Sometimes,Sometimes,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random Forests",,,Often,,,Often,Often,Sometimes,Sometimes,,,,,Often,,Most of the time,,,,,,,Often,,,,,,,,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,Census; voter files; exit polls; vote reports ,Voter files are a bit of a mess and inconsistent ,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Dropbox ,Other,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,Coursera,"Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,PhD,Sort of (Explain more),I did not complete any formal education past high school,,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Germany,24,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,College/University,Kaggle,Newsletters,Official documentation,Personal Projects,Stack Overflow Q&A",Somewhat useful,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,I haven't started working yet,University courses,40,0,10,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,"10,000 or more employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data",Most of the time,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,Random Forests",,,,Sometimes,,Most of the time,Most of the time,Sometimes,,,,,,Sometimes,,,,,,Often,,,Often,,,,,,,,,,,40,30,20,8,2,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,,,,,,,Sometimes,,,Often,,,Sometimes,,Sometimes,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Factor Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,3-5 years,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,PhD,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Operations Research Practitioner,Statistician","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important +Female,Russia,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,30,40,10,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Java,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,TensorFlow",Rarely,Often,,Rarely,,,,,,,,,,,Rarely,,,,,,Most of the time,Sometimes,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,,Often,,,,,Often,,Often,,,,,Often,,Sometimes,,,Often,Most of the time,Often,,,,,,0,30,30,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Scaling data science 
solution up to full database",,,,,,,,,,,Often,,,,,,,Often,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,59,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Physics,More than 10 years,"Computer Scientist,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,99,0,0,0,1,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Financial,,,,,"N/A, I did not receive any formal education",Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Text data,Relational data,Other",Always,100GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Neural Networks,RNNs","Amazon Web services,C/C++,Hadoop/Hive/Pig,MATLAB/Octave,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,Most of the time,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,"Evolutionary Approaches,Simulation",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,5,90,1,4,0,0,Enough to code it from 
scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,,,,,,,,,,,,,,,,76-99% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,70,5,0,0,0,,,A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Never,10GB,Regression/Logistic Regression,"Java,R",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Newsletters,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,Not Useful,,,Very useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,,Very useful,"KDnuggets Blog,Linear Digressions Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"1,000 to 4,999 employees",Increased significantly,6-10 years,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Often,,,,60,5,0,25,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization",Often,,,Sometimes,Most of the time,,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,Macro economics,Obtain it,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,Sometimes,60000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",5,40,5,20,30,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow",,Most of the time,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests",,,,Sometimes,,Often,,,Often,,,Most of the time,,,,Often,,Often,,,,,Often,,,,,,,,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets 
from external sources,Limitations in the state of the art in machine learning,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,Often,,Most of the time,,,,,,Most of the time,,Most of the time,,,Less than 10% of projects,More internal than external,Standalone Team,US Census Bureau,"Not enough data point, especially when many features are needed.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,"120,000",USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,Work,20,0,80,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Biology,1 to 2 years,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Other,Fine,Employed by a 
company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Business Analyst,DBA/Database Engineer,Other",Self-taught,35,30,25,10,0,0,,Logistic Regression,High school,Technology,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Time Series Analysis,Python,Google Search,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,DBA/Database Engineer,University courses,20,0,60,20,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Rarely,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Rarely,,,,,Most of the time,Often,Often,Often,,,Often,,,Often,Often,,,,,Sometimes,,Sometimes,,,Often,,Rarely,Rarely,,,,,50,13,30,2,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects",Often,Often,Sometimes,Sometimes,Most of the time,,,Sometimes,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Canada,49,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Deep learning,R,Google Search,"Blogs,Online courses,Stack Overflow Q&A",,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Stayed the same,3-5 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters",Other,Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","C/C++,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Random Forests,Segmentation,Time Series Analysis",,,,,,Sometimes,Most of the time,Most of the time,Sometimes,,,Often,,Sometimes,,,,,,,,,Most of the time,,,Often,,,,Often,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,Organization is small and cannot afford a data science team,Privacy issues",,,,,,Sometimes,,,,Often,Sometimes,Often,,,,Most of the time,Sometimes,,,,,,100% of projects,Approximately half internal and half external,Other,"yahoo finance data, FRED data",quality issues,Other,"Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,125000,CAD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,DataRobot,Time Series Analysis,Python,"GitHub,University/Non-profit research group websites","Friends network,Personal Projects,Podcasts,YouTube Videos",,,,,,Very useful,,,,,,Very useful,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,40,20,0,20,0,20,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Stayed the same,1-2 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Video data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Markov Logic Networks","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,Often,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Often,,,,,,,Often,,,,"Association Rules,Bayesian 
Techniques,Logistic Regression,Naive Bayes,Segmentation",,Often,Often,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,,,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",Often,,,,Often,Often,,,Often,,Often,Often,,,,,,,,,,,Less than 10% of projects,More external than internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Never,190000,USD,,7,,,,,,,,,,,,,,,,,, +Male,Other,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel,Talking Machines Podcast",< 1 year,Necessary,,,,Necessary,,Necessary,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,PhD,No,Master's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,University courses,0,0,0,100,0,0,Computer Vision,Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,,Very Important,,,,,,,,,,,, +Male,Japan,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Association Rules,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses",Very useful,Somewhat useful,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Other,More than 10 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,95,0,5,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,Gradient Boosted Machines,"Cloudera,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,,Most of the time,,,,Most of the time,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Naive Bayes,PCA and Dimensionality Reduction",,,Sometimes,,,Most of the time,Most of the time,,,,,Most of the time,,,,,,Sometimes,,,Often,,,,,,,,,,,,,70,10,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Sometimes,,,,Often,,,,,,,Most of the time,Sometimes,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Bitbucket,Never,7000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,Australia,55,Employed full-time,,,Yes,,Other,Fine,Employed by government,IBM Cognos,Monte Carlo Methods,C/C++/C#,Google Search,"Kaggle,Official documentation,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,DBA/Database Engineer,Self-taught,90,0,5,5,0,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Rarely,100MB,Regression/Logistic Regression,"C/C++,IBM Cognos,Microsoft Excel Data Mining,SQL",,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,0,20,80,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Most of the time,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,N/A,Lack of dimensional data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,140000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,New Zealand,40,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Random Forests,R,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"DBA/Database Engineer,Software Developer/Software Engineer,Other",University courses,0,7,20,70,3,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,Online courses,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a 
bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Don't know,10MB,"Evolutionary Approaches,Neural Networks","C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Perl,Python,R,Unix shell / awk,Other",,,,Often,,,,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,,,,,,Sometimes,Most of the time,,Sometimes,,,,,,,,,,,,,,,Most of the time,Often,,,"Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,,,Sometimes,Often,,,Often,,,,Sometimes,,,,,,Often,Often,Often,,,,,,,,Often,,,,45,30,0,10,15,0,Enough to 
refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Limitations of tools,Unavailability of/difficult access to data",Often,Often,Often,,,,,,,,,,Often,,,,,,,,Often,,10-25% of projects,More external than internal,IT Department,ml free repositories,building general models ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Git,Rarely,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Other",Self-taught,15,10,50,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,I don't know,,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Other",University courses,50,0,30,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,46,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,Employed by government,R,Monte Carlo 
Methods,SQL,Government website,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,,Self-taught,100,0,0,0,0,0,,,High school,Government,"5,000 to 9,999 employees",Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,Bayesian Techniques,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,SAP BusinessObjects Predictive Analytics,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,Often,Rarely,,,,,,,,,,,Sometimes,Often,,,,Often,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,20,0,0,20,60,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,Often,Most of the time,,,,Often,,,,,,,Most of the time,,,26-50% of projects,Do not know,,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Data Miner,Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,50,39,0,1,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Python,Spark / MLlib",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,"A/B Testing,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics",Often,,,,,,,,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,Sometimes,,,,,70,25,1,3,1,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Somewhat useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",25,25,35,0,15,0,Natural Language Processing,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,United States,35,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,,,,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,I never declared a major,6 to 10 years,"Data Analyst,Researcher",Work,30,0,20,30,20,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Government,"10,000 or more employees",,,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,,,,"Amazon Web services,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of 
the time,,Sometimes,,,,,,,,,,,,,Often,,Often,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Limitations of tools",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Colombia,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,"Data Stories Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,20,20,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",Primary/elementary school,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,KNIME (free version),NoSQL,R,SAS Enterprise Miner,SQL",,,,,Often,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Often,,,,,,Sometimes,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Often,,,,Most 
of the time,,,,,Sometimes,,Often,,,Often,,,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues",,,,,Often,,,,Often,,,,Often,,,,Often,,,,,,100% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,,Rarely,6000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Google Search,"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Most of the time,100MB,"Neural Networks,SVMs","Hadoop/Hive/Pig,Python,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Often,,,,,,"Decision Trees,kNN and Other Clustering,Random Forests,SVMs",,,,,,,,Often,,,,,,Rarely,,,,,,,,,Often,,,,,Most of the time,,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,Sometimes,,,,,,Often,,Often,,Most of the time,,10-25% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,Git,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,10,10,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Anomaly Detection,Python,Government website,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,,Very useful,,Somewhat useful,,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,70,20,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A bachelor's degree,CRM/Marketing,"1,000 to 4,999 employees",Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text 
data,Relational data",Sometimes,10MB,Decision Trees,"Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL",,,,,Most of the time,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,Often,,,,,,,Often,,,,,,,Often,,,,60,20,5,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,,,,,,Often,Often,,Often,,,,,Often,,,Often,Often,Often,,10-25% of projects,Entirely internal,Central Insights Team,Epsilon,Unclear Metadata,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,"95,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Deep learning,Python,,"Conferences,Friends network,Official documentation,Stack Overflow Q&A",,,,,Very useful,Very useful,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Researcher,Other",Self-taught,10,0,60,30,0,0,,Bayesian Techniques,A professional degree,Academic,I don't know,Stayed the same,Don't know,Some other way,Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Never,,"Bayesian Techniques,Other","C/C++,Python",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Simulation,Time Series Analysis",,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,0,0,10,15,65,10,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data",Often,Sometimes,,,Most of the 
time,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,,Our data is from a scientific experiment. Understanding the data itself is our biggest challenge.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,Put on common archive on supercomputing resources,"Bitbucket,Git",Sometimes,21483,CAD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Colombia,44,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,20,15,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,17,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,"Kaggle,Tutoring/mentoring",,,,,,,Somewhat useful,,,,,,,,,,Somewhat useful,,,< 1 year,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,Logistic Regression,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,Researcher,Other,30,10,0,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Telecommunications,"10,000 or more employees",Stayed the same,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,0,40,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,Decision Trees,"R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"kNN and Other Clustering,Random Forests,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,0,40,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Lack of 
data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","College/University,Personal Projects,YouTube Videos",,,Very useful,,,,,,,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",15,25,50,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,500 to 999 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","IBM SPSS Statistics,Python,R,SQL,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,Most of the time,,,"Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,,Often,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,50,20,15,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",Often,,,,,,,,Most of the time,Often,,,,,,,Often,,,,Often,,26-50% of projects,Approximately half internal and half external,Other,Department of Statistics of Colombia,Privacy,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,100000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",MATLAB/Octave,Neural Nets,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,90,NA,5,5,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Python,Spark / MLlib,SQL,Unix shell / awk",Often,Often,,,,,,,Often,,,,,,,,,,,Often,Often,,Often,,,,,,,,Often,,,,,,,,,,Often,Often,,,,,,Often,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs",Most of the time,Sometimes,,,Sometimes,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,,,,Sometimes,,,,,,90,5,5,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,640000,CNY,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Amazon Machine Learning,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,15,15,20,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",High school,Academic,Fewer than 10 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,100MB,"Bayesian Techniques,HMMs,RNNs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,HMMs,RNNs,Time Series Analysis",Most of the time,,Sometimes,,,Sometimes,Most of the time,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,40,10,0,30,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,Most of the time,Often,,Often,,,,,,,,,Most of the time,,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,22000,USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,University courses,5,15,5,75,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A professional degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",,1GB,,"Hadoop/Hive/Pig,Unix shell / awk",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,80,0,0,20,0,0,Enough to run the code / standard library,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,51-75% of projects,Entirely internal,Standalone Team,,Cleaning and filtering the data,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Git,Never,72000,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Philippines,26,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,25,0,0,25,0,Unsupervised Learning,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Stack Overflow Q&A,Trade book",,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,Very useful,,Somewhat useful,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Data Scientist,University courses,0,0,20,50,30,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Never,10GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,SQL",,,,,,,,Often,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Sometimes,Sometimes,,,,,,,,,,,,Often,,Often,,,,,,,,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,KNIME (free version),"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Company internal community,Official documentation,Online courses,Stack Overflow Q&A",,Very useful,,Somewhat useful,,,,,,Somewhat useful,Very useful,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Researcher",University courses,20,10,20,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Sometimes,100GB,"Ensemble Methods,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Python,R,SQL,Tableau",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Sometimes,Most of the time,Most of the time,,,,Most of the time,,,,Often,,,,,,,Most of the time,,,,,,,,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,Sometimes,,Most 
of the time,Often,,,,,Often,Sometimes,,,,,,,Most of the time,,,,51-75% of projects,More internal than external,IT Department,Census data; Property Sales,Dirty inconsistency ,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,97000,AUD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, but looking for work",,,,,,,,Python,Random Forests,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Other,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Outlier detection (e.g. Fraud detection),Bayesian Techniques,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Russia,27,"Not employed, but looking for work",,,,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Not Useful,,,,,,,,,,Very useful,Very useful,,< 1 year,,,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,50,0,40,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Data Elixir Newsletter,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,Necessary,,Nice to have,Nice to have,Necessary,Necessary,Nice to have,,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",70,20,0,0,0,10,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Time Series Analysis,R,Google Search,"College/University,Friends network,Kaggle,Stack Overflow Q&A",,,Somewhat useful,,,Very useful,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Researcher,University courses,50,10,20,10,10,0,"Outlier detection (e.g. Fraud detection),Time Series",Decision Trees - Random Forests,Primary/elementary school,Academic,20 to 99 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Text data,Most of the time,,"Decision Trees,Neural Networks,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,Random Forests,Time Series Analysis",,,,,,Most of the time,Often,Often,,,,,,,,,,,,Often,,,Often,,,,,,,Most of the time,,,,20,40,20,20,0,0,Enough to tune the parameters properly,"Dirty data,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,Often,,,,,,,Most of the time,,,,,,,,,Most of the time,,51-75% of projects,More external than internal,Standalone Team,stock market price,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Never,26000,THB,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Engineer,Work,30,30,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Google Cloud Compute,Julia,Jupyter notebooks,Python,R,SQL,Stan",,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,Often,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Often,,,Most of the time,,,,Often,,,,,Often,,,,,Often,,Often,Sometimes,Often,,,,20,20,30,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,Often,Often,,Often,,,,Often,,Less than 10% of projects,Entirely internal,Central Insights Team,PHI,Fuzzy semantic,Document-oriented (e.g. 
MongoDB/Elasticsearch),Other,Jupyter,Git,Rarely,170,,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Australia,74,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",SAS JMP,Genetic & Evolutionary Algorithms,Stata,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",,Very useful,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,Less than a year,"Business Analyst,Data Analyst,Operations Research Practitioner,Researcher",University courses,50,5,0,40,0,5,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A professional degree,Other,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,Other","IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,SAS JMP,Tableau,Other",,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,Often,Most of the time,Most of the time,,Often,,,,Most of the time,,Most of the time,,,,Often,Most of the time,,,,,Most of the time,Most of the time,,Often,Often,,,,20,15,5,20,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Often,,,,,,,,,,,,,,,Often,,,100% of projects,More internal than external,Central Insights Team,census data,difficulty dealing with government departments.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,"140,000",AUD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Other,Fine,"Employed by a company that performs advanced analytics,Employed by non-profit or NGO",,,,,"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Work,10,50,40,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A doctoral degree,Mix of fields,,,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Most of the time,,"CNNs,Neural Networks,Random Forests,SVMs","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,Often,,Most of the time,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Most of the time,,,,,,,,,,Often,,,,,,Most of the time,Sometimes,,,,,,,Often,,,,,,10,10,10,20,50,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,I prefer not to say,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,Most of the time,,Sometimes,,Often,,,Most of the time,,,Often,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,,,,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Deep learning,C/C++/C#,Other,"College/University,Kaggle,Textbook",,,Somewhat useful,,,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop 
(Macbook),,PhD,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Speech Recognition,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,37,Employed part-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Miner,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",23,75,1,1,0,0,Outlier detection (e.g. Fraud detection),,,Academic,100 to 499 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,10GB,Bayesian Techniques,"NoSQL,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,United States,NA,I prefer not to say,,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,,,"Blogs,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,22,Employed 
part-time,,,No,Yes,Programmer,,Self-employed,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,,Nice to have,Necessary,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,Self-taught,20,60,0,10,10,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Colombia,23,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,31,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Friends network,Online courses,YouTube Videos",,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Physics,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,DataRobot,Random Forests,Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Data Analyst,Data Scientist,Programmer",University courses,60,0,10,20,10,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,"Bayesian Techniques,CNNs,HMMs,Neural Networks,Random Forests,SVMs","NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,Often,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",Often,,,Often,,Often,Often,Sometimes,,,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,Often,Often,,,,,,,Often,,Sometimes,,,,50,15,15,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Often,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely external,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. 
Redis/Riak)",Email,,,Sometimes,100000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,,23,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,R,Other,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,1-2 years,Necessary,,,,Necessary,Nice to have,Necessary,,,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,Self-taught,50,10,10,30,0,0,Computer Vision,"Bayesian Techniques,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,,Very Important,,,,,,,,,,,,,Very Important +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,36,44,5,15,0,0,"Computer Vision,Recommendation Engines,Speech Recognition,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Not at all important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Most of the time,,,"Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Microsoft Excel Data Mining,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Random 
Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",,,,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,69,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Tutoring/mentoring",Somewhat useful,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,Very useful,,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher",University courses,10,10,15,60,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Sometimes,1TB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,IBM SPSS Modeler,IBM SPSS Statistics,Java,Julia,Jupyter notebooks,MATLAB/Octave,Minitab,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,Sometimes,,,,,,Rarely,Rarely,,,Sometimes,Rarely,Most of the time,,,,Sometimes,,,,,Rarely,Sometimes,,,,Most of the time,,Often,,,,,Rarely,Rarely,,Most of the time,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Often,,Often,Often,Most of the time,Most of the time,Most of the time,,,,,,Often,,,,,,Often,Often,Most of the time,Most of the time,,Often,Often,Often,Often,Often,,,,35,20,20,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Sometimes,,,,,,Sometimes,Often,,Most of the time,,,,Often,,,Often,Often,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Amazon Marketplace API Data.,The Volume of data produced per day.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,125000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Matlab,,"Official documentation,Textbook,YouTube Videos",,,,,,,,,,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,Researcher,Self-taught,40,0,40,20,0,0,"Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Decreased slightly,Don't know,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Relational data",Sometimes,,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,10,30,20,20,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,,Most of the time,,,Often,,Often,,,,,,,,,,,,26-50% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,,Necessary,,,,,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Natural Language Processing,Neural Networks - RNNs,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,,,,,Very Important,,,,,,,,,,, +Male,South Korea,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,I don't write code to analyze data,Other,Self-taught,50,0,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,C/C++/C#,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,Very useful,,Somewhat useful,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Jack's Import AI Newsletter",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,Unnecessary,Nice to have,,,,,Traditional Workstation,40+,Kaggle Competitions,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Miner,Machine Learning Engineer,Researcher",Self-taught,40,10,30,10,10,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Japan,35,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Software Developer/Software Engineer",University courses,60,0,20,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,39,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Python,Deep learning,Other,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,,,,,Very useful,,,,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"Computer Scientist,Researcher",Self-taught,50,50,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Logistic Regression,A professional degree,Academic,100 to 499 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Other,Basic laptop (Macbook),Text data,Never,<1MB,Other,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,5,95,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team",Most of the time,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,,,,100% of projects,Entirely internal,Other,,to get them,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,110000,MXN,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,,Nice to have,Unnecessary,Necessary,,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Unnecessary,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,Kaggle competitions,20,0,0,0,80,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important +Female,India,21,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Neural Nets,R,Google Search,"Kaggle,Tutoring/mentoring",,,,,,,Very useful,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Laptop or Workstation 
and local IT supported servers,,Master's degree,Yes,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,50,0,20,10,0,"Computer Vision,Machine Translation,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,Very useful,Very useful,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer,Other",University courses,40,30,15,10,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Relational data,Other",Most of the time,100GB,"CNNs,Ensemble Methods,RNNs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,Orange,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,Sometimes,Sometimes,,,,,,,,Often,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"CNNs,Ensemble Methods,GANs,HMMs,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text 
Analytics",,,,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,,Often,,Often,,Often,Often,,,,,,,Often,Often,,,,,50,10,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,Often,,Often,,,,,Sometimes,,Often,,,,,Often,,Sometimes,,,,26-50% of projects,More internal than external,Standalone Team,UniProt; SwissProt; NCBI,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,170000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,,Very useful,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Udacity,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Physics,,Software Developer/Software Engineer,Work,NA,NA,NA,NA,NA,NA,Computer Vision,"Ensemble Methods,Neural Networks - CNNs",A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,NA,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Programmer",University courses,20,30,20,20,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,28,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",,Not Useful,Very useful,,Not Useful,,Very useful,,,,,Very useful,,,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Sort of (Explain more),Master's degree,,3 to 5 years,Researcher,Self-taught,70,20,0,0,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Survival Analysis,Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - RNNs",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Not important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Very Important,Not important,Very Important,Very Important +Male,Brazil,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Iran,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs 
advanced analytics,C/C++,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","College/University,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Somewhat useful,Very useful,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",5,90,5,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Financial,20 to 99 employees,Stayed the same,6-10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10TB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,NoSQL,Oracle Data Mining/ Oracle R Enterprise,R,Other",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"CNNs,Logistic Regression,Markov Logic Networks,RNNs,SVMs,Text Analytics",,,,Often,,,,,,,,,,,,Often,Often,,,,,,,,Often,,,Most of the time,Most of the time,,,,,10,50,30,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.",Most of the time,,,Most of the time,Most of the time,,,,,Often,,,Sometimes,,,,,,Most of the time,,,,26-50% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j)",Company Developed Platform,,Git,Rarely,800000000,IRR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"FastML Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher",Kaggle competitions,10,10,10,0,70,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Government,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Often,Rarely,Often,,,Most of the time,,Sometimes,,Often,,,,,Often,,Most of the time,Rarely,,,Often,,Rarely,Often,,,,70,15,2,8,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization",,,,,Often,Sometimes,,,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Central Insights Team,Geographic datasets; National Statistics Bureau,Preparation and cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,125000,AUD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,France,37,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,,"Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,10,0,50,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Manufacturing,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,Spark / MLlib",Rarely,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Sometimes,,Sometimes,,,,,,,Most of the time,,,,,,,Most of the time,,,,30,30,0,10,30,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely 
internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Commercial Data Platform,,Bitbucket,Never,43000,EUR,Other,9,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Stack Overflow Q&A",Very useful,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,< 1 year,Nice to have,Nice to have,,,Necessary,,Necessary,,,,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",11 - 39 hours,Github Portfolio,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Software Developer/Software 
Engineer",Work,30,40,30,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,100 to 499 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Perl,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,Often,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,,,Rarely,,,Sometimes,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text Analytics,Time Series Analysis",Sometimes,,,,,,Often,,,,,,,Sometimes,,Sometimes,,,Rarely,,,,,,,,,,Sometimes,Often,,,,20,20,40,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,45,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,Indonesia,22,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Somewhat useful,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT 
supported servers,0 - 1 hour,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,5,0,30,0,25,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important +A different identity,India,28,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,R,Time Series Analysis,R,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,Researcher",Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important +Female,United States,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,Less than a year,"Engineer,Researcher,Statistician",University courses,20,40,0,10,5,25,Unsupervised Learning,"Logistic Regression,Support Vector Machines (SVMs)",,Academic,20 to 99 employees,Increased significantly,3-5 years,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,,,,"Decision Trees,Neural Networks,SVMs","Minitab,Python",,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Sometimes,,Often,,,,,,,,,,Often,,Most of the time,Most of the time,,Often,,,,Often,,,,,,,0,0,0,0,0,0,,"Data Science results not used by business decision makers,Unavailability of/difficult access to data",,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,20,10,20,10,40,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Mathematica,Minitab,Perl,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",Rarely,Sometimes,,Most of the time,Rarely,,,,Rarely,,Rarely,Rarely,,,,,,,,Rarely,,,,,,Rarely,,,,Sometimes,Often,,Often,,,,,,,,Often,Often,,,Rarely,Sometimes,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other 
Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs",Rarely,Rarely,Rarely,,,Often,Often,Often,Often,Rarely,,Often,,Sometimes,,Sometimes,Often,,,Sometimes,Often,Often,Often,,,,Most of the time,Often,,,,,,30,20,10,10,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Often,,Sometimes,,,,Sometimes,,,Often,,,,,Most of the time,,,,Most of the time,,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,65000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Iran,27,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Google Search,"Company internal community,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Very useful,,,Somewhat useful,,,,,,,Very useful,Very useful,,,Very useful,FastML Blog,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Other",University courses,30,10,30,30,0,0,Computer Vision,Neural Networks - CNNs,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,"Not employed, but looking for work",,,,,,,,C/C++,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Blogs",Very useful,Very useful,,,,,,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,More than 10 years,"Data Scientist,Other",University courses,25,25,25,25,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,United States,52,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Time Series,"Hidden Markov Models HMMs,Markov Logic Networks",A bachelor's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Other,Sometimes,100GB,"HMMs,Regression/Logistic Regression","C/C++,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,HMMs,Simulation,Time Series Analysis",Often,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,20,20,0,20,40,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,Data Analyst,University courses,5,10,25,60,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","Cloudera,Hadoop/Hive/Pig,Impala,Julia,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk,Other,Other",,,,,Often,,,,Often,,,,,Sometimes,,Rarely,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,Rarely,Most of the time,,,Sometimes,Rarely,,Most of the time,Sometimes,Rarely,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,Text Analytics,Other,Other,Other",,Rarely,,,,Sometimes,Often,Sometimes,Sometimes,,,Sometimes,,,Rarely,Rarely,,,Sometimes,Rarely,Sometimes,,Rarely,,Rarely,Sometimes,,,Sometimes,,Sometimes,Sometimes,Often,60,10,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / 
Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Often,,Often,Most of the time,,,,Often,,,,,,,,Most of the time,,Often,,Sometimes,,100% of projects,Entirely internal,Other,Census; Land Information Agency Data; Facebook API; Reddit API,Privacy,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Rarely,115000,NZD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Iran,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,,,DataCamp,Other,2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Engineering (non-computer focused),,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Other,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,,"Emergent/Future Newsletter (Algorithmia),Jack's Import AI Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,60,10,25,0,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Government,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Workstation + Cloud service","Image data,Video data,Text data",Rarely,1GB,"CNNs,Decision Trees,HMMs,RNNs","Microsoft Azure Machine Learning,Minitab,Python,RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,Most of the time,,,,Rarely,Sometimes,,,,,,Often,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,kNN and Other Clustering,RNNs,Text Analytics",Sometimes,,,Most of the time,,Most of the time,,,,,,,,Often,,,,,,,,,,,Often,,,,Often,,,,,20,50,10,10,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,,,,Often,,Often,,,,,Sometimes,,,Often,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,"Keggle, UTC",,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Git",Most of the time,"50,000,00",BDT,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,Very useful,Very useful,,,,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,Less than a year,"Business Analyst,Researcher",Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Never,1MB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,SVMs",,Rarely,,Rarely,,,Often,Often,,,,Often,,,,Often,,,,Often,,,Often,,,,,Sometimes,,,,,,60,20,0,10,10,0,Enough to run the code / standard library,Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Cluster Analysis,Python,"Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,Other,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I don't know/not sure,CRM/Marketing,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",,<1MB,,IBM Cognos,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Recommender Systems,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,50,0,40,10,0,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Never,45600,,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",20,20,20,35,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL,Other",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,Sometimes,,Sometimes,,Sometimes,,,,50,30,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science 
team,Difficulties in deployment/scoring,Lack of data science talent in the organization,Need to coordinate with IT",Often,,,Often,,,,,Often,,,,,,Most of the time,,,,,,,,100% of projects,Entirely internal,Business Department,prefer not to say,Upstream data changes that are not communicated to us,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,115000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,University courses,10,10,0,70,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,,,Regression/Logistic Regression,"Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Often,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,5,10,30,15,40,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",,,,,Often,,,,Most of the time,,,,Sometimes,,,Most of the time,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,"Currently, it is the sheer size of it and the infeasibility of handling it with the tools that the organization has always been used to. Also the lack of a data engineer or person devoted especially to structuring the raw data in a way that complex insights can be extracted or that algorithms can be run on it. ",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,70000,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,53,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other,,,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,,,Nice to have,,Nice to have,Necessary,Nice to have,,,,,,,,Other,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,Less than a year,Other,Other,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,I prefer not to answer,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,,,,,,,Very Important,,,,,Somewhat important,,,, +Male,Germany,32,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Random Forests,R,,College/University,,,Very useful,,,,,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Statistician,University courses,30,0,5,65,0,0,,Logistic Regression,High school,Retail,"1,000 to 4,999 employees",Increased slightly,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the 
time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,50,0,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,Sometimes,,Most of the time,Most of the time,,,,,Sometimes,Often,,,,,Most of the time,Sometimes,,100% of projects,More internal than external,IT Department,amazon reviews,unclear analytics questions,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,,Never,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,Very useful,,,,,,,,,,Very useful,Very useful,O'Reilly Data Newsletter,3-5 years,Necessary,,,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,A social science,3 to 5 years,Statistician,University courses,40,40,0,20,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Not important,Somewhat important,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Miner,Engineer",University courses,10,20,0,70,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Female,United States,30,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,I don't plan on learning a new ML/DS method,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,3-5 years,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,No,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,30,10,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",India,38,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Factor Analysis,Other,,College/University,,,Not Useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,Nice to have,Necessary,Necessary,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,Business Analyst,Self-taught,20,20,20,20,20,0,Computer Vision,Decision Trees - Gradient Boosted Machines,A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by government",TensorFlow,Deep learning,Python,Google Search,"Arxiv,Blogs,College/University,Company internal 
community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,Researcher,Self-taught,40,10,20,20,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Image data,Sometimes,100MB,"CNNs,Neural Networks","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,60,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,"MNIST, Imagenet, Cifar 10/100",classification accurately,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,60000,SGD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,35,Employed full-time,,,No,Yes,Operations Research Practitioner,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Researcher,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Recommendation Engines,Evolutionary Approaches,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,20,20,20,30,0,"Adversarial Learning,Survival Analysis",Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,YouTube Videos",,,,,,Somewhat useful,Very useful,,,,Not Useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,Linear Digressions Podcast",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),,Kaggle Competitions,,I did not complete any formal education past high school,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,Natural Language Processing,Logistic Regression,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Female,India,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,Researcher,University courses,20,10,10,60,0,0,"Computer Vision,Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,I don't plan on learning a new tool/technology,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",Somewhat useful,,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,Other,Self-taught,50,0,25,25,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"5,000 to 9,999 employees",,Don't know,A general-purpose job board,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,,"CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Java,Microsoft Excel Data Mining,Python,Other",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,Time Series Analysis",Often,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,30,10,0,30,30,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,Often,,,,,,,,Most of the time,,,,Often,,51-75% of projects,More internal than external,Standalone Team,,Privacy with healt related data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,75000,CHF,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Russia,20,"Not employed, but looking for work",,,,,,,,R,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,,,1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Netherlands,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",65,20,15,0,0,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,10 to 19 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,SQL",,,,,,,,,Often,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,Sometimes,,Often,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Social Network Analysis,Python,GitHub,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",50,25,25,0,0,0,"Natural Language Processing,Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,"1,000 to 4,999 employees",Decreased significantly,6-10 years,A general-purpose job board,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,1GB,Regression/Logistic 
Regression,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",Sometimes,,,,,,,,,,,,,,,Often,,Often,Often,,,,Often,,,,,Often,Often,,,,,50,20,10,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Subversion",Rarely,1600000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Czech Republic,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Somewhat useful,Not Useful,,,Not Useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,40,10,30,10,10,0,Time Series,Logistic Regression,High school,Financial,20 to 99 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Relational data,Most of the time,100GB,Other,"Julia,Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Association Rules,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,Most of the time,,,Often,,,,10,30,40,5,15,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,Rarely,,,,,,,,,Sometimes,,,,,Rarely,Often,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Never,800000,CZK,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Time Series Analysis,SQL,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Other,Less than a year,Other,Other,30,10,0,0,10,50,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Logistic Regression,Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important +Male,Other,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Other,2 - 10 hours,Master's degree,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Other",University courses,50,10,0,35,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",< 1 year,Nice to have,,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,,Necessary,,,,,Basic laptop (Macbook),,Master's degree,No,Master's degree,Other,1 to 2 years,I haven't started working yet,Self-taught,30,20,0,30,20,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Other",,,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Other,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Professional degree,,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,,,,Very useful,,,,,"R Bloggers Blog Aggregator,Talking Machines Podcast,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Outlier detection (e.g. 
Fraud detection),Support Vector Machines (SVMs),No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,24,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"FastML Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Necessary,,Necessary,,Necessary,Necessary,Necessary,Nice to have,,,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,50,25,0,25,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Female,Canada,NA,Employed 
full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,"Ensemble Methods (e.g. boosting, bagging)",Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,0,0,0,100,0,0,Time Series,Neural Networks - GANs,,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Never,,,"Amazon Web services,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,100,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,Do not know,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,19,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,30,Employed full-time,,,Yes,,Data Miner,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,30,0,10,60,0,0,Machine Translation,"Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,I prefer not to answer,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,,,"Java,MATLAB/Octave,Other",,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,"Cross-Validation,Logistic Regression,Neural Networks,SVMs,Text Analytics",,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Often,,,,,,,,Sometimes,Often,,,,,20,50,0,30,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database",,,Sometimes,,Sometimes,,,,,Often,,Often,,,,,Sometimes,Often,,,,,26-50% of projects,Approximately half internal and half 
external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Other,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,32,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Conferences,Official documentation,Personal Projects",,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,Data Elixir Newsletter,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,3 to 5 years,Researcher,University courses,10,5,0,80,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Australia,28,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,I prefer not to 
say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Hadoop/Hive/Pig,,Python,I collect my own data (e.g. web-scraping),"Company internal community,Conferences,Friends network,Kaggle,Personal Projects,YouTube Videos",,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer",Self-taught,5,80,15,0,0,0,,,A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1TB,,"Amazon Web services,Hadoop/Hive/Pig,Python,R,SQL,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Most of the time,,,,,,Most of the time,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,80,0,20,0,,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Sometimes,,,,Less than 10% of projects,Do not know,Other,,,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Most of the time,1200000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Newsletters,Official documentation,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,Self-taught,30,15,0,10,15,30,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,France,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Very useful,,Very useful,Very useful,,,Very useful,Very useful,,Somewhat useful,,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Engineer",University courses,20,30,0,30,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Insurance,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,1GB,Decision Trees,"Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,Often,,,Often,,Most of the time,Sometimes,,,,,Sometimes,Sometimes,,,,60,10,10,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's 
decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,Often,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,Most of the time,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,49,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,India,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Engineer,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed part-time,,,No,Yes,Computer Scientist,Fine,Self-employed,MATLAB/Octave,Neural Nets,Python,Google 
Search,College/University,,,Very useful,,,,,,,,,,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,PhD,Yes,Master's degree,A health science,Less than a year,Computer Scientist,University courses,0,20,5,50,0,25,Adversarial Learning,Neural Networks - GANs,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important +Female,Spain,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Neural Nets,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Personal Projects,YouTube Videos",,,,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,1,44,35,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Hidden Markov Models HMMs",A bachelor's degree,Other,20 to 99 employees,Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Text data,Most of the time,10GB,"Evolutionary Approaches,Other","IBM SPSS Statistics,KNIME (free version),Python,R",,,,,,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,1,20,60,10,9,0,Enough to tune the parameters properly,"Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,,,,,,Often,,,100% of projects,More external than internal,Standalone Team,IRIS data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Subversion",Sometimes,111000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Spain,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Python,Text Mining,Python,"Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,,,,Somewhat useful,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,Data Scientist,Self-taught,60,30,10,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Other (please specify; separate by semi-colon)",A doctoral degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Never,10GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Java,Perl,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Rarely,,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Neural Networks",,,Often,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,70,20,0,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",Often,Most of the time,,,Most of the time,,,,,Most of the time,Often,,Most of the time,,,,,Most of the time,,,Often,,76-99% of projects,More external than internal,Standalone Team,,difficulty to get sufficient support from the group responsible for creating/managing the data,"Flat files not in a 
database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,0,EUR,Has decreased 20% or more,2,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",Kaggle competitions,50,10,0,0,40,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,SQL",,Sometimes,,,,,,,Rarely,,,,,,Often,,Most of the time,,,,Often,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,,,,Sometimes,Most of the time,,Often,Sometimes,,,Sometimes,,,,Most of the time,,,,,,,Most of the time,Sometimes,,,,Rarely,,Sometimes,,,,70,20,0,10,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,,USD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Vietnam,33,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Bayesian Methods,Python,I collect my own data (e.g. web-scraping),"Arxiv,College/University,Personal Projects",Very useful,,Very useful,,,,,,,,,Very useful,,,,,,,,3-5 years,,,,,,,,,,,,,,,Workstation + Cloud service,40+,Other,Yes,Bachelor's degree,Mathematics or statistics,,,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,29,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,IBM Watson / Waton Analytics,Time Series 
Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,,,Very useful,,,,Very useful,"FastML Blog,FlowingData Blog,KDnuggets Blog",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,,"Coursera,DataCamp,edX,Udacity,Other","Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Other,29,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,6 to 10 years,"Data Miner,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Natural Language 
Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,Fewer than 10 employees,Increased slightly,More than 10 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Decision Trees,Natural Language Processing,Text Analytics",,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,30,20,40,10,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,33,Employed part-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,Business Analyst,Work,20,30,40,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,22,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Deep learning,SAS,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,Very useful,,,,,,,,,Somewhat useful,,,Very useful,,"FlowingData Blog,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,Self-taught,50,0,0,50,0,0,Unsupervised Learning,Decision Trees - Gradient Boosted Machines,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,,, +Female,Ukraine,75,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Online courses,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,,Yes,Doctoral degree,Electrical Engineering,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,Data Miner,Kaggle competitions,40,10,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,,,,Most of the time,Often,Sometimes,Often,,,Most of the time,,,,Often,,,,Often,Often,,,,,Sometimes,,,,,,,,10,30,5,10,45,0,Enough to refine and innovate on the algorithm,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of 
tools",,,,,Sometimes,,,Sometimes,Often,,,,Sometimes,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,None,Avoid overfitting and build the model that does not degrade over time; build neural networks on relational data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,170000,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Other,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,Very useful,,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,5,0,5,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Reinforcement learning,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased significantly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,1GB,Neural Networks,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,Sometimes,,,,,Often,,,,Often,,,,80,5,0,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,Sometimes,,,,,Most of 
the time,,Often,,,,,,Often,Often,,Most of the time,,,76-99% of projects,More internal than external,Business Department,,"Bad access, GUI interface","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,1881,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Czech Republic,41,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Data Analyst,University courses,0,5,70,25,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Markov Logic Networks",A doctoral degree,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Traditional Workstation",11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,11-15,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,Iran,30,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities",,,,,,,Very useful,,Very useful,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer",Work,45,20,20,0,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Random Forests,Primary/elementary school,Insurance,"10,000 or more employees",Decreased significantly,1-2 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,Microsoft Excel Data Mining,R,RapidMiner (free version),SAS JMP",,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Sometimes,,,,,Sometimes,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Segmentation",,,,,,,Often,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,45,20,10,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,Most of the time,,,Often,,Often,,,,,Often,,,,,Most of the time,,51-75% of projects,More external than internal,Central Insights Team,,Data mapping frim various atomic layer tables,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Bitbucket,,"130,000",,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,United Kingdom,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,RapidMiner (free version),Uplift Modeling,SQL,I collect my own data (e.g. web-scraping),"Conferences,Kaggle,Personal Projects",,,,,Somewhat useful,,Very useful,,,,,Very useful,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician",Kaggle competitions,70,10,0,0,20,0,"Outlier detection (e.g. Fraud detection),Survival Analysis",Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Most of the time,100GB,"Decision Trees,Evolutionary Approaches","IBM Cognos,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,SAP BusinessObjects Predictive Analytics,Unix shell / awk",,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,Often,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Random Forests",,Often,,,,Often,Often,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear 
direction to go in with the available data",,,Often,,Most of the time,Often,,Sometimes,Often,,,,,,,,,,,Most of the time,,,51-75% of projects,Entirely internal,IT Department,,Data quality and access protocols.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"40,000",,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,25,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,Work,30,0,70,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,Mathematica,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,20,0,60,10,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Academic,20 to 99 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,1GB,"CNNs,GANs,Neural Networks","MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,Sometimes,Sometimes,,,,Often,,,Sometimes,,Often,,,,Most of the time,Sometimes,,,,,,,,,,,,,10,55,5,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Privacy issues",,,,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,CNY,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Spain,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Data Scientist,Programmer",Work,10,80,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,<1MB,"Random Forests,SVMs","Java,Jupyter notebooks,MATLAB/Octave,Python,SQL",,,,,,,,,,,,,,,Most of the time,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"kNN and Other Clustering,Random Forests,SVMs",,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,40,40,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Limitations of tools,Need to coordinate with IT",,,,,,Often,,,,,,,Most of the time,,Most of the time,,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Never,20000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Netherlands,38,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,R,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping)","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Researcher,Other",Self-taught,25,40,25,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Government,100 to 499 employees,Stayed the same,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,10GB,Other,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Simulation",,,,,,,Sometimes,,,,,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,15,5,20,20,20,20,Enough to refine and innovate on the algorithm,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,privacy issues,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,60000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Russia,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,Arxiv,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,40,20,10,0,30,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Workstation + Cloud service","Image data,Text data,Relational data",Rarely,100GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,NoSQL,Python,R,Spark / MLlib,TensorFlow",,Often,,,,,,Sometimes,Rarely,,,,,,Often,,Most of the time,,,Rarely,,,,,,,Sometimes,,,,Often,,Rarely,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,Often,,Rarely,,,,,,,,,,,,Rarely,Sometimes,Most of the 
time,Rarely,,Often,,Sometimes,Rarely,,Rarely,Sometimes,Often,,,,10,60,15,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process",,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,26-50% of projects,Do not know,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Git,Always,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,10,20,0,70,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Other,"Conferences,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,,University courses,5,10,10,75,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Financial,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud 
service,Other,Most of the time,100GB,"Evolutionary Approaches,Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow",,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,Often,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Evolutionary Approaches,Logistic Regression,Neural Networks,Random Forests,RNNs,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,,,Rarely,,,,,,Sometimes,,,,Most of the time,,,Rarely,,Sometimes,,,,,Sometimes,,,,10,20,10,5,5,50,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Rarely,,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,Sometimes,,Less than 10% of projects,Approximately half internal and half external,Business Department,,Idiosyncracies of different capital market exchange platforms,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,"250,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Other",Rarely,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,Often,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,Often,,,,50,15,5,25,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,50,0,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Kenya,25,Employed full-time,,,Yes,,DBA/Database Engineer,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,College/University,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow 
Q&A,Textbook,Tutoring/mentoring",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,DBA/Database Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",40,43,7,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Rarely,1GB,Regression/Logistic Regression,"IBM Cognos,IBM SPSS Statistics,MATLAB/Octave,Python,QlikView,R,SQL,Tableau",,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,Often,Most of the time,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Most of the time,,,,,,,Often,,Most of the time,,,,,Most of the time,,,,,,,,,Most of the time,,,,55,10,10,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be 
answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,Most of the time,,Often,Sometimes,,,Sometimes,Often,,Most of the time,,Most of the time,,,Most of the time,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,86905,KES,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Canada,49,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Engineer,Programmer",Self-taught,20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,70,5,0,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,500 to 999 employees,Increased significantly,Less than one year,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company 
that makes advanced analytic software,SAS Enterprise Miner,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Engineer,Self-taught,25,15,45,15,0,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Other (please specify; separate by semi-colon)",High school,Financial,I prefer not to answer,Stayed the same,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,100MB,"Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Decision Trees,Logistic Regression,Prescriptive Modeling,Time Series Analysis",Rarely,Sometimes,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,65,15,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input",,,,,,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,Data cleaning and understanding business problems to provide insughts.,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,45000,INR,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",33,34,0,33,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Somewhat important +Male,Ukraine,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Decision Trees,Python,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,FastML Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,Self-taught,30,30,0,0,40,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Markov Logic Networks",A bachelor's degree,Manufacturing,500 to 999 employees,Decreased significantly,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud 
service,Relational data,Rarely,,,Orange,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,PCA and Dimensionality Reduction,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,0,100,0,0,0,0,Enough to refine and innovate on the algorithm,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,10-25% of projects,,,,,,,,Other,,,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Ireland,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Predictive Modeler",University courses,20,10,25,45,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Government,500 to 999 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation",Sometimes,,,,,Often,Most of the time,Often,Often,,,,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,Often,,,Most of the time,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural 
Nets,Python,"GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,Very useful,,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,Somewhat useful,,Very useful,Very useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,45,25,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,Nigeria,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,R,Deep learning,R,Other,"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,Data 
Analyst,Self-taught,90,0,0,0,5,5,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",,Manufacturing,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,1GB,"Bayesian Techniques,Decision Trees,SVMs","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Recommender Systems,Segmentation,Time Series Analysis",,Often,,,Sometimes,Often,Most of the time,Most of the time,,,,,,,Often,Most of the time,,Often,,,,,,Often,,Often,,,,Often,,,,30,20,10,10,30,0,Enough to run the code / standard library,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,100% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",,,,Other,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Company internal community,Conferences,Textbook",Somewhat useful,,Very useful,Very useful,Very useful,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Other",University courses,10,0,50,40,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Hadoop/Hive/Pig,Java,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,Often,,,,,,Often,,Sometimes,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Sometimes,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,Text Analytics",,,Often,,,,Most of the time,,,,,,,,,,,,Often,,Often,,,Sometimes,Sometimes,Often,,,Often,,,,,20,40,20,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,Sometimes,,,,,Sometimes,,,Often,,,51-75% of projects,Approximately half internal and half external,Standalone Team,Wikipedia (multiple 
languages),,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,160000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Other,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,16,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Online courses,Textbook",,Very useful,,,Very useful,,,,,,Somewhat useful,,,,Somewhat useful,,,,"KDnuggets Blog,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Nice to have,,,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Neural 
Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Friends network,Online courses,Textbook",Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,,FlowingData Blog,3-5 years,Nice to have,,Necessary,,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Software Developer/Software Engineer",University courses,0,5,45,50,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,South Korea,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Online courses",,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Researcher,Statistician",University courses,10,3,5,80,2,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs",A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Rarely,10GB,"Bayesian Techniques,Neural Networks","C/C++,MATLAB/Octave,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Neural Networks,Simulation",,,Often,,,Often,Most of the time,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,,,,40,10,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",Often,,,,Sometimes,,,Sometimes,Often,Most of the time,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,IT Department,I'm using only internal data source.,"Spending lots of time to creating data by modeling simulation. 
It is quite far from typically used in machine learning or data science.",Other,"Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"60,000,000",KRW,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Singapore,31,Employed full-time,,,Yes,,Data Scientist,,,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,,,Self-taught,20,20,20,20,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",No education,Technology,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,Random Forests,"Amazon Machine Learning,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,QlikView,R,SAS Enterprise Miner,Spark / MLlib,Tableau",Often,,,,,,,,Sometimes,,,,,,,,,,,,,Often,Often,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,,,,,Often,Often,Often,Often,,,Often,,Often,,Often,,,,,,,Often,,,,,,Often,Often,,,,70,20,3,7,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not 
instrument data useful for scientific analysis and decision-making,Explaining data science to others,I prefer not to say,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,100% of projects,Do not know,Standalone Team,na,keep on changing,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Rarely,65000,SGD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Other",,,Very useful,,,,Very useful,,,,Somewhat useful,,,,,,,,Becoming a Data Scientist Podcast,1-2 years,,Necessary,Necessary,,Necessary,Necessary,Necessary,,,Necessary,,,,Coursera,Traditional Workstation,0 - 1 hour,Master's degree,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,"Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,,,,,,,,,,,,,,,Very Important +Male,Belgium,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,Spark / MLlib,Deep learning,Python,"GitHub,Google Search",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Workstation + Cloud service",11 - 39 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,90,10,0,0,0,0,Time Series,,"Some college/university study, no bachelor's degree",Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,23,"Not employed, but looking for work",,,,,,,,Python,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,"Other,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Other,Fine,Employed by government,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Arxiv,Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos,Other",Very useful,Very useful,,,Very useful,Somewhat useful,Very useful,,,Somewhat useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,25,10,10,25,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Never,10GB,Gradient Boosted Machines,"Java,NoSQL,Python,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,Often,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression",,,,,,Often,Most of the time,,,,,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,45,5,0,45,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,I prefer not to say,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Often,,,,Most of the time,,Rarely,,,,,,,,Most of the time,,,Often,Sometimes,,76-99% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,QlikSense,Git,Never,52,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,"Coursera,Udacity",Other,11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,40,40,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Online courses,Personal Projects",,,Very useful,,,Somewhat useful,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,10,30,0,50,10,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important +Male,Sweden,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working 
yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,30,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Python,Google Search,"Arxiv,Conferences,Official documentation,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,,,5-10 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,Yes,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,60,10,0,30,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Denmark,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Very useful,,Very useful,,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,30,20,20,20,10,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Government,I prefer not to answer,Decreased slightly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,1GB,"Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow",,Sometimes,,,Often,,Rarely,,Most of the time,,,,,Sometimes,Most of the time,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,Sometimes,,,,,,"Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,,Sometimes,,,,,,,,,Often,,Sometimes,Often,,Sometimes,,Sometimes,,,,,Often,Often,,,,,77,5,15,3,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,,,,,Sometimes,,Most of the 
time,,,,,Sometimes,,,10-25% of projects,More internal than external,Central Insights Team,,,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Sometimes,600000,DKK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,20,20,20,0,Computer Vision,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,500 to 999 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Other,SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle,Personal Projects",,,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,0,0,0,80,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Stayed the same,Don't know,A tech-specific job board,Very important,Other,Workstation + Cloud service,Relational data,Most of the time,,,"Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,feature extraction and selection,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Other",Rarely,120000,LKR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Israel,36,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Sometimes,1GB,CNNs,"C/C++,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization",,,,Often,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,Sometimes,,,,Often,Most of the time,,,Often,,51-75% of projects,More external than internal,Standalone Team,FDDB; IJB,"communication speed missing annotation parameters","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,200000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Israel,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,Online courses,Textbook",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,3 to 5 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,40,10,0,0,0,Supervised Machine Learning (Tabular Data),,High school,Mix of fields,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,100MB,Decision Trees,"Microsoft Excel Data Mining,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Often,,,Often,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or 
a clear direction to go in with the available data",Most of the time,Often,,,,,,,Most of the time,,Most of the time,Most of the time,,,,,,,,Most of the time,,,76-99% of projects,Approximately half internal and half external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,250000,INR,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Ireland,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,Python,Monte Carlo Methods,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Company internal community,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,Work,10,0,50,30,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,>1EB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Data Visualization,Decision Trees,Simulation,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,Most of the time,,,,30,0,20,20,30,0,Enough to explain the algorithm to someone non-technical,"Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Most of the time,,,,,,,,Often,,76-99% of projects,More internal than external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Other,Sometimes,100000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Factor Analysis,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Online courses,Personal Projects,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Very useful,,,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Researcher,Statistician",University courses,20,10,35,25,10,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,500 to 999 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,IBM SPSS Statistics,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Minitab,Python,R,SAS Base,SQL,Tableau",,,,Rarely,,,,,,,,Sometimes,,,Sometimes,,Often,,,,Sometimes,,Most of the time,,Sometimes,Rarely,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,Most of the time,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,Most of the time,Rarely,,,,,Most of the time,Often,,,Most of the time,Sometimes,,,Rarely,,,,20,40,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data",,Sometimes,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,Financial institutions,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Other",Sometimes,,BGN,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Somewhat useful,Very useful,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),40+,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,Less than a year,Engineer,University courses,40,20,NA,30,10,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +"Non-binary, genderqueer, or gender non-conforming",Turkey,42,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,Python,Support Vector Machines (SVM),SQL,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search",Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Bachelor's degree,Other,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,57,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Conferences,Kaggle,Online courses",,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Mix of fields,20 to 99 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Most of the time,Often,,,,,,,,Often,,,,Often,Most of the time,,Often,,,,,,,,,,,70,10,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,Most of the time,,,76-99% of projects,Entirely internal,IT Department,,dirty data 
and integrating various internal sources of data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Git,Other",Sometimes,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,43,Employed full-time,,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,Python,Time Series Analysis,SQL,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,,Technology,20 to 99 employees,Decreased significantly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10MB,"Regression/Logistic Regression,Other","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling",Sometimes,,,,,,Often,,,,,,,,,Sometimes,,,,,Rarely,Sometimes,,,,,,,,,,,,30,20,10,10,30,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,21,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,35,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,,,,,,,,Somewhat useful,,,,Very useful,,Very useful,,,10-15 years,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Engineer,Researcher",University courses,50,10,0,40,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important +Female,India,19,Employed part-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,"Arxiv,Non-Kaggle online communities",Very useful,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Time Series,Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,"5,000 to 9,999 employees",Stayed the same,More than 10 years,An external recruiter or headhunter,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,10TB,Regression/Logistic Regression,"NoSQL,Python,QlikView,R,SAS Enterprise Miner,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,Most of the 
time,,,Often,,Often,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,,Standalone Team,,,,,,,,408000,RUB,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Very useful,"Data Stories Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,IBM Cognos,Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,QlikView,R,RapidMiner (free 
version),SQL,TensorFlow",Sometimes,,,,,,,,,Often,,,,,,Sometimes,Often,,,,,Often,Often,,,,,,,,Often,Often,Often,,Rarely,,,,,,,Often,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,Logistic Regression,Naive Bayes,Neural Networks,Recommender Systems,Segmentation,Time Series Analysis",,Sometimes,Often,,,,,,,,,,,,,Often,,Sometimes,,Often,,,,Often,,Often,,,,Most of the time,,,,20,40,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Most of the time,Sometimes,,,,Often,Sometimes,,,,,Often,,,Sometimes,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,41000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,46,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by college or university,Employed by non-profit or NGO",Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,YouTube Videos",,Somewhat useful,,,Very useful,,Somewhat useful,,,,,,,,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Data Analyst,Researcher",Self-taught,50,0,20,10,0,20,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Video data,Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Neural Networks","C/C++,Hadoop/Hive/Pig,IBM Cognos,Java,R,RapidMiner (free version),SQL,Tableau",,,,Often,,,,,Most of the time,Sometimes,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",,Most of the time,Often,,,,Most of the time,Most of the time,,Most of the time,,,,Often,,Most of the time,,Often,Most of the time,Often,,,,,,,,,Most of the time,Most of the time,,,,30,50,0,20,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,,,,,,,,,,,,Often,,,,,,Most of the time,,Often,,26-50% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,,,Never,158000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Egypt,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Decision Trees,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,1GB,,"Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,0,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,,,,Most of the time,,,76-99% of projects,More internal than external,IT Department,Ookla;mobile handset sheets,Making the model,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,150000,EGP,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,I haven't started working yet,University courses,70,5,20,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,100MB,"Gradient Boosted Machines,Random Forests,SVMs","MATLAB/Octave,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,25,10,5,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,SAP BusinessObjects Predictive Analytics,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,,Rarely,,,,,,,,Most of the time,Rarely,,,,,Rarely,,Often,,,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",Often,,,,,Often,Most of the time,Sometimes,,,,Sometimes,,Often,,Often,,,,Rarely,,Often,Often,Most of the time,,Most of the time,Most of the time,Rarely,,Most of the time,,,,50,20,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,Business Department,,Clean,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Never,4440000,RUB,Other,7,,,,,,,,,,,,,,,,,, +Male,Republic of China,37,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,26,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",University courses,25,0,25,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,28,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by government,R,Deep learning,SQL,GitHub,"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Jack's Import AI Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Researcher",Self-taught,20,40,0,0,40,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's 
degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,48,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Employed by college or university,Python,Bayesian Methods,Python,GitHub,"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Doctoral degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,,,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,India,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,,,,Very useful,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,10,10,0,Unsupervised Learning,"Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,43,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Anomaly Detection,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,University courses,5,0,5,80,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,High school,Insurance,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,,1GB,Regression/Logistic Regression,"Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Rarely,,,,Most of the time,,,Most of the time,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,25,25,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Sometimes,,,,,,,,,,,Often,,,76-99% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Other,Sometimes,85000,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Russia,20,Employed part-time,,,Yes,,Data Scientist,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Programmer,Researcher,Statistician",Other,30,5,60,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Government,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Relational data,Other",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,52,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Psychology,More than 10 years,Data Analyst,Self-taught,100,0,0,0,0,0,,,Primary/elementary school,Non-profit,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,56,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Conferences,YouTube Videos",Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",Self-taught,0,0,0,100,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,500 to 999 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Never,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Naive Bayes,Recommender Systems",,,,,Often,Most of the time,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,0,100,0,0,0,0,Enough to tune the parameters properly,"Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,26-50% of projects,More external than internal,Standalone Team,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,150000,CAD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Female,Taiwan,50,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by college or university,SAP BusinessObjects Predictive Analytics,Social Network Analysis,Java,Google Search,"Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,I don't write code to analyze data,Other,Self-taught,50,30,0,0,0,20,Survival Analysis,Other (please specify; separate by semi-colon),High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,India,36,Employed full-time,,,Yes,,Engineer,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer",Self-taught,59,0,20,0,0,21,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",,Technology,"10,000 or more employees",Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",,,,"Bayesian Techniques,CNNs","Amazon Machine Learning,C/C++,Java,TensorFlow",Rarely,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Collaborative Filtering,kNN and Other Clustering,Natural Language Processing",Rarely,,,,Rarely,,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software 
Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,,,,,Very important,Other,,Image data,,10GB,Bayesian Techniques,"Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,,,,0,0,0,80,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say",,,,,Often,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,55000,EUR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Russia,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Computer Vision,Neural Networks - CNNs,I prefer not to answer,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Textbook",Very useful,Very useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Master's degree,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",University courses,20,40,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat 
important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Neural Networks,Regression/Logistic Regression,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,Tableau,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,50,15,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,,10MB,Other,"Amazon Web services,Python,Tableau,TensorFlow",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Rarely,,,,,,"Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,25,25,0,25,25,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Most of the time,,,100% of projects,Entirely internal,IT Department,None,Cleaning and understanding it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,57000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Belgium,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Podcasts,Stack Overflow Q&A",,,,,,,,,,,Very useful,,Very useful,Very useful,,,,,"Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Pharmaceutical,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,Rarely,Most of the time,,,,,Rarely,Rarely,,,Sometimes,,,Rarely,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,,,Sometimes,Sometimes,,,,15,15,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,Often,,,,Sometimes,,,Most of the time,,,,,,,,Most of the time,,,,,,76-99% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,115000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Poland,30,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,NA,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,TensorFlow,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Conferences,Textbook",Very useful,,,,Very useful,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,,University courses,0,0,0,100,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,500 to 999 employees,Increased slightly,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk,Other",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Most of the time,,Most of the time,Often,,,"CNNs,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic 
Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues",,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git",Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Neural Nets,R,Google Search,"Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,,Very useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Manufacturing,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),R,SQL,TIBCO 
Spotfire",,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,Often,,,,,Often,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests",Sometimes,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,50,10,5,20,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Often,,,,,,Sometimes,,,Sometimes,,,,,,Often,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Rarely,38000,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Retail,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,Other,"QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,"Logistic Regression,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,60,0,20,10,10,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Arxiv,Blogs,Personal Projects",Very useful,Very useful,,,,,,,,,,Very useful,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,"Predictive Modeler,Researcher",University courses,40,5,25,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Sometimes,1MB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,MATLAB/Octave,Python,R,SAS JMP,Stan,Unix shell / awk,Other",,,,Rarely,,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Rarely,,Most of the time,,,,,,,Sometimes,,,Often,,,,,Rarely,Sometimes,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Most of the time,,,Often,Most of the time,Rarely,Rarely,,,,Rarely,,,Often,,Sometimes,,,Sometimes,Most of the time,Rarely,,,,Most of the time,,Sometimes,Sometimes,,,,31,20,4,15,10,20,Enough to refine and innovate on the algorithm,"Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,Most of the time,,,,,,Often,,,,,,,,,,,,Often,,100% of projects,Approximately half internal and half external,Other,Purchased data from marketing research firms; census bureau & other government open data,Cleaning data; incompleteness of meta-data/provenance info,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Subversion,Sometimes,"300,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,YouTube Videos",Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,Somewhat useful,FastML Blog,< 1 year,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Kaggle competitions,30,0,0,0,70,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Belarus,21,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Other",Self-taught,60,20,10,5,5,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural 
Networks - RNNs",A bachelor's degree,Military/Security,10 to 19 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data",Sometimes,100GB,"CNNs,GANs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"CNNs,Cross-Validation,Data Visualization,GANs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,Most of the time,,Often,Most of the time,,,,Sometimes,,,Sometimes,,,,,,Most of the time,Sometimes,,,,Often,Often,,,,,,,,25,30,25,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,"Kaggle,Official documentation,Personal Projects,Textbook,YouTube Videos",,,,,,,Very useful,,,Very useful,,Somewhat useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,,,A professional degree,Pharmaceutical,,,,,Important,,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",,,,,"Microsoft Excel Data Mining,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,"Data Visualization,Logistic Regression,Simulation,Other",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,30,20,10,15,15,10,"Enough to code it again from scratch, albeit it 
may run slowly","Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Email,Share Drive/SharePoint,Other",,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,India,30,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle",,Somewhat useful,,,,,Very useful,,,,,,,,,,,,"No Free Hunch Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,,,Necessary,,Necessary,,,Necessary,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,A humanities discipline,,Statistician,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,Somewhat important,Very Important,,,,,,,,,,,,, +Female,Canada,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,Python,"Ensemble Methods (e.g. boosting, bagging)",R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Friends network,Official documentation,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,Very useful,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,10,0,30,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,,,,,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,Stan,Unix shell / awk,Other,Other",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,Sometimes,,,,,Often,Most of the time,Most of the time,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Simulation,Time Series Analysis",,,Most of the time,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,25,25,15,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools",,,,,Often,Often,,,,Often,,,Often,,,,,,,,,,100% of projects,More internal than external,Other,Census; remote sensing; public health data (WHO),Confounding and bias,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,75000,CAD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Very useful,,,,Very useful,,,Very useful,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Other",Work,10,40,30,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,SAS Base,SQL,Tableau,TensorFlow",Sometimes,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,Often,,,,Most of the time,,,Often,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics",Sometimes,,,,Often,,Often,Most of the time,Most of the time,,,Most of the time,,Often,,,,Often,Often,Sometimes,,,Most of the time,,,Often,,Most of the time,Often,,,,,40,30,15,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT",Most of the time,,,,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,,,,10-25% of projects,More internal than external,Standalone Team,Alteryx,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,1540000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Spain,NA,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",Spark / MLlib,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,,Very useful,,Somewhat useful,Very useful,Very useful,,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,,University courses,20,10,50,20,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Never,1GB,"CNNs,Neural Networks,Random Forests,SVMs","Google Cloud Compute,Jupyter notebooks,Python,R,TensorFlow",,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,Rarely,,,Often,,,,,,,,,,,,,Often,Often,,,,,,,Sometimes,,Often,,,,70,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Often,,Sometimes,,,Sometimes,Sometimes,Often,,Most of the time,Sometimes,,Most of the time,,Most of the time,Most of the time,,,,100% of projects,Entirely external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Never,5754,EUR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Other,50,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Researcher,Other",Work,30,20,30,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,6 to 10 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,500 to 999 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Relational data,Never,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,75,0,0,25,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,10-25% of projects,Approximately half internal and half external,Other,None,Storage.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Commercial Data Platform,,Other,Sometimes,75000,USD,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Other,44,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Cloudera,Text Mining,,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,Nice to have,,,,,,,,,,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Weka,Time Series Analysis,R,"GitHub,Google Search,Government website","Blogs,Kaggle,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,48,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting 
firm,R,Decision Trees,R,I collect my own data (e.g. web-scraping),"Friends network,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,,,,,,,,,,,Very useful,Somewhat useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Management information systems,1 to 2 years,"Programmer,I haven't started working yet",Self-taught,80,20,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Japan,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses",,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,Business Analyst,Work,30,40,30,NA,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,Rarely,Rarely,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,SVMs",,,,,,Most of the time,Often,,Sometimes,,,Often,,,,Often,,,,,,,Often,,,,,Sometimes,,,,,,70,10,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Privacy issues",Often,,,,,Most of the time,,,Often,,,,,,,,Often,,,,,,Less than 10% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,,Never,9000000,JPY,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Female,Vietnam,20,Employed part-time,,,No,Yes,Business Analyst,Fine,Self-employed,C/C++,Regression,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,Not Useful,,Very useful,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",25,60,0,10,5,0,Survival Analysis,Logistic Regression,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,28,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,20,20,60,0,0,"Natural Language Processing,Time Series","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Pharmaceutical,10 to 19 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Image data,Sometimes,100MB,Neural Networks,"C/C++,Java,Mathematica,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,20,50,0,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues",,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Anomaly Detection,Python,"Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher",Work,25,10,50,10,5,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Other (please specify; separate by semi-colon)",A professional degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Image data,Relational data",Rarely,10GB,"Bayesian Techniques,Regression/Logistic Regression,Other","Jupyter notebooks,Python,R,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,Often,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis,Other",,,Often,,,Most of the time,Most of the time,,,,,,,Often,,Often,,Sometimes,,,Often,,Sometimes,,,Rarely,,,,Often,Often,,,30,20,20,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,Often,,,Sometimes,,,,,,,,Most of the time,,,Sometimes,,,Often,Most of the time,,51-75% of projects,Entirely internal,Standalone Team,Public medical datasets,Data heterogeneity ,Column-oriented relational (e.g. 
KDB/MariaDB),"Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Female,Mexico,57,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,Python,Deep learning,,,"Kaggle,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,Less than a year,Statistician,"Online courses (coursera, udemy, edx, etc.)",80,0,10,0,10,0,"Outlier detection (e.g. Fraud detection),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Academic,"1,000 to 4,999 employees",Stayed the same,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,Basic laptop (Macbook),Text data,Rarely,1MB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,Minitab,R,Tableau",,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,,50,10,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,Often,,76-99% of projects,Approximately half internal and half external,Business Department,,,,"Commercial Data Platform,I don't typically share data",,,,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,SAP BusinessObjects Predictive Analytics,Proprietary Algorithms,R,Google Search,"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Retired,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Julia,"Ensemble Methods (e.g. boosting, bagging)",Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Somewhat useful,Very useful,,,,Very useful,Not Useful,,,,Very useful,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Other,Other,10,0,35,55,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,50,0,50,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Most of the time,10GB,"Gradient Boosted Machines,Regression/Logistic Regression",SAS Enterprise Miner,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,35,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Online courses,Stack Overflow Q&A",,Somewhat useful,,,Not Useful,,,,,,Very useful,,,Somewhat useful,,,,,"Data Machina Newsletter,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner",University courses,0,50,20,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Decreased slightly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft Azure Machine Learning,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,Segmentation,Text Analytics",Often,Sometimes,,,,Often,Most of the time,Sometimes,,,,,,,,Sometimes,,,Sometimes,,,,Sometimes,,,Often,,,Sometimes,,,,,10,35,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team",Often,,Sometimes,,,Rarely,,Sometimes,Often,,,,Often,,Most of the time,Often,,,,,,,76-99% of projects,Entirely internal,Central Insights Team,"Census, competitors data, weather",,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,75000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,No,Yes,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by government",Python,I don't plan on learning a new ML/DS method,Python,I collect my own data (e.g. web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Master's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,44,Employed 
full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects",,,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,University courses,90,0,0,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,100GB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Machine Learning,Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib",Sometimes,Often,,,,,,,,,,,,,Often,,Often,,,,,,,,,,Often,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems",Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,Often,Sometimes,,,,,,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues",,,,,Often,,,,Most of the time,,,,Often,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,265,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Czech Republic,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,0,30,30,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests","C/C++,Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Segmentation",,,,,,Often,Most of the time,Often,Often,,,Sometimes,,Often,,,,Sometimes,,,Most of the time,,,,,Often,,,,,,,,50,20,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,Sometimes,,,,Sometimes,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,17,"Not employed, but looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,"Programmer,Other",Self-taught,50,25,12,10,3,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects",Very useful,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,,,,,,"Data Machina Newsletter,Data Stories Podcast,Talking Machines Podcast",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",2 - 10 hours,Github Portfolio,Sort of (Explain more),Doctoral degree,Computer Science,,"Data Scientist,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,Reinforcement learning,"Hidden Markov Models HMMs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Ukraine,25,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's 
degree,Physics,3 to 5 years,"Engineer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",40,25,15,20,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,43,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,TensorFlow,I don't plan on learning a new ML/DS method,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Government,100 to 499 employees,Increased slightly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,Regression/Logistic Regression,"Amazon Web services,IBM Cognos,R",,Sometimes,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,Other",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,40,20,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,Australian Bureau of Statistics SEIFA data,"No code book for a huge, legacy IBM AS400 iSeries RMDBS. Yeah, that's right, no code book. What fields are what sits in a few people's heads, including mine.",Column-oriented relational (e.g. KDB/MariaDB),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"170,000",AUD,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,20,10,20,40,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,10GB,,"Amazon Web services,Jupyter notebooks,Python,R,SQL,Unix shell / awk,Other,Other,Other",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,Most of the time,Most of the time,Often,Often,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Most of the 
time,Sometimes,Sometimes,,,,Rarely,Sometimes,,Often,,,,,Often,,Often,,,,Often,Sometimes,Sometimes,Sometimes,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Unavailability of/difficult access to data",Most of the time,Often,,Often,,Often,,Most of the time,Sometimes,,,Most of the time,,Most of the time,,,,Often,,,Most of the time,,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Other",Sometimes,200000,USD,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,37,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Personal Projects,Textbook,YouTube Videos",,,Very useful,,Somewhat useful,,,,,,,Somewhat useful,,,Very useful,,,Very useful,,3-5 years,Nice to have,Necessary,Nice to have,,Nice to have,Nice to have,,Nice to have,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,PhD,Sort of (Explain more),Master's degree,Computer Science,,Computer Scientist,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Statistica (Quest/Dell-formerly Statsoft),Neural Nets,Python,University/Non-profit research group websites,"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,"FlowingData Blog,Jack's Import AI Newsletter,The Analytics Dispatch Newsletter",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important +Male,India,25,"Not employed, but looking for work",,,,,,,,Mathematica,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",,Very useful,Very useful,,,,Very useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,35,10,20,35,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,20,0,40,40,0,0,"Computer Vision,Machine Translation,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's 
degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Nice to have,Nice to have,Necessary,,Nice to have,Unnecessary,Nice to have,Nice to have,,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Github Portfolio,No,Master's degree,Mathematics or statistics,1 to 2 years,"Data Miner,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,India,36,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Link Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,20,0,20,0,"Machine Translation,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Jupyter notebooks,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,Often,,Sometimes,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Often,,Often,Often,,Often,Often,,,,,Often,,Often,,Often,,,,,Often,,,Often,,Often,Often,Most of the time,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of 
significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Often,,Often,,Often,Often,Often,Often,,,Often,Often,,,,Often,Often,,,100% of projects,More internal than external,Standalone Team,None,Extremely high volume live data stream.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,2600000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Sweden,39,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,I don't know/not sure,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,,,,,,,,,, +Male,Brazil,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,,Nice to have,,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Engineering (non-computer focused),,Programmer,Kaggle competitions,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,Very useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Programmer,Researcher",Self-taught,30,10,5,30,25,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Most of the time,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow",,Most of the time,,,,,,,Most of the time,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Most of the time,Most of the time,,,Sometimes,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,,Sometimes,,Most of the time,Often,,,,,Most of the time,,,,Sometimes,,,,,,,Most of the time,,,,,,,Often,,,,60,15,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Unavailability of/difficult access to data",,Rarely,,,Most of the time,,,Rarely,,,Sometimes,,,,,,,,,,Sometimes,,10-25% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Rarely,110000,BRL,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by government,R,Decision Trees,Python,"Government website,I collect my own data (e.g. web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos,Other",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Other,Work,20,30,40,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Government,100 to 499 employees,Stayed the same,Don't know,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data,Other",Most of the time,,Regression/Logistic Regression,"IBM SPSS Statistics,Python,R",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,10,40,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small 
and cannot afford a data science team,Scaling data science solution up to full database",Sometimes,,,Sometimes,Most of the time,Often,,Sometimes,Often,,Sometimes,,,,,Often,,Often,,,,,26-50% of projects,More internal than external,Standalone Team,various real estate data services depending on the level of detail their data offers,Its messy and some variables that would be necessary for new models are incomplete. There is not enough data collection staff and time available to make these more complete and it would potentially be detrimental to infer what these attributes may be. It is important to maintain public trust in the work we do so we've been steered away from using inference to fill in the blanks. ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","College/University,Friends network,Personal Projects",,,Somewhat useful,,,Somewhat useful,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Talking Machines Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,32,Employed full-time,,,Yes,,Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Programmer,Researcher",University courses,25,25,25,25,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,100MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,United States,24,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Time Series Analysis,Python,University/Non-profit research group websites,"College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",1-2 years,,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst",Kaggle competitions,0,25,0,15,60,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important +Female,United States,46,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,SAS JMP,Other,SAS,"Government website,University/Non-profit research group websites","Company internal community,Official documentation,Online courses,Personal Projects,Textbook",,,,Very useful,,,,,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,"Researcher,Software Developer/Software Engineer,Statistician",Work,20,0,80,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Other,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,SAS JMP",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis,Other",,Sometimes,Sometimes,,,Most of the time,Most of the time,Most of the time,Sometimes,,,Rarely,,Often,Sometimes,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,Most of the time,,Often,Sometimes,Most of the time,,,30,20,0,20,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects",,Most of the time,,,Often,,,Often,,,,,,Often,,,,,,,,,100% of projects,Entirely internal,Other,UC Irving machine learning library; .gov datasets local state and federal;,doesn't directly address the question 
of interest,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint,Other",Box,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"150,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,28,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,36,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Hadoop/Hive/Pig,Neural Nets,R,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,Very useful,,,,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Other,University courses,30,10,0,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data,Other",Always,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Statistics,Julia,MATLAB/Octave,Orange,Python,R,SQL,Unix shell / awk,Other",,,,Sometimes,,,,,,,,Most of the time,,,,Often,,,,,Often,,,,,,,,Most of the time,,Often,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,Most of the time,,Often,,Most of the time,,,,40,20,5,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Maintaining 
responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team",Most of the time,,Sometimes,,,,,,,,,,,Sometimes,,Often,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Turkey,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Spark / MLlib,Deep learning,SQL,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Online courses,Stack Overflow Q&A",Very useful,Very useful,,,,,,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer",University courses,10,90,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,High school,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Most of the time,100GB,CNNs,"Jupyter notebooks,NoSQL,Python,TensorFlow",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,Rarely,,,,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,,,,,,,,,Often,,,,,,Often,Sometimes,,,,,,,,,,,,,10,50,40,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",,,,,,,,,Often,Most of the 
time,Often,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Most of the time,15000,USD,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,30,5,50,10,5,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),Primary/elementary school,Academic,"5,000 to 9,999 employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Other,Sometimes,10GB,SVMs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,20,40,10,10,20,0,Enough to refine and innovate on the algorithm,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Standalone Team,UCI; CIFAR; others,,Other,Commercial Data Platform,,Git,Sometimes,40000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,C/C++,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Very useful,Very useful,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Very useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,20,25,0,25,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Other,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Always,100GB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Text Analytics",,,,Most of the time,Sometimes,Sometimes,,,,,,,,Sometimes,,,,Rarely,Often,Most of the time,Often,,,Sometimes,Sometimes,,,,Often,,,,,40,20,0,5,10,25,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,Often,Often,,Often,,,,,Sometimes,Sometimes,,Less than 10% of projects,Entirely internal,Standalone Team,Imagenet,Developing a good pipeline for data to be useable.,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",I don't typically share data,,Git,Rarely,144000,HRK,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,Google Search,"College/University,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,"KDnuggets Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",Self-taught,10,5,7,73,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Rarely,1GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,IBM Cognos,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau",,Rarely,,,,,,,Sometimes,Rarely,,,,,,,Sometimes,,,,,,Often,,Sometimes,,,,,,Rarely,,Sometimes,,,,,Most of the time,Rarely,,,Most of the 
time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,20,5,0,25,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Most of the time,Sometimes,,Rarely,Often,Often,,Often,,,,,,,Most of the time,,Most of the time,,Rarely,,Often,,76-99% of projects,Entirely internal,Business Department,none,accuracy at the appropriate level,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"92,500",USD,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +,United Kingdom,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Company internal community,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,Very useful,,Somewhat useful,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,60,20,NA,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,20 to 99 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,,,,,,55,10,5,15,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Sometimes,,,,,,,,,,,Often,Sometimes,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Rarely,37700,GBP,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,33,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,,,,Very useful,,,Somewhat useful,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,Predictive Modeler",University courses,20,0,60,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Oracle Data Mining/ Oracle R Enterprise,Python,RapidMiner 
(commercial version),RapidMiner (free version),SQL",,Rarely,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,Sometimes,,,Often,,,Most of the time,Most of the time,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Most of the time,,,,Often,,Often,,Often,,,,Often,,,Most of the time,,,,,Most of the time,,Often,,,,40,15,10,20,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,Most of the time,,,Often,,,,Most of the time,,Often,,,,Most of the time,,,Often,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Rarely,,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,C/C++,Deep learning,C/C++/C#,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",1-2 years,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Computer Science,,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Stan,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Conferences,Friends network,Podcasts,Textbook",Very useful,,Very useful,,Somewhat useful,Very useful,,,,,,,Somewhat useful,,Somewhat useful,,,,"FlowingData Blog,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Other","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,Sometimes,,Often,,,,Sometimes,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,Rarely,,Sometimes,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",Most of the time,Sometimes,Often,,Sometimes,,Most of the time,Often,Often,,,Often,,Most of the time,,Often,,Sometimes,,,Most of the time,,Often,Sometimes,Often,,,,Most of the time,Often,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Most of the time,,Often,,,,,,,,,,,,Often,,,Sometimes,,,10-25% of projects,Entirely internal,Other,who;bls;census ;world bank;,scaling data acquisition across heterogeneous sources,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Other",Rarely,200000,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,25,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,50,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Other",,Very useful,,,,,Very useful,,,Very useful,Very useful,,Somewhat useful,Very useful,Very useful,Very useful,,,"KDnuggets Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,Nice to have,Nice to have,Necessary,Nice to have,,,,"Coursera,DataCamp","GPU accelerated Workstation,Traditional Workstation,Other",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",10,85,0,0,5,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Netherlands,41,Employed part-time,,,No,Yes,Other,Fine,Employed 
by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,KNIME (free version),Other,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,,,Very useful,Not Useful,,,Very useful,Very useful,Very useful,Not Useful,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator,The Data Skeptic Podcast",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,,,"Coursera,DataCamp","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Psychology,,"Data Analyst,Data Scientist",Self-taught,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Not important +Male,United States,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,66,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,R,"GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,,,,,Somewhat useful,,,,,,1-2 years,Necessary,,Necessary,,,Necessary,,Necessary,,,,,,,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",,Kaggle Competitions,Yes,Master's degree,Other,1 to 2 years,"Predictive Modeler,Researcher",Self-taught,70,0,20,0,0,10,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3-5,Very Important,,,,,,,,,,,,,,, +Male,United States,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Engineer,Self-taught,35,20,5,15,15,10,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Engineer",University courses,25,0,60,15,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,100GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,R,Spark / MLlib,SQL,Stan,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Often,,Often,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,Rarely,,Rarely,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,32,Employed full-time,,,Yes,,Data Miner,Fine,Employed by professional services/consulting firm,R,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,20,0,60,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,R,SAS Base,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Random Forests,Time Series Analysis",,Often,,,,,,Often,,,,,,Often,,,,,,,,,Often,,,,,,,Often,,,,30,30,20,10,10,0,Enough to tune the parameters properly,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,None,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,,520000,ARS,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",Tableau,Bayesian Methods,R,I collect my own data (e.g. 
web-scraping),"Arxiv,Conferences,Friends network,Kaggle,Podcasts,Stack Overflow Q&A",Somewhat useful,,,,Not Useful,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,Not Useful,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Engineer",Self-taught,100,0,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Never,1GB,"Bayesian Techniques,Markov Logic Networks,Neural Networks","Java,Python,R,Tableau",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,Most of the time,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Time Series Analysis",,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Often,Often,,,,,,,,,,,,,Often,,,,40,20,0,30,10,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Sometimes,,,,Often,,,,,,Often,,,,,,,,26-50% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,40000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Insurance,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,100GB,Regression/Logistic Regression,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Evolutionary Approaches,Logistic Regression,Simulation",,,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,30,30,25,15,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,23,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,College/University,Official documentation,Personal Projects,YouTube Videos",,Somewhat useful,Very useful,,,,,,,Very useful,,Very useful,,,,,,Very useful,Siraj Raval YouTube Channel,3-5 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Other,11 - 39 hours,Other,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular 
Data)","Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,1-2,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Not important,Very Important +Male,Chile,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,,Necessary,,,,Necessary,Necessary,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,Self-taught,60,10,0,10,20,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Very Important +Male,United States,34,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,0,10,70,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,30,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,SAS Enterprise Miner,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician",University courses,0,50,0,50,0,0,Survival Analysis,"Bayesian Techniques,Logistic Regression",A master's degree,Academic,I don't know,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,1GB,"Bayesian Techniques,Regression/Logistic Regression","IBM SPSS Statistics,Minitab,R,SAS Base",,,,,,,,,,,,Often,,,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling",,,Sometimes,,,Sometimes,,,,,,,,,,Often,,,,,Often,Sometimes,,,,,,,,,,,,40,20,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,Most of the time,,,,,,,Often,Most of the time,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,TTD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,Somewhat useful,,,,Very useful,Somewhat useful,Very useful,Very useful,,,,,5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,A social science,,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler",Work,NA,NA,NA,NA,NA,NA,"Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,United States,34,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,37,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,Fine,Employed by government,Microsoft Azure Machine Learning,Monte Carlo Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,The Data Skeptic Podcast",3-5 years,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Necessary,,,,"Coursera,DataCamp,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,3 to 5 years,"Data Analyst,Other",University courses,20,20,20,40,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Portugal,62,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Textbook,Other",Very useful,,Very useful,,Very useful,,,,,,,,,,Very useful,,,,"Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,Yes,Master's degree,Computer Science,More than 10 years,Engineer,University courses,20,30,0,50,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Poland,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,50,10,10,0,0,Adversarial Learning,"Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs",,Technology,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,,SVMs,"Amazon Machine Learning,NoSQL,Python",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression,SVMs",Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data",Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,75000,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,Data Scientist,Poorly,Employed by company that makes advanced analytic software,DataRobot,,,,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Engineer,Self-taught,100,0,0,0,0,0,,,"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,R,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,25,0,0,25,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Technology,20 to 99 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,NoSQL,Python,R",,Sometimes,,,,,,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Often,Sometimes,,,,,,,,Sometimes,,,,,,Sometimes,,,,Often,,,Sometimes,Often,,,,20,10,5,15,50,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Often,,,,,,,Often,,,Often,,Often,,51-75% of projects,Entirely internal,Central Insights Team,IDC,Inconsistent and conflicting data sources,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,150000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,Somewhat useful,"FastML Blog,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Psychology,1 to 2 years,"Business Analyst,Engineer,Predictive Modeler",Self-taught,60,20,10,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,Most of the 
time,,Rarely,Sometimes,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Often,Sometimes,,Rarely,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",Often,Rarely,,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,,Most of the time,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,Often,Sometimes,,,,40,10,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Often,,,,,,,Sometimes,,,,,,,51-75% of projects,More internal than external,Business Department,kaggle; UCI Meachine Learning,Missing data,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,40000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,MATLAB/Octave,,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,500 to 999 employees,Stayed the same,Don't know,A tech-specific job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,10MB,Regression/Logistic Regression,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,40,50,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",Sometimes,Often,,,,Sometimes,,,Often,,,,,,,,,,,,,,10-25% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Deep learning,Matlab,Government website,"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Very useful,,,,"Data Machina Newsletter,FlowingData Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Computer Vision,Speech Recognition","Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Government,20 to 99 employees,Decreased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Sometimes,10GB,"CNNs,Decision Trees,HMMs,Neural Networks,SVMs","C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"HMMs,kNN and Other Clustering,Neural Networks,SVMs",,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,,Often,,,,,,20,10,0,60,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data",,Sometimes,,,Often,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"200,000",USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Textbook",,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,60,40,0,0,0,0,Natural Language Processing,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,United Kingdom,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Partially Derivative Podcast",3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,Coursera,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important +Male,Japan,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed 
full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,R,"Google Search,University/Non-profit research group websites","Blogs,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,NA,50,0,50,0,0,,,A master's degree,Internet-based,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",,Basic laptop (Macbook),Text data,,,,Java,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,The lack of a clear question to be answering or a clear direction to go in with the available data",,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,Less than 10% of projects,,,,,,,,,,,,,7,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,Very useful,Somewhat useful,Not Useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",Self-taught,30,20,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Often,,,,Often,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,Often,Sometimes,Often,Most of the time,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,,Sometimes,Often,,,,Often,Most of the time,,Often,Most of the time,,Often,,Sometimes,Sometimes,,,,,20,30,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,10-25% of projects,Entirely internal,Other,,To build recommender systems to promote our services.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,"64,000",,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Australia,42,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,Not Useful,,Very useful,Somewhat useful,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX,Udacity","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Github Portfolio,Sort of (Explain more),I did not complete any formal education past high school,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Very Important +Female,United States,23,"Not employed, but looking for work",,,,,,,,NoSQL,Social Network Analysis,Scala,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,40,20,0,40,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,11-15,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Sweden,29,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,Microsoft SQL Server Data Mining,Factor Analysis,SQL,I collect my own data (e.g. 
web-scraping),"Official documentation,Online courses,Personal Projects,YouTube Videos",,,,,,,,,,Somewhat useful,Somewhat useful,Very useful,,,,,,Very useful,"Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Management information systems,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Self-taught,50,10,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,New Zealand,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Scientist,Programmer,Researcher",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Other 
(please specify; separate by semi-colon),High school,Academic,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,43,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Personal Projects",,,,,,,Very useful,,Very useful,,Very useful,Very useful,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,20,0,60,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - RNNs",High school,Technology,"5,000 to 9,999 employees",Stayed the same,Don't know,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Rarely,100MB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,Sometimes,,Sometimes,,Often,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Logistic Regression,Neural Networks",,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,10,60,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack 
of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Sometimes,Sometimes,,,Sometimes,,,,,,,Often,Sometimes,,,Sometimes,,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Always,0,CAD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,Very useful,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Software Developer/Software Engineer,Other",University courses,20,10,0,60,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Other",,,,,,,,,Sometimes,,,,,,Sometimes,,Rarely,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,,,,,Sometimes,Often,,,Sometimes,,,,Sometimes,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",Often,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Often,,,,,Often,,,,,Often,,Sometimes,Often,,,,,50,15,5,10,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,,Often,,,,,,Most of the time,,,,,Often,Sometimes,,76-99% of projects,More internal than external,Other,Various Social Media;IMS Health;Forrester Research,"Too many sources, not enough time and people","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Other",Confluence,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,"185,000",USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Trade book,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler,Other",Other,50,25,5,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Telecommunications,500 to 999 employees,Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,,100MB,Other,"Microsoft SQL Server Data Mining,QlikView",,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,60,0,25,10,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,,,,,Often,,,10-25% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,,Never,755000,,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Australia,24,"Not employed, but looking for work",,,,,,,,Python,Time Series Analysis,Python,,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,"KDnuggets Blog,Partially Derivative Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,50,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Canada,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Very useful,Data Elixir Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,90,5,5,0,0,0,,,A master's degree,Retail,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,,"Amazon Web services,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,Often,,,Sometimes,,,Often,,,,"Collaborative Filtering,kNN and Other Clustering,Recommender Systems",,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,70,5,25,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Organization is small and cannot afford a data science team",,,,,,,,,Often,,Sometimes,,,,Often,Sometimes,,,,,,,None,More external than internal,IT Department,,dirty data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Subversion",Sometimes,103000,CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),Other",Other,,,,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,10,30,0,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation",Other,Don't know,10GB,Bayesian Techniques,"C/C++,MATLAB/Octave,Python,R",,,,Rarely,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction",,,,,,Sometimes,Often,,,,,,,Rarely,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,80,15,0,5,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,Do not know,IT Department,,no tag,Other,Share Drive/SharePoint,,Git,Rarely,4500000,JPY,I was not employed 3 years ago,3,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Impala,Anomaly Detection,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Conferences,Friends network,Newsletters,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,Very useful,Very useful,,,Somewhat useful,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,80,0,15,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",Insurance,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,10GB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,Rarely,Often,,,Often,,,Sometimes,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,Rarely,,,,Sometimes,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,Sometimes,,,,60,0,0,20,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,Sometimes,Rarely,,,,,,,,,,Most of the time,,Often,,76-99% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,91000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,,"Coursera,DataCamp,Udacity","Basic laptop (Macbook),Workstation + Cloud service",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,C/C++/C#,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Official documentation,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,,,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,,Technology,Fewer than 10 employees,Decreased significantly,1-2 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Most of the time,10GB,"CNNs,GANs,Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,TensorFlow,Other,Other",,Sometimes,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,"CNNs,GANs,Neural Networks,RNNs",,,,Most of the time,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,60,20,5,10,5,0,Enough to tune the parameters properly,"Dirty data,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Often,,,,,Most of the time,,,,,,Most of the time,,,,,Most of the time,,76-99% of projects,More internal than external,IT Department,MNIST;CIFAR-10,Lack of data; lack of resources ; less tutorials ,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Sometimes,100000,INR,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Time Series Analysis,Python,I collect my own data (e.g. web-scraping),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,Very useful,,Very useful,Very useful,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",40,10,30,20,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,SQL,Unix shell / awk",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,,Sometimes,,Sometimes,Most of the time,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,Often,,Most of the time,Sometimes,,,,Sometimes,,Often,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,Often,,Often,,,Often,,,,,Often,,,,Often,Often,Often,Often,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,"155,000",USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Very useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,Self-taught,0,30,30,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,1TB,Random Forests,"Jupyter notebooks,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests",,,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,60,20,10,5,5,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,26-50% of 
projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,120000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Newsletters,Non-Kaggle online communities,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,Very useful,Very useful,,,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,15,0,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important +Male,Other,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,,I don't plan on learning a new tool/technology,Deep learning,Python,,"Blogs,Personal Projects,Stack Overflow 
Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,,,,,,Very useful,,Very useful,,,Somewhat useful,Very useful,,3-5 years,Necessary,Nice to have,Necessary,,Nice to have,Nice to have,,Nice to have,,,,,,,,2 - 10 hours,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,45,30,5,10,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher",Self-taught,70,10,10,0,10,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Other",Sometimes,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,Rarely,,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of 
the time,,,,"CNNs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Text Analytics",,,,Sometimes,,,,,,,,,,,,Sometimes,,Rarely,Often,Most of the time,Sometimes,,,,Most of the time,,Sometimes,,Sometimes,,,,,10,50,25,10,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Privacy issues,Scaling data science solution up to full database",Sometimes,,,,Often,,,,,,,Often,,,,,Rarely,Sometimes,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,common crawl,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,63,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Julia,,R,Google Search,"Arxiv,Conferences",Very useful,,,,Very useful,,,,,,,,,,,,,,"FastML Blog,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,"Business Analyst,Data Scientist,Predictive Modeler,Statistician",Self-taught,50,10,20,20,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,,,,,Important,,"Gaming Laptop (Laptop + CUDA capable 
GPU),Laptop or Workstation and private datacenters",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Julia,Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Sometimes,,Often,,,,,,Often,Sometimes,Often,,,,,,,,,,,,Often,Often,Sometimes,,,,30,40,15,10,5,0,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Other,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,GitHub,"Kaggle,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,,Very useful,,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,20,30,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Pharmaceutical,,,,,Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"CNNs,Regression/Logistic Regression,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Data Visualization,Natural Language Processing,RNNs,SVMs,Text Analytics",,,,Sometimes,,,Often,,,,,,,,,,,,Often,,,,,,Sometimes,,,Often,Often,,,,,20,50,20,10,0,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Sometimes,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Git",,20000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,50,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,"10,000 or more employees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",Self-taught,40,10,30,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,10 to 19 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service,Other","Image data,Video data,Text data,Relational data,Other",Most of the time,10TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Cloudera,Impala,Jupyter notebooks,Python,R,Spark / MLlib,Unix shell / awk",,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,,Most of the time,,,,,30,20,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",30,40,10,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,,Often,,,Most of the time,,,,Most of the time,,,,,,,Sometimes,,,,,,,Often,,,,30,10,20,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,Most of the time,,,Sometimes,,,,,Sometimes,,Sometimes,,,Often,Sometimes,Often,,,100% of projects,Entirely internal,IT Department,,Outliers. Time and memory limitations.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,C/C++,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Podcasts,Trade book",,Somewhat useful,Somewhat useful,,,,,,,,,,Somewhat useful,,,Somewhat useful,,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,25,40,10,20,5,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Image data,Video data",Sometimes,1GB,"CNNs,Random Forests,RNNs,SVMs","C/C++,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the 
time,,Often,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"CNNs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation",,,,Often,,,,,,,,,,Often,,,,,,Often,Often,,Often,,Often,Often,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",Sometimes,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,100% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Subversion",Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,30,50,15,0,5,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Other,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,1GB,Other,"SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Survival Analysis",,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,,,,,,,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer,Other",University courses,10,10,10,70,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Rarely,10GB,"HMMs,SVMs","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,Rarely,,,,Sometimes,,Rarely,,,,,,,,,,,,,Rarely,,Sometimes,,,,"HMMs,Natural Language Processing,SVMs",,,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,40,0,20,40,0,0,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Most of the time,,,,,,10-25% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United 
States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,Fewer than 10 employees,Decreased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Workstation + Cloud service,Text data,Never,1GB,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,"Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Rarely,Rarely,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,28,Employed full-time,,,Yes,,Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",,,,40,10,5,40,5,0,Time Series,Bayesian Techniques,,Financial,Fewer than 10 employees,Stayed the same,Less than one year,A general-purpose job board,,,,,,,,Amazon Machine Learning,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Web services,,R,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to 
have,Nice to have,Nice to have,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Yes,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,50,0,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,22,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,,,Somewhat useful,,Very useful,,,Very useful,,,,,,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,39,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Deep learning,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Elixir Newsletter,FlowingData Blog,The Data Skeptic Podcast",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Master's degree,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Predictive Modeler,Researcher",University courses,40,0,0,60,0,0,"Computer Vision,Speech Recognition","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Japan,24,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,Other,Self-taught,70,30,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Data Analyst,Researcher",University courses,20,25,25,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,41,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,,,,,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,17,"Not employed, but looking for work",,,,,,,,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Very useful,Very useful,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,Yes,I prefer not to answer,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,40,40,10,0,10,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Other,25,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Somewhat useful,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,11 - 39 hours,Github Portfolio,Yes,Master's degree,Biology,1 to 2 years,I haven't started working yet,Self-taught,70,20,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,South Korea,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",24,75,0,0,1,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Other,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",35,55,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Sometimes,100MB,Neural Networks,"IBM Watson / Waton Analytics,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",Sometimes,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Sometimes,,,,50,35,5,10,0,0,Enough to refine and innovate on the algorithm,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,76-99% of projects,Do not know,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Most of the time,240000,NGN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Ukraine,25,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by government,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,University courses,15,60,15,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs,Other","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,Sometimes,,,,,,Sometimes,Sometimes,,,,50,25,10,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Organization is small and cannot afford a data science team",Most of the time,,Sometimes,,Most of the time,,,,,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,Business Department,,,Column-oriented relational (e.g. 
KDB/MariaDB),I don't typically share data,,,Never,120000,UAH,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Taiwan,46,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Google Search,"Conferences,Non-Kaggle online communities,Textbook,YouTube Videos",,,,,Very useful,,,,Very useful,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,20,0,0,10,Time Series,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,"10,000 or more employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Never,1GB,"Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,R",,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,Sometimes,,Sometimes,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT",,,,,,,,,Sometimes,,Often,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,"1,800,000",TWD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Iran,28,Employed full-time,,,Yes,,Engineer,Poorly,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Engineer,Researcher,Other",University courses,60,0,0,30,0,10,Time Series,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Somewhat useful,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,Less than a year,Researcher,Self-taught,50,5,0,0,45,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,,,,Sometimes,Most 
of the time,Often,Most of the time,Sometimes,,,Often,,,,Often,,,,,,,Often,Sometimes,,,,,Sometimes,Sometimes,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,Often,,,,,,Often,,,,10-25% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Java,GitHub,"Blogs,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,,Very useful,,Very useful,,,,,O'Reilly Data Newsletter,< 1 year,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,,No,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - RNNs,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,,,,,,,,,,,,,,, +A different identity,Pakistan,22,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,FlowingData Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Professional degree,,I don't write code to analyze data,I haven't started working yet,Self-taught,10,40,10,20,10,10,Supervised Machine Learning (Tabular Data),Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,6 to 10 years,"Data Scientist,Engineer,Researcher",University courses,15,15,15,50,5,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,"5,000 to 9,999 employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing 
there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Miner,Fine,Employed by college or university,Amazon Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Friends network,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Operations Research Practitioner",Work,70,0,10,10,10,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,Academic,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,Neural Networks,"C/C++,Microsoft Excel Data Mining,Python,SQL",,,,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Neural Networks,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,40,20,10,20,10,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,,,,,,Often,,,,Sometimes,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Rarely,900000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Russia,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Financial,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,1TB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,DataRobot,Neural Nets,,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Newsletters,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,Somewhat useful,,,,Very useful,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Programmer",Self-taught,60,15,5,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,500 to 999 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Flume,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Most of the time,,Rarely,,Often,,,,,Often,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",Most of the time,Rarely,,,,Most of the time,Most of the time,Sometimes,Often,,,,,Sometimes,Sometimes,Sometimes,,Most of the time,Often,,Often,,Most of the time,,,Sometimes,,Sometimes,Often,,,,,60,20,5,10,5,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Limitations of tools,Privacy issues",,,,,Most of the time,Often,,,,,,,Often,,,,Most of the 
time,,,,,,10-25% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Other,Rarely,,,,8,,,,,,,,,,,,,,,,,, +Male,Italy,36,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Engineer,Machine Learning Engineer",Self-taught,60,5,30,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,Researcher,Self-taught,40,10,10,10,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Increased slightly,3-5 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,Less than a year,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Technology,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Relational data",Rarely,10MB,Regression/Logistic Regression,"Jupyter notebooks,Python,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,,Sometimes,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,40,30,0,20,10,0,Enough to tune the parameters properly,"Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,,,,,,,,,,Often,,,26-50% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,"Blogs,College/University,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,,University courses,5,5,20,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Technology,I prefer not to answer,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression,Other","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,HMMs,Lift Analysis,Logistic Regression,Random Forests,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,10,0,10,10,10,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,,,,,,,,I do not want to share information about my salary/compensation,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Japan,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed part-time,,,No,Yes,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,,,,,,,Somewhat useful,,1-2 years,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,11 - 39 hours,Kaggle Competitions,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,Philippines,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Enterprise Miner,Link Analysis,Stata,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,,,,Somewhat useful,Very useful,,1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,SQL,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,50,5,0,5,0,Reinforcement learning,Gradient Boosting,A master's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Never,10MB,Regression/Logistic Regression,"Oracle Data Mining/ Oracle R Enterprise,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,30,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to 
say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,4,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed part-time,,,No,Yes,Statistician,Poorly,Employed by government,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","College/University,Kaggle,Trade book",,,Very useful,,,,Very useful,,,,,,,,,Somewhat useful,,,,1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Psychology,1 to 2 years,I haven't started working yet,University courses,15,5,0,70,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Turkey,16,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to 
answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,47,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Microsoft Azure Machine Learning,Social Network Analysis,R,Government website,"Non-Kaggle online communities,Textbook",,,,,,,,,Very useful,,,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,More than 10 years,"Data Analyst,Researcher,Statistician",Self-taught,90,10,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",,Academic,"10,000 or more employees",Decreased significantly,More than 10 years,Some other way,Important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,KNIME (free version),MATLAB/Octave,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Other",,,,,,,,,Sometimes,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Most of the time,,,,,Often,Most of the time,,,Most of the time,,,Rarely,,,,Often,,,"Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,Segmentation,Simulation,Time Series Analysis,Other",,,,,,,Most of the time,Most of the time,Often,,,Often,,,,Most of the time,,,,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,30,30,NA,10,30,0,Enough to explain the algorithm to someone 
non-technical,"Dirty data,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Most of the time,,,,,Most of the time,,,,,,,,,,,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Other,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,Somewhat useful,"Jack's Import AI Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,30,30,30,10,0,"Computer Vision,Natural Language Processing,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,I prefer not to answer,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Video data,Text data",Sometimes,100GB,"CNNs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation",,,,Most of the time,,Sometimes,Often,,,,,,,,,Rarely,,,Often,Most of the time,Sometimes,,,,Often,Sometimes,,,,,,,,30,50,0,20,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Never,"3,000,000",RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Trade book",Very useful,Very useful,Very useful,,Very useful,Somewhat useful,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,Somewhat useful,,,"FastML Blog,KDnuggets Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,"Data Miner,Engineer,Machine Learning Engineer,Predictive Modeler",Self-taught,30,40,10,5,15,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Text data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,Sometimes,,,,,Sometimes,Sometimes,Sometimes,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,Often,,Rarely,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics",Often,,Sometimes,Sometimes,Often,Most of the time,Most of the time,Often,Sometimes,,Sometimes,Often,,Sometimes,,Often,,Sometimes,Most of the time,Most of the 
time,Often,,Often,Most of the time,Often,,,Often,Often,,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,Often,,Most of the time,,,Sometimes,Often,Often,Most of the time,Often,,,,Often,Sometimes,,Sometimes,Often,Sometimes,,Less than 10% of projects,More internal than external,IT Department,Chinese public data on language modeling,Modeling text data in a proper/effective form,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Rarely,"220,000",CNY,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,R,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Online courses,Textbook,YouTube Videos",,Very useful,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Data Analyst,Self-taught,80,20,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,1GB,"Decision Trees,GANs,Gradient Boosted Machines,Markov Logic Networks,Random Forests","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,,Most of the time,,,,Most of the time,,Often,Sometimes,,,,,,Often,Rarely,Most of the time,,,,,Sometimes,,Sometimes,,,,60,20,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Need to coordinate with IT",,,,,,,,,,,Sometimes,,,,Often,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,1000000,INR,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,Less than a year,"Data Analyst,Machine Learning Engineer,Programmer,Researcher",Kaggle competitions,60,30,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,I prefer not to answer,Decreased slightly,3-5 years,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Workstation + Cloud service",Image data,Sometimes,10GB,"CNNs,GANs,Neural Networks,RNNs,SVMs","C/C++,MATLAB/Octave,Python,SQL,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Data Visualization,GANs,Neural Networks,RNNs,SVMs",,,,Most of the time,,Often,Often,,,,Sometimes,,,,,,,,,Often,,,,,Often,,,Often,,,,,,30,40,20,10,0,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,MNIST;ImageNet;CIFAR;COCO;PASCAL VOC,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Rarely,"200,000",CNY,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,Taiwan,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,,"O'Reilly Data Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,United States,22,"Not employed, but looking for work",,,,,,,,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Biology,1 to 2 years,,Self-taught,90,0,0,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Not important,Not important,Very Important,Not important,Not important +Female,United Kingdom,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,Spark / 
MLlib,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,No education,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,Australia,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,75,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Government,,,,,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,100MB,Other,"Jupyter notebooks,MATLAB/Octave,NoSQL,Python,SQL,Unix shell / 
awk",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,Rarely,,,,Often,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,15,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,No,Yes,Other,Fine,"Employed by a company that performs advanced analytics,Employed by government",TensorFlow,Deep learning,Python,GitHub,"Kaggle,Newsletters,Online courses,Podcasts",,,,,,,Very useful,Very useful,,,Very useful,,Very useful,,,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,DataCamp,Traditional Workstation,2 - 10 hours,Github Portfolio,No,Master's degree,,Less than a year,"Business Analyst,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",5,80,10,0,5,0,Time Series,,A professional degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Egypt,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,SQL,Support Vector Machines (SVM),SQL,Google Search,"Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,Very useful,Very useful,,,Very useful,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",No education,Other,10 to 19 employees,Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,Decision Trees,"Amazon Web services,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL",,Often,,,,,,,,,,,,,,,,,,,,Often,Often,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Recommender Systems,Segmentation",,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,5,10,5,50,30,0,Enough to run the code / standard library,"I prefer not to say,Organization is small and cannot afford a data science team",,,,,,,Often,,,,,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Sometimes,1000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Australia,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Text Mining,Python,"Google Search,I collect my own data (e.g. web-scraping)","College/University,Official documentation,Stack Overflow Q&A",,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer",Self-taught,80,0,0,20,0,0,Recommendation Engines,,A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased significantly,Less than one year,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),,Rarely,10MB,,"Amazon Web services,MATLAB/Octave,Python",,Rarely,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,50,40,10,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,Most of the time,Most of the time,Most of the time,,Less than 10% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,100000,AUD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,24,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Web services,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Friends network,Kaggle,Official documentation,Personal Projects,Textbook",,,,,,Very useful,Somewhat useful,,,Somewhat useful,,Very useful,,,Very useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Miner,Software Developer/Software Engineer",Work,10,0,50,30,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,C/C++,IBM Cognos,Java,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau,TensorFlow",,Most of the time,,Sometimes,,,,,,Rarely,,,,,Rarely,,Most of the time,,,,,,,Rarely,Often,,Sometimes,,,,Most of the time,,Often,,,,Rarely,,,,,Most of the time,,,Often,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,Rarely,Often,Often,Most of the time,Often,,,,,Most of the time,,Often,,Often,Rarely,,Often,,Sometimes,,,Often,,Sometimes,Sometimes,Most of the time,,,,55,10,15,15,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from 
external sources,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Sometimes,Sometimes,,Sometimes,Often,Sometimes,,,,Often,,Rarely,,,Often,,Sometimes,,,Sometimes,Often,Most of the time,10-25% of projects,More internal than external,Other,google places; here; weather data; census data,Access to 'trusted' data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Git,Other",Rarely,139000,AUD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Argentina,54,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,Python,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Online courses,Personal Projects,Textbook",Very useful,,,,Very useful,,,,,,Very useful,Very useful,,,Very useful,,,,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Machine Learning Engineer,Researcher",Self-taught,50,0,25,25,0,0,"Computer Vision,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Neural Networks - CNNs",High school,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Video data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,Neural Networks","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"Bayesian Techniques,CNNs,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation",,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,Often,,,,Often,,Most of the time,Most of the time,,,,,Most of the time,,,,,,,,50,20,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,Most of the time,,,,,,,,Often,,,Often,,100% of projects,Approximately half internal and half external,Standalone Team,Imagenet,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Company Developed Platform,,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,Matlab,I collect my own data (e.g. 
web-scraping),"Company internal community,Online courses,Textbook,YouTube Videos",,,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,,Bayesian Techniques,High school,Military/Security,20 to 99 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Other,Don't know,,Bayesian Techniques,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,25,20,25,30,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Other,Never,140000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,MATLAB/Octave,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,70,10,10,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,1MB,Decision Trees,"C/C++,Java,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Decision Trees,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",Often,Sometimes,Often,,Sometimes,Sometimes,,Often,Sometimes,Most of the time,Often,,Often,,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,,,,,,,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,No,Yes,Software Developer/Software 
Engineer,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Textbook",,Very useful,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,,,Somewhat useful,,,,Other (Separate different answers with semicolon),< 1 year,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),0 - 1 hour,PhD,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,0,50,0,40,10,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important +Male,Singapore,48,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,"Conferences,Friends network,Textbook,Tutoring/mentoring",,,,,Somewhat useful,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient 
Boosted Machines,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Workstation + Cloud service",Relational data,Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,RapidMiner (free version),SQL,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics",,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,Sometimes,,,,,15,50,5,10,20,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,None,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,,,,,,9,,,,,,,,,,,,,,,,,, +Male,Australia,54,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Financial,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Web services,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Other",,,,,,,Very useful,,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,30,20,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Most of the time,,Most of the time,,,,,Sometimes,,,,,,40,20,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team",,Most of the time,,,Often,,,,Often,,Most of the time,,Rarely,,,Often,,,,,,,51-75% of projects,More internal than external,Standalone Team,Insee,,Column-oriented relational (e.g. KDB/MariaDB),"Commercial Data Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,"52,000",,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Russia,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Fine,"Employed by college or university,Employed by non-profit or NGO",Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,,Data Machina Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Master's degree,,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",University courses,0,30,20,30,20,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,France,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Other",University courses,15,0,30,45,10,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,"5,000 to 9,999 employees",Increased significantly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Always,100TB,"Decision Trees,Ensemble Methods,Gradient Boosted 
Machines,Random Forests,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Rarely,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests",,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,60,15,15,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"GitHub,University/Non-profit research group websites","Arxiv,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,,,,,,,,,Very useful,,,Very useful,Somewhat useful,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,20,40,30,10,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Internet-based,I don't know,Increased slightly,Less than one year,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Always,1GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests","C/C++,Java,Python,R,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,PCA 
and Dimensionality Reduction,Random Forests,Segmentation",,Sometimes,,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,,,,,,30,40,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,Often,,,,,,Often,,,Less than 10% of projects,Entirely internal,Other,,"The size, the data is not preclassified",Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Other,Always,37000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,10,50,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Other,University courses,10,0,30,55,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a 
company that performs advanced analytics",Spark / MLlib,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,Company internal community,Conferences,Friends network,Newsletters,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Somewhat useful,Somewhat useful,,Very useful,Very useful,Somewhat useful,,Not Useful,,,,Very useful,Not Useful,Very useful,Somewhat useful,,Somewhat useful,Not Useful,"Data Machina Newsletter,FlowingData Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk,Other,Other",,Most of the time,,,,,,,Sometimes,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,Often,Most of the time,Most of the time,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,Rarely,Most of the 
time,,,,,,,,,Rarely,,,Often,,Often,,Rarely,,,Often,,,Often,Most of the time,,,,20,10,10,30,30,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database",,Rarely,,,Most of the time,,,,Sometimes,,,,Often,,,,,Sometimes,,,,,76-99% of projects,More external than internal,Other,,Lack of centralized data engineering,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,160000,AUD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer,Other",Self-taught,80,0,0,0,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Other,Most of the time,100GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,Tableau,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,Most of the time,,,,"Data Visualization,Ensemble Methods,Logistic Regression,Neural Networks,Simulation,Time Series Analysis",,,,,,,Often,,Often,,,,,,,Often,,,,Often,,,,,,,Often,,,Most of the time,,,,50,25,5,20,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,Git,Rarely,,,,6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,48,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Somewhat useful,Not Useful,Somewhat useful,Somewhat useful,,,,"Linear Digressions Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",1-2 years,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,"DataCamp,Other",Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Professional degree,,,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important +Male,India,43,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,,,,Online courses,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,"Some college/university study, no bachelor's degree",Internet-based,"5,000 to 9,999 employees",Stayed the same,,A tech-specific job board,Important,,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Software 
Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Other,Other,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Other,No,Bachelor's degree,Other,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,United Kingdom,NA,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Textbook",,Very useful,,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,"DataTau News Aggregator,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Researcher,University courses,40,0,10,20,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",,,,,,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Text data,Rarely,100GB,Regression/Logistic Regression,"C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,SQL,Unix shell / awk",,,,Rarely,Sometimes,,,,Sometimes,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,Sometimes,,,,40,20,15,20,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",Often,Often,,,,,,,Often,Often,,,Often,Often,,,Often,,,,Often,,76-99% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,Online courses,Podcasts,Stack Overflow Q&A",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,Not Useful,Very useful,,,,,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",60,25,15,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Sometimes,,Most of the 
time,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,Often,,Sometimes,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,,Sometimes,Often,Often,,,,70,3,2,10,15,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,Often,,Sometimes,Sometimes,,,Most of the time,,,Most of the time,Often,,76-99% of projects,Do not know,Business Department,Census,Poor data quality; the data comes from state government agencies that do not understand how to structure data or put controls in place to improve quality,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,137500,USD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Iran,27,"Not employed, but looking for work",,,,,,,,Java,Deep learning,Python,"Google Search,University/Non-profit research group websites",College/University,,,Very useful,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),40+,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher",University courses,30,20,20,30,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,,,Somewhat important,Very Important,,,,,, +Female,United States,59,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,Very useful,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,"Data Elixir Newsletter,Linear Digressions Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",40,30,5,5,20,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,CRM/Marketing,500 to 999 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,Sometimes,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,,Often,Often,,,Most of the time,,Most of the time,,Most of the time,,,,,Sometimes,,Most of the time,,,Often,,,,Sometimes,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,,,,Most of the time,Most of the time,,,,Often,,Most of the time,,,,,Most of the time,,,Less than 10% of projects,More internal than external,Standalone Team,postal code information; weather;economic,"No data democracy, different people know datasets better than others and that's not good for data science tasks","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Other,Sometimes,106000,CAD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,United States,20,Employed part-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Neural Nets,Python,University/Non-profit research group websites,"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"Data Stories Podcast,FlowingData Blog",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Biology,Less than a year,Researcher,University courses,0,40,0,60,0,0,,"Decision Trees - Random Forests,Logistic Regression",A professional degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Singapore,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"College/University,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Predictive 
Modeler,Programmer,Researcher,Other",Self-taught,25,50,25,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Not very important,Other,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,10GB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression,Other","C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,,,Rarely,Sometimes,,,,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,RNNs,Simulation,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,Sometimes,,Often,,Sometimes,,,,Sometimes,,Often,,Often,,,Often,,Often,,,,Sometimes,,Sometimes,,Often,Sometimes,,,,35,35,0,15,15,0,Enough to refine and innovate on the algorithm,"Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,,Often,,,,,,,,Sometimes,,,,,,,,,100% of projects,More internal than external,Other,None,Getting the data with the right scope from clients,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,75000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Other,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Hadoop/Hive/Pig,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,"Information technology, networking, or system administration",,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Not important +Male,Finland,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,Siraj Raval YouTube Channel,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,0 - 1 hour,Github Portfolio,No,I did not complete any formal education past high school,,,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Singapore,33,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Genetic & Evolutionary Algorithms,Python,University/Non-profit research group websites,"College/University,Conferences",,,Very useful,,Somewhat useful,,,,,,,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Predictive Modeler,Statistician",University courses,30,0,50,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Angoss,Python,R,SAS Base",,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Sometimes,,,,Most of the time,Most of the time,Most of the time,Often,,,,,Often,,Often,,,,Rarely,Sometimes,,Sometimes,,,Most of the time,,,,Often,,,,10,50,10,5,25,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Most of the time,,,,,,,,,,,,,,,Often,,,,76-99% of projects,More internal than external,Central Insights Team,none,none,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Israel,46,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,1 to 2 years,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,Less than one year,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,,,,"Amazon Web services,NoSQL,Python,R,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,CNNs,Data Visualization,Neural 
Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Microsoft Azure Machine Learning,Deep learning,R,GitHub,"Company internal community,Tutoring/mentoring,YouTube Videos",,,,Very useful,,,,,,,,,,,,,Very useful,Very useful,"Data Elixir Newsletter,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,25,50,25,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,A bachelor's degree,Manufacturing,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Sometimes,100GB,Decision Trees,"Hadoop/Hive/Pig,QlikView,SQL,Tableau",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,Often,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,25,25,20,30,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,,Often,,,,Often,,,76-99% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Subversion",Sometimes,2700000,INR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Canada,35,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,University/Non-profit research group websites,"Arxiv,Blogs,College/University,Conferences,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,KDnuggets Blog,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Other,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",University courses,50,20,10,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",6-10,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United Kingdom,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,Self-taught,90,5,0,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Neural Networks,Random Forests,SVMs","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,Most of the time,,Most of the time,,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,,,,,40,30,20,7,3,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Most of the time,,,,Often,,Sometimes,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,None,Trying to extract data which has bona fide relationships within it.,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Bitbucket,Never,35500,GBP,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,42,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,PhD,No,Master's degree,Computer Science,,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,42,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)",Kaggle,,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,1-2 years,Necessary,,Necessary,,Necessary,,,Nice to have,,,,,,,Laptop or Workstation and local IT supported servers,11 - 39 hours,Experience from work in a company related to ML,No,Doctoral degree,Engineering (non-computer focused),Less than a year,Researcher,Self-taught,70,0,0,0,30,0,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,,,,,,,,,Very Important,,Very Important,,Somewhat important,,Somewhat important +Male,Other,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Programmer,Researcher",University courses,80,0,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,,Somewhat useful,,Very useful,,Somewhat useful,,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Computer Science,,Computer Scientist,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,20,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",Work,40,10,20,30,0,0,"Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",,Internet-based,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Text data,Relational data",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Conferences,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,,,Very useful,,,,Somewhat useful,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,"Business Analyst,Other",University courses,10,40,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Rarely,Most of the time,,,,Often,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural 
Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs",,,,,,Often,Most of the time,Sometimes,Rarely,,,Rarely,,Sometimes,,Often,,,Rarely,Often,Rarely,Often,Sometimes,,,Sometimes,,Sometimes,,,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT",,,,,Most of the time,,,,Often,,,,,,Often,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,"Weather data; Royal Post Data, Open Source Maps",Data quality issues,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Never,43000,GBP,Other,9,,,,,,,,,,,,,,,,,, +Male,Israel,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Other,Deep learning,Python,University/Non-profit research group websites,"Online courses,Other",,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,,,Work,40,0,60,0,0,0,"Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Most of the time,1TB,"Bayesian Techniques,SVMs","Amazon Web services,Cloudera,Java,Python,Spark / MLlib",,Often,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",,,Most of the time,,,Often,Sometimes,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,,Often,Most of the time,,,,,20,40,10,10,20,0,Enough to code it from scratch and 
it will run blazingly fast and be super efficient,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,Less than 10% of projects,Do not know,Other,"wiki, taxonomies",diversity of textual data,Document-oriented (e.g. MongoDB/Elasticsearch),Commercial Data Platform,,Bitbucket,Never,,,Other,9,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,30,20,20,20,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,CRM/Marketing,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Rarely,,,,Sometimes,,,,,,,,Most of the time,,,,Rarely,,,,,,Often,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,34,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by government,I don't plan on learning a new tool/technology,Deep learning,Matlab,University/Non-profit research group 
websites,"Blogs,College/University,Conferences,Kaggle,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,Very useful,,Very useful,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,30,70,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Military/Security,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",Image data,Most of the time,100MB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,Most of the time,,,,,,,,,,Most of the time,,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,50,30,10,5,5,NA,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Privacy 
issues",Often,,,,,,,,,,,,Often,,,,Often,,,,,,100% of projects,More internal than external,Standalone Team,,Embedded implementation of modern data analysis computer vision algorithms ,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Never,2000000,PKR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,Anomaly Detection,R,University/Non-profit research group websites,"Arxiv,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,3 to 5 years,Data Scientist,Self-taught,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Rarely,10PB,"Decision Trees,Gradient Boosted Machines,Random Forests,RNNs","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,5,0,5,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,,4,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Not Useful,,,,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,Necessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,"Coursera,DataCamp,edX,Udacity",GPU accelerated Workstation,0 - 1 hour,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,90,0,5,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Survival Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,Very useful,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,"DataTau News Aggregator,Partially Derivative Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Researcher,Statistician",Work,30,30,30,5,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,10PB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Rarely,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Rarely,Rarely,,,Most of the time,,,Often,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,Often,Sometimes,,,,,Often,,Sometimes,,Sometimes,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,Most of the time,,,,30,10,0,40,20,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Sometimes,,,Often,,,,Often,,Most of the time,,,,Often,Often,,,76-99% of projects,More internal than external,Central Insights Team,Mostly we only use internal data,Dirty data and IT not providing full access to all data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,138000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,32,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,27,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites,Other","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Other","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Physics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",15,65,0,15,5,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Other,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by government,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Official documentation,Online courses,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,"Becoming a Data Scientist Podcast,FlowingData Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer",Self-taught,85,5,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees","C/C++,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,,,Rarely,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Sometimes,,Sometimes,,,,,,Often,,,,Often,Sometimes,,,Most of the time,Sometimes,Sometimes,,,,,,,,Often,Most of the time,,,Most of the time,Sometimes,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Natural Language Processing,Text Analytics,Time Series Analysis",Most of the time,,,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,50,5,30,5,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,,Most of the time,,,,76-99% of projects,Entirely internal,IT Department,Twitter;Facebook,Dirty Data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Sometimes,13000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +,,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,,,Somewhat useful,,Somewhat useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,3 to 5 years,Researcher,Other,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Academic,500 to 999 employees,Stayed the same,Don't know,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian 
Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",Sometimes,,Often,,,Often,Most of the time,Often,,,,,,,,Most of the time,,,,,,,Often,,,,,,Sometimes,Often,,,,60,10,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Most of the time,Sometimes,,Often,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,Most of the time,,Most of the time,,100% of projects,Approximately half internal and half external,Other,"It is hard to say, but it depends on the task at hand. I do a lot of political science research, which requires us to integrate Census data, electoral data (at the local, state, national, and international level), etc.",Lack of time,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Always,63000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Hungary,26,Employed part-time,,,No,Yes,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",R,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Stack Overflow Q&A,Textbook",,Very useful,,,,,,,,,,,,Very useful,Very useful,,,,,3-5 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Management information systems,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important,Very Important,Somewhat important,Not important +Male,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that doesn't perform advanced analytics,Tableau,Deep learning,Matlab,University/Non-profit research group websites,"Blogs,College/University,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,,,Very useful,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Work,30,30,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","MATLAB/Octave,Microsoft Azure Machine Learning,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,Often,Most of the time,Often,Often,,,,,Often,,Often,,Often,,Often,Most of the time,,Often,,,,,Often,,Often,,,,40,40,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,Often,,,,Most of the time,,,,Often,,,Sometimes,Often,,10-25% of projects,Entirely internal,Other,All proprietary data,"Not being able to figure out how much data to use / data sufficiency for a given problem, and long time needed for rial and error/experimentation.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Sometimes,100000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Germany,49,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,,,R,,"Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Other","Online courses (coursera, udemy, edx, etc.)",90,0,10,0,0,0,,,I prefer not to answer,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Very Important,Not important,Somewhat important +Male,Norway,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,Python,Cluster Analysis,R,I collect my own data (e.g. 
web-scraping),"Arxiv,College/University,Company internal community,Conferences,Kaggle,Official documentation,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos,Other",Somewhat useful,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,Very useful,,Very useful,Very useful,Very useful,,,,Somewhat useful,"Data Stories Podcast,Linear Digressions Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,Researcher,Work,26,20,26,26,2,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,Rarely,Rarely,Sometimes,Rarely,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,Rarely,Rarely,,,,Most of the time,Often,Sometimes,,,,,Often,,Often,,Sometimes,,,Often,,Sometimes,,,,,Rarely,,Often,,,,40,20,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Sometimes,Often,Most of the time,Sometimes,,Often,Sometimes,,Sometimes,,,Often,Often,,Often,,Rarely,Most of the time,Often,,76-99% of projects,Entirely internal,Other,map data; transport data; weather data; geomapping; gov. statistics; ,toolkit and prep.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,650000,NOK,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Don't know,100GB,"Bayesian Techniques,Decision Trees","MATLAB/Octave,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,,,Often,,,,,,,,,,Rarely,Often,,,,,,Sometimes,,,,"Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Recommender Systems,Text Analytics",,,,,,,,Rarely,,,,,,Often,,,,Often,Often,,,,,Rarely,,,,,Often,,,,,40,10,40,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Maintaining responsible expectations about the potential impact of data science 
projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,Often,Often,,,,,,,,,Often,,,,,,Most of the time,Most of the time,,10-25% of projects,Entirely internal,Standalone Team,,Multiple data sources,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,"105,000",BRL,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Tableau,,,,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,3 to 5 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,,,,Government,500 to 999 employees,,,A general-purpose job board,Somewhat important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,,,,"Java,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,30,10,0,,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,C/C++,Support Vector Machines (SVM),Python,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Programmer,Self-taught,50,30,0,20,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Other,Basic laptop (Macbook),"Image data,Text 
data,Relational data",Sometimes,100MB,Other,"Amazon Web services,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,80,0,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,Sometimes,,Often,,,,,,,,,,Often,,26-50% of projects,More external than internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Bitbucket,,190000,INR,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Sweden,23,"Not employed, but looking for work",,,,,,,,Amazon Web services,Time Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Friends network,Kaggle,Personal Projects",,Somewhat useful,Very useful,,,Somewhat useful,Very useful,,,,,Very useful,,,,,,,No Free Hunch Blog,1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",40+,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,20,0,0,65,15,0,"Computer Vision,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Very 
Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Female,United States,41,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,Other,"Blogs,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Master's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important +Female,United States,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",Very useful,Very useful,,,Somewhat useful,,Not Useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,45,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service,Other","Text data,Relational data,Other",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL",,,,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",Often,,Sometimes,Often,,Often,Most of the time,Most of the time,Most of the time,,,,,,,Often,,Sometimes,,Most of the time,,,Most of the time,Often,,,,,,Sometimes,,,,60,15,3,5,12,5,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available 
data,Unavailability of/difficult access to data",Most of the time,Often,Sometimes,,Most of the time,Most of the time,,Most of the time,Sometimes,Sometimes,,,Sometimes,,Often,Sometimes,,Often,,Most of the time,Most of the time,,51-75% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,200000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,Very useful,"KDnuggets Blog,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,18,15,50,16,1,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Rarely,,Most of the time,,,,Most of the time,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Rarely,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Rarely,,Sometimes,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,Rarely,,Often,,Often,,Most of the time,,,Most of the time,Most of the time,Most of the time,,Most of the time,Sometimes,Often,Often,,Most of the time,Most of the time,Most of the time,,,,50,10,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Scaling data science solution up to 
full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Most of the time,Most of the time,Often,,Often,Often,Often,Often,,Often,Often,Sometimes,,Rarely,Often,,Often,Often,,76-99% of projects,More internal than external,Central Insights Team,real state pricing; national census,Making sense of the system usage by all the operations teams.,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,Google Drive,"Bitbucket,Git",Never,168000,BRL,Other,7,,,,,,,,,,,,,,,,,, +Male,Denmark,25,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,10,35,35,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,20,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Official documentation,YouTube Videos",,,Somewhat useful,,,,Very useful,,,Very useful,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Data Stories Podcast",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,University courses,40,0,0,20,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Brazil,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,Scala,University/Non-profit research group 
websites,"Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring",,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,Very useful,,Very useful,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,0,30,30,30,0,10,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,1TB,"Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Rarely,Often,,,Most of the time,,Sometimes,,Most of the time,,,,,Often,Sometimes,,Sometimes,,,,Sometimes,Rarely,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality 
Reduction,RNNs,Segmentation,SVMs",,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,Often,,Sometimes,,Often,Often,,,,Often,Sometimes,,Sometimes,,,,,,60,5,20,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,,,,,,,,,,Often,,,Often,,,26-50% of projects,Approximately half internal and half external,IT Department,,frequently changes at schemas,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git",Sometimes,98000,BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,65,Retired,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics,Employed by government",Other,I don't plan on learning a new ML/DS method,Matlab,Google Search,"Friends network,Online courses,Personal Projects",,,,,,Somewhat useful,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,,Self-taught,35,0,35,30,0,0,Time Series,"Ensemble Methods,Evolutionary Approaches",A bachelor's degree,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Other,Bayesian Methods,Python,Google Search,"Kaggle,Online courses,Podcasts,Textbook",,,,,,,Somewhat 
useful,,,,Very useful,,Somewhat useful,,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",1-2 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,Other","Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,85,0,0,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Sweden,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"DataTau News Aggregator,KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Stayed the same,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,QlikView,R,SQL",,,,,Most of the time,,,,Most of the time,,,,,Most of the time,Often,,Often,,,,,,,,,,,,,,Most of the time,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Most of the time,,,,,Often,Most of the time,Often,,,,,,Often,,Often,,,,Sometimes,Sometimes,,Often,,,Most of the time,Sometimes,Sometimes,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty 
data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Other",Sometimes,Often,Often,,Sometimes,Rarely,,Sometimes,Often,,,,,Sometimes,,,Rarely,,Most of the time,Sometimes,,Often,76-99% of projects,Approximately half internal and half external,Other,,Missing data entries,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Rarely,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,35,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,R,I collect my own data (e.g. 
web-scraping),"Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,5-10 years,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),11 - 39 hours,,Sort of (Explain more),Master's degree,Other,,"Data Analyst,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important +Male,France,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,70,8,0,20,2,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA 
capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Image data,Sometimes,1TB,"Bayesian Techniques,CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction",,,Sometimes,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Researcher,Statistician,Other",Work,0,10,40,10,40,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Other,Sometimes,1MB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression,Other","Java,Jupyter notebooks,Python,R,Other,Other",,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Most of the time,,"Bayesian Techniques,Data Visualization,Logistic Regression,PCA and 
Dimensionality Reduction,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,Most of the time,,,Sometimes,,,,5,15,20,20,40,0,Enough to refine and innovate on the algorithm,Other,,,,,,,,,,,,,,,,,,,,,,Often,76-99% of projects,Entirely internal,Other,Financial data purchased by company.,We are not storing all internal data. Continuous effort needed to get data stored.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Other",Rarely,"150,000",GBP,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Portugal,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Personal Projects,YouTube Videos",,,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,3-5 years,Nice to have,Necessary,Necessary,,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Other","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,Other,3 to 5 years,Researcher,University courses,20,10,0,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Not important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Poland,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Python,Google Search,"Blogs,College/University,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,Somewhat useful,Not Useful,Not Useful,,,Somewhat useful,Somewhat useful,Siraj Raval YouTube Channel,3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,University courses,70,20,0,10,0,0,"Computer Vision,Time Series,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Somewhat important,Not important,,Somewhat important,Somewhat important,Somewhat important,Somewhat 
important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Computer Science,,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,47,"Not employed, but looking for work",,,,,,,,SQL,Time Series Analysis,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Textbook",,,Somewhat useful,,,,Very useful,,,,,,,,Very useful,,,,,1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,30,30,0,40,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Neural Nets,Python,"Google Search,Government website,University/Non-profit research group websites,Other","College/University,Kaggle,Personal Projects,Podcasts,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,6 to 10 years,"Business Analyst,Computer Scientist,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software 
Engineer",University courses,35,10,0,45,0,10,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Other,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,HMMs,Markov Logic Networks,Regression/Logistic Regression","Amazon Web services,Python,R,TIBCO Spotfire",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Rarely,,,,,"Bayesian Techniques,Evolutionary Approaches,HMMs,Logistic Regression,Natural Language Processing",,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,Often,,,Often,,,,,,,,,,,,,,,40,40,0,0,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Often,,,,,Most of the time,,,,,,,,,Sometimes,,,10-25% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Subversion,Most of the time,208000,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Italy,59,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,"Data Machina Newsletter,FastML Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,60,20,0,0,20,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Operations Research Practitioner",University courses,15,5,0,80,0,0,Natural Language Processing,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Military/Security,500 to 999 employees,Stayed the same,3-5 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Other",Text data,,10MB,"Decision Trees,SVMs","Amazon Web services,Mathematica,R,SAS JMP",,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,PCA and Dimensionality 
Reduction,Random Forests,SVMs,Text Analytics",,,,,,,Most of the time,Often,,,,,,,,,,,Often,Sometimes,Sometimes,,Often,,,,,Often,Most of the time,,,,,10,10,0,40,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Spark / MLlib,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,"Data Elixir Newsletter,Linear Digressions Podcast,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,5,25,20,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the 
time,1GB,"Bayesian Techniques,RNNs","Amazon Machine Learning,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Recommender Systems",Sometimes,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,60,5,20,10,5,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,Often,Often,,,,,,,,Often,,,,,,Often,,,,,76-99% of projects,Entirely internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,"120,000",USD,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Belgium,48,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Blogs,College/University,Friends network,Official documentation,Stack Overflow Q&A,YouTube Videos",,Very useful,Very useful,,,Very useful,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,50,0,30,20,0,0,Time Series,"Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,1GB,"Regression/Logistic Regression,Other","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Often,,,,,Often,,,,,,Most of the time,,,Most of the time,,,,30,20,0,20,0,30,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,Often,,Most of the time,,Often,,76-99% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,90000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,Canada,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,,"Blogs,Conferences,Online courses,Personal Projects",,Very useful,,,Somewhat useful,,,,,,Very useful,Very useful,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,Researcher,Self-taught,90,5,0,5,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Technology,100 to 499 employees,Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Don't know,,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,Often,,,,,,,Sometimes,,Often,,,Sometimes,,,,Rarely,,,,15,6,0,12,12,55,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Other",,,,,Often,,,,Sometimes,,,,,Often,,,,,,,,Most of the time,100% of projects,Do not know,Standalone Team,Nearly all datasets are 3rd party (consulting),,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other","Slack, Dropbox",Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,100000,,Other,7,,,,,,,,,,,,,,,,,, +Male,Canada,56,Employed full-time,,,Yes,,Data Analyst,,,TensorFlow,Deep learning,,,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,FlowingData Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,,"Ensemble Methods,Regression/Logistic Regression","Hadoop/Hive/Pig,MATLAB/Octave,Microsoft Excel Data Mining,R,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Random Forests",,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,35,10,10,25,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Sometimes,Most of the time,,,,,,,,Often,,,,,,,Often,Sometimes,,26-50% of projects,Do not 
know,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Share Drive/SharePoint,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Argentina,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,0,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,1 to 2 years,Data Analyst,University courses,20,20,10,40,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational 
data,Sometimes,10MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,25,20,20,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,Often,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,SVMs,Time Series Analysis",Often,,Often,,,Often,Most of the time,,Often,,,,,Sometimes,,Most of the time,,Often,,,,,,,,,,Most of the time,,Often,,,,30,15,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by a company that doesn't 
perform advanced analytics,Stan,Factor Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Arxiv,Blogs,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book",Very useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,"Linear Digressions Podcast,Partially Derivative Podcast,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other",Self-taught,65,15,10,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Other,I prefer not to answer,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,Stan,Unix shell / awk,Other",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,Sometimes,,,,,Often,Often,,,"Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests,Simulation,Time Series Analysis",,,Sometimes,,,,Often,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,Often,,,Often,,,,45,15,1,10,29,0,"Enough to code it again from scratch, albeit it 
may run slowly","Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,Most of the time,,,,,,Most of the time,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Git,Other",Rarely,113000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,RapidMiner (commercial version),Deep learning,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Data Analyst,Data Miner,Operations Research Practitioner",Self-taught,100,NA,0,0,0,0,Recommendation Engines,Bayesian Techniques,A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,32,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Stack Overflow Q&A,Other",Very useful,Very useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,"FastML Blog,Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,,Laptop or Workstation and private datacenters,Relational data,Most of the time,,,"Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Other,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Other,,,,,,,,65000,,Other,7,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,R,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Personal Projects,Podcasts,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,Somewhat useful,Very useful,,,,,"Jack's Import AI Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data,Other",Never,100MB,Regression/Logistic Regression,"Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Text Analytics",,Sometimes,,,,Often,Often,,Sometimes,Sometimes,,,,,,Sometimes,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,40,25,0,25,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,,,,26-50% of projects,Entirely internal,Central Insights Team,"allen brain data, open scientific research data sets",,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,50000,CAD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,Japan,33,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,26,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Python,Link Analysis,R,"GitHub,Government website,University/Non-profit research group websites","Blogs,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Very useful,,,,Very useful,Somewhat useful,,,,,,,Very useful,,,Very useful,Somewhat useful,"KDnuggets Blog,R 
Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Operations Research Practitioner,Researcher,Statistician",Self-taught,80,10,10,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Oracle Data Mining/ Oracle R Enterprise,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Association Rules,Collaborative Filtering,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Sometimes,,,Sometimes,,Most of the time,Often,,,,,,,,Most of the time,,,,,Most of the time,,Often,,,Most of the time,,,,Most of the time,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Sometimes,,,Sometimes,Often,,,,Most of the time,,,Most of the time,Most of the time,,Often,,,100% of projects,Entirely internal,Standalone Team,,limited insights,Row-oriented 
relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,60000,PHP,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Mexico,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,,Somewhat useful,,,,,,Very useful,Very useful,Somewhat useful,,,Very useful,"Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Researcher,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression","Amazon Machine Learning,Python,SQL",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Often,,Most of the time,,,,,,,Sometimes,,,,30,30,5,10,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others",Often,,,,,Most of the time,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,None,Permissions,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Never,"240,000",MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Russia,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,Somewhat useful,,,,,,,,< 1 year,Necessary,Nice to have,Nice to have,,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,,"Evolutionary Approaches,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not 
important,Not important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Brazil,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,"Siraj Raval YouTube Channel,The Data Skeptic Podcast",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Engineering (non-computer focused),,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important +Female,United States,50,Employed full-time,,,Yes,,Business 
Analyst,Poorly,Employed by non-profit or NGO,Microsoft R Server (Formerly Revolution Analytics),MARS,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Trade book,Tutoring/mentoring,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,,,,Very useful,Very useful,Somewhat useful,"R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Business Analyst,Other",Work,15,10,0,75,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Non-profit,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SAS JMP,Tableau,Other,Other",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,,,,Often,,,,,Often,,,,Often,Rarely,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic 
Networks,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,,Often,Often,Most of the time,Most of the time,,,,,,Often,,Most of the time,Often,Often,,Often,,Often,Often,Often,,,,Often,Often,Often,,,,60,25,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Often,,,Often,,Often,,Sometimes,Most of the time,,,,,,Most of the time,Most of the time,,26-50% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Always,"150,000",USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Tableau,Neural Nets,Python,Google Search,"Blogs,Friends network,Online courses",,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,,,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,,"Business Analyst,Other",Work,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,Taiwan,38,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,Very useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,35,20,30,10,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Perl,Python,Tableau,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive 
Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,Sometimes,Often,Often,,Most of the time,Often,Often,Most of the time,,,Most of the time,,Most of the time,Often,Most of the time,,Most of the time,Sometimes,Often,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,,,,,40,10,25,5,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,Most of the time,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Share Drive/SharePoint",,Git,Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Deep learning,Java,University/Non-profit research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,PhD,No,Bachelor's degree,Computer Science,,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Reinforcement learning,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Somewhat important,Somewhat 
important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important +Female,United States,20,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Textbook",,Very useful,,,,,Very useful,,,Very useful,Very useful,,,,Very useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,Master's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Support Vector Machines (SVMs),Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very 
Important,Very Important,Somewhat important,Somewhat important +Male,People 's Republic of China,24,Employed full-time,,,No,Yes,Machine Learning Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,,,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Other,I haven't started working yet",University courses,10,5,15,30,10,30,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +A different identity,Japan,68,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,GitHub,"Arxiv,Blogs,Company internal 
community,Conferences,Kaggle,Stack Overflow Q&A",Very useful,Somewhat useful,,Very useful,Somewhat useful,,Very useful,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",Self-taught,50,0,30,0,20,0,"Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - RNNs",A bachelor's degree,Financial,20 to 99 employees,Increased significantly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Traditional Workstation","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs","Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,Rarely,,Most of the time,,,,,,,,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,Rarely,Sometimes,Most of the time,,Often,Most of the time,,,Often,,Sometimes,Most of the time,Sometimes,,,Rarely,Rarely,Sometimes,,Sometimes,Rarely,Sometimes,Sometimes,Sometimes,,Sometimes,Sometimes,,,,60,25,0,0,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than 
external,Central Insights Team,Almost none;,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,20000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by government,Java,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,College/University,Kaggle,Personal Projects,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Very useful,,,,,Very useful,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,50,5,15,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,,,,,,,,,,,,,,, +Male,United States,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Python,Monte Carlo Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects,Textbook",,,Somewhat useful,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A bachelor's degree,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,1MB,"Bayesian Techniques,Regression/Logistic Regression","Java,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Naive Bayes,Text Analytics",,Sometimes,Sometimes,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,60,10,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Often,,,Often,,,,,,,,,,Often,Often,,,10-25% of 
projects,Do not know,IT Department,,Cleaning it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Subversion,Rarely,60000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,64,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,GitHub,"Arxiv,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Other",Very useful,,,Very useful,Very useful,,Somewhat useful,,,,Very useful,Very useful,,Somewhat useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,I don't write code to 
analyze data,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,10,15,25,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Other,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Image data,Other",,10TB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,Python,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Sometimes,Most of the time,,,"CNNs,Data Visualization,Neural Networks,RNNs,Segmentation,Time Series Analysis",,,,Most of the time,,,Often,,,,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,Sometimes,,,,85,10,0,5,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",Often,,,,Sometimes,Often,,,,,,,,Often,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,191000,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Brazil,46,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,,Self-taught,30,70,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,Python,Genetic & Evolutionary Algorithms,Python,"GitHub,Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,80,0,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,SQL,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Random Forests,Recommender Systems,SVMs",Often,Most of the time,,,,Most of the time,Most of the time,,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,Often,,,,Often,,,,,,60,20,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Limitations in the state of the art in machine learning,Organization is small and cannot afford a 
data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,10-25% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Subversion,Most of the time,,,,5,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,5,50,0,15,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,20 to 99 employees,Stayed the same,Less than one year,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,40,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,R,Deep learning,R,"GitHub,University/Non-profit research group websites","Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,,,,,Very useful,,,Very useful,,"Data Stories Podcast,Emergent/Future Newsletter (Algorithmia),O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,3 to 5 years,"Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Logistic Regression,Markov Logic Networks",A bachelor's 
degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,Brazil,30,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,Very useful,,,,Very useful,,,,,Very useful,,,Somewhat useful,,,Somewhat useful,"Data Stories Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,0,40,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Julia,Jupyter notebooks,MATLAB/Octave,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests",,,,,,,Most of 
the time,Often,,,,,,Often,,Most of the time,,,,,,,Often,,,,,,,,,,,50,10,10,30,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Sometimes,,,,,Often,Most of the time,Often,,,76-99% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,52000,BRL,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Amazon Machine Learning,Deep learning,Python,Google Search,"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,Researcher,University courses,30,30,30,0,10,NA,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose 
job board,Important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Rarely,100GB,"CNNs,Neural Networks,RNNs,SVMs","Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Natural Language Processing,Neural Networks,RNNs,Text Analytics",,,,Often,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,,,,Often,,,,Most of the time,Most of the time,,,,,,Often,,,,,,,10-25% of projects,More external than internal,Standalone Team,opensubs; cornell-movies; imagenet; ,clean wrong data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,63,Employed full-time,,,Yes,,Other,Poorly,Self-employed,Google Cloud Compute,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Personal Projects,Podcasts,YouTube Videos,Other",Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,,,,,Very useful,"DataTau News Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Other",Self-taught,50,10,20,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,Fewer than 10 employees,Increased significantly,1-2 years,Some other way,Somewhat important,Other,GPU accelerated Workstation,Other,,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,Rarely,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Time Series Analysis",,,,Most of the time,,Most of the time,Often,,Often,,Sometimes,,,Sometimes,,Sometimes,,,Sometimes,Most of the time,Often,,,,Often,Sometimes,,,,Often,,,,20,20,50,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,Most of the time,Sometimes,,Often,,,,,,,,,Sometimes,,10-25% of projects,More external than internal,Other,VCTK;LibriSpeech,"We're blessed with some really good, easily available data sets. 
However there are more out there, that are kind of industry standards, that are hard for small companies to access.",Other,Company Developed Platform,,"Git,Other",Always,,,Other,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Other,NA,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Other",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Other (Separate different answers with semicolon)",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Workstation + Cloud service",40+,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,,"Data Analyst,Data Miner",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,United 
States,21,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Researcher",Self-taught,30,30,5,25,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Never,100MB,CNNs,"Jupyter notebooks,Minitab,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,Rarely,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,Most of the time,,Most of the time,Often,,,,,,,,,,,Sometimes,,Most of the time,Sometimes,,,,,Sometimes,,Sometimes,,,,,,50,20,20,5,5,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",SAP BusinessObjects Predictive Analytics,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,0,0,0,100,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10TB,"Decision Trees,Gradient Boosted Machines,HMMs,Random Forests,RNNs,SVMs","IBM Watson / Waton Analytics,Python",,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,HMMs,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,,,,20,20,20,20,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Explaining data science to others,Lack of data science talent in the organization,Privacy issues",,,,,,Often,,,Most of the time,,,,,,,,Most of the 
time,,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Subversion",Rarely,3300000,INR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, but looking for work",,,,,,,,Python,Social Network Analysis,Python,Google Search,"Arxiv,College/University,Personal Projects,YouTube Videos",Somewhat useful,,Somewhat useful,,,,,,,,,Somewhat useful,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Management information systems,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,40,30,0,30,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,India,39,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Friends network,Kaggle,Online courses,Textbook,YouTube Videos",,,,,Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,20,0,30,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Rarely,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Modeler,IBM SPSS Statistics,Python,Spark / MLlib,Other",,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,Most of the time,Often,,,Often,,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,,,Most of the time,Often,,Most of the time,,,,,,,,,,,30,35,5,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",Most of the time,Most of the 
time,,,,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,,Most of the time,,Less than 10% of projects,More internal than external,Standalone Team,"nvd, cve, kdd",,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,2500000,INR,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Australia,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,Engineer,Self-taught,60,10,0,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Self-taught,70,10,0,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Telecommunications,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Flume,Impala,Java,Python,Spark / MLlib,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,Often,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,Sometimes,,,Most of the time,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression",,,,,,,Often,Often,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,39,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Text Mining,Java,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,Other,Self-taught,70,0,0,25,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Logistic Regression,Text Analytics",,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,Often,,,,,30,30,0,10,30,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely external,Central Insights 
Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,130000,AUD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Text Mining,Java,Google Search,"Kaggle,Personal Projects",,,,,,,Very useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,40,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",,Retail,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,SVMs,Perl,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,40,10,40,10,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,None,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j)",Share Drive/SharePoint,,Subversion,Never,700000,,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,37,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Decision Trees,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,Not Useful,Somewhat useful,Not Useful,,,,,< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Basic laptop (Macbook),GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,PhD,Yes,Master's degree,Psychology,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Vietnam,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Python,Social Network Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher",Self-taught,40,30,30,0,0,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Manufacturing,"10,000 or more employees",Stayed the same,6-10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Neural Networks,RNNs","Amazon Web services,C/C++,MATLAB/Octave,Python,R,SQL",,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"CNNs,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,30,30,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Email",,Git,Sometimes,50000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Other",University courses,40,0,0,60,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A doctoral degree,Technology,"10,000 or more employees",,,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,,,"C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,Computer Vision,Logistic Regression,High school,Technology,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand 
data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,1GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Azure Machine Learning,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,10,20,0,20,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Conferences,,,,,Very useful,,,,,,,,,,,,,,Data Machina Newsletter,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,Survival Analysis,Neural Networks - GANs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Very Important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,24,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,10,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Other,46,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,80,10,5,0,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,100GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Python,R,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,"Gradient Boosted Machines,Logistic Regression,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,,,,,,,,Often,,,,Sometimes,,,,Sometimes,,,Often,,,,,Often,,Most of the time,,,,50,15,5,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,100% of projects,Entirely internal,IT Department,none,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,45000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Textbook,Tutoring/mentoring",,,,,Very useful,,,,,,,,,,Very useful,,Somewhat useful,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,,University courses,0,20,80,0,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Academic,I don't know,Stayed the same,Don't know,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Text data,Don't know,100GB,"CNNs,SVMs","Perl,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,Sometimes,,Often,,,,,,,,Sometimes,,,,Often,Most of the time,,Sometimes,,,,,,,Often,Often,,,,,20,50,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues",,,,,,,,,Often,Often,,,,,,Sometimes,Most of the time,,,,,,10-25% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Bitbucket,Rarely,8000000,JPY,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by company that makes advanced analytic software,,,,,"Arxiv,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Other,"1,000 to 4,999 employees",,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",,,,"C/C++,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,,Often,Often,,,Often,,Often,,,,Often,,Often,Often,,Often,,,,,Often,Often,Often,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,India,29,"Not employed, but looking for work",,,,,,,,Cloudera,Social Network Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Tutoring/mentoring",,Somewhat useful,,Somewhat useful,,Very useful,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,Very useful,,"Emergent/Future Newsletter (Algorithmia),FlowingData Blog,Linear Digressions Podcast",3-5 years,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp","Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Other",Work,30,10,40,0,20,0,"Computer Vision,Machine Translation,Recommendation Engines,Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,,,,,,,,,,,,,,,, +Female,Turkey,24,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,RapidMiner (commercial version),Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Software Developer/Software Engineer",University courses,40,10,35,15,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Java,Julia,KNIME (free version),Python,R,SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,Often,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,Most of the time,,Most of the time,,,,Often,,,,Most of the time,Most of the time,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,SVMs,Time Series Analysis",,Often,Often,,,Most of the time,Most of the time,Most of the time,,Often,,Most of the 
time,,Often,,Often,,Often,,Often,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,35,30,20,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,Often,,,,,,Sometimes,,,Sometimes,,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,8000,TRY,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Iran,28,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,,1-2 years,,Necessary,Nice to have,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer",University courses,20,30,40,10,0,0,"Computer Vision,Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,No,Yes,Researcher,Fine,Employed by government,R,Cluster Analysis,,,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,,No,Doctoral degree,Mathematics or statistics,Less than a year,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United Kingdom,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,"Ensemble Methods (e.g. boosting, bagging)",Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Personal Projects",Very useful,,,,,,,,,,,Very useful,,,,,,,,< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Necessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important +Female,Greece,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,IBM Watson / Waton Analytics,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,,Very useful,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,Machine Learning Engineer,Self-taught,40,0,60,0,0,0,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data",Most of the time,100GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Segmentation",,,,Most of the time,,Most of the time,Most of the time,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,40,30,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,9,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,University/Non-profit research group websites,"College/University,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,Very useful,Very useful,,Very useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 
years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,40,20,30,10,0,0,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Image data,Text data",Most of the time,1GB,"Decision Trees,Ensemble Methods,HMMs,Regression/Logistic Regression","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,Often,Most of the time,Most of the time,Most of the time,,,,Sometimes,Often,,Sometimes,,,,,,,Often,,,Often,,,Most of the time,,,,,20,30,20,20,10,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,Often,,,,,Most of the time,,Most of the time,,,Most of the time,,,,,Most of the time,,51-75% of projects,Entirely internal,Other,None,"Data not enough, and needs extensive cleaning","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Sometimes,70000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,30,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft SQL Server Data Mining,Time Series Analysis,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Friends network,Kaggle,Newsletters,YouTube Videos",,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,,,,,,,,,,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,,2 - 10 hours,,No,Master's degree,,I don't write code to analyze data,"Business Analyst,Data Analyst",Self-taught,40,0,60,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation",Evolutionary Approaches,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Other,47,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,KNIME (free version),Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"College/University,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,Very useful,Very useful,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher",University courses,20,20,10,50,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,20 to 99 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Collaborative Filtering,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,Recommender Systems,SVMs",,,Sometimes,,Often,,,Often,,,,,,Sometimes,,Sometimes,Rarely,Sometimes,,Sometimes,,,,Often,,,,Often,,,,,,40,20,10,20,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources",Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,10-25% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,,Other,Rarely,1500000,IQD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,France,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Deep learning,Python,Google Search,"College/University,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,,,,,,,,,Somewhat useful,,,Very useful,,"FlowingData Blog,Talking Machines Podcast",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,Mathematics or statistics,,"Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Spain,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Newsletters,Online courses",Somewhat useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,Very useful,,,,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,3 to 5 years,"Data Analyst,Machine Learning Engineer,Programmer,Researcher",Other,0,40,20,35,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Image data,Most of the time,10GB,"Neural Networks,SVMs","C/C++,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python",,,,Most of the time,,,,Rarely,,,,,,,,,Sometimes,,,,Most of the time,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,,,,Often,,,,,,,,,,,,,,Most of the time,Often,,,,,Most of the time,,Sometimes,,,,,,30,20,40,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More internal than external,Other,,Interclass variability,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,South Africa,33,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by government,MATLAB/Octave,Deep learning,Matlab,Google Search,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,3-5 years,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,More than 10 years,Programmer,University courses,0,0,50,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Not important,Very Important,Very Important,Very Important,Very Important +Female,Republic of China,45,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,,University courses,40,0,0,60,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,,,"Amazon Web services,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,NoSQL",,Rarely,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,49,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,Microsoft R Server (Formerly Revolution Analytics),Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","College/University,Conferences,Kaggle,Non-Kaggle online communities,Personal Projects,YouTube Videos",,,Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,,,,,,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator",3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,PhD,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,DBA/Database Engineer,Researcher,Software Developer/Software Engineer",Self-taught,70,0,0,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,28,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Amazon Machine Learning,Text Mining,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,Very useful,,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,University courses,30,20,0,30,20,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,IBM Cognos,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,RapidMiner (free version),SQL,Tableau,TensorFlow",Sometimes,Often,,,Often,,,Often,Often,Sometimes,,,,,Most of the time,,Most of the time,,,,Sometimes,Often,Sometimes,,Sometimes,,Often,,,,Most of the 
time,,Often,,Sometimes,,,,,,,Often,,,Often,Most of the time,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Often,,Often,Often,Sometimes,,Often,Sometimes,,,,,Often,Often,,Sometimes,Often,Most of the time,Most of the time,Most of the time,,,Often,Often,Most of the time,,,Most of the time,Often,Most of the time,,,,30,30,10,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,Sometimes,,Often,Sometimes,,Sometimes,Sometimes,,,,,,,,Often,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,I don't typically share data,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,120000,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer,Statistician",Work,30,0,30,0,40,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Most of the time,10MB,"Bayesian Techniques,Gradient Boosted Machines,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Java,Mathematica,Microsoft Excel Data Mining,Python,SQL,Unix shell / awk,Other",,Rarely,,,,,,,Rarely,,,,,,Most of the time,,,,,Rarely,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,Sometimes,Most of the time,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Naive Bayes,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Other",,,Often,,Rarely,Most of the time,Often,Sometimes,,,,Sometimes,,,,,,Sometimes,,,,Often,Sometimes,,,Often,Often,,,,Most of the time,,,35,25,5,10,25,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations of tools,Scaling data science solution up to full database",,,,,Often,,,,,,,,Often,,,,,Often,,,,,100% of projects,Entirely internal,Central Insights Team,kaggle,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Subversion,Sometimes,"44,000",EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,C/C++,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Personal Projects,Textbook",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,"No Free Hunch Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler,Researcher,Statistician",University courses,60,0,20,20,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Time Series Analysis",,,Sometimes,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,10,40,30,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Sometimes,Most of the time,,,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,Most of the time,,51-75% of projects,More internal than external,Standalone Team,Uganda beural of statistics ,Incomplete data,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,12000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,GitHub,"Kaggle,Newsletters,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Somewhat useful,Very useful,,,,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Data Scientist,Engineer",Self-taught,20,55,20,0,5,0,"Natural Language Processing,Speech Recognition,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Python,R,SQL,TensorFlow",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Rarely,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Text Analytics,Time Series Analysis",,,Sometimes,,,,Often,Often,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,Most of the time,Often,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,Often,,,Often,,,,,,,,,,Most of the time,,Most of the time,,26-50% of projects,Entirely internal,Standalone Team,github;,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Never,600000,INR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,36,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,1-2 years,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important +Male,Germany,41,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Stack Overflow Q&A",,Very useful,,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,20 to 99 employees,Decreased slightly,Less than one year,A general-purpose job board,"N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,,,,"Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,Rarely,Rarely,,,,,,,,Often,,Rarely,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Rarely,,,Sometimes,,,,"A/B Testing,Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Often,,,,40,0,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the 
time,,,Sometimes,,,,Often,Often,Sometimes,,,Often,,Often,Often,,,Most of the time,,,26-50% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,,,I do not want to share information about my salary/compensation,2,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,60,30,10,0,0,0,,,High school,Technology,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,SAS Base,Tableau,Unix shell / awk",,Rarely,,,Often,,,,Often,,,,,Often,,,,,,,,,Often,,Sometimes,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,Often,,,,"Prescriptive Modeling,Text Analytics",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,34,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,Python,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Somewhat useful,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Researcher,Other",Self-taught,50,35,5,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,,100MB,"Ensemble Methods,Gradient Boosted Machines,Random Forests","Amazon Web services,C/C++,R,SAS Base",,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,PCA and Dimensionality Reduction,Simulation",,,,,,Often,Most of the time,Sometimes,Often,,,,,Often,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,25,30,0,30,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Often,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Other,public health data; epidemiologic data,clear definition of research questions and data cleaning,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Other",Dropbox,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,19500,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,South Africa,35,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Google Cloud Compute,Time Series Analysis,Julia,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,,,"Data Elixir Newsletter,Data Stories Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Operations Research Practitioner,Researcher",Self-taught,70,10,10,5,0,5,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,100TB,"Bayesian Techniques,CNNs,Decision Trees","C/C++,Python,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,"CNNs,Data Visualization,Decision Trees,Naive Bayes",,,,Often,,,Often,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Maintaining responsible expectations about the 
potential impact of data science projects",,,,,Often,Sometimes,,,,,,,,Rarely,,,,,,,,,10-25% of projects,Entirely internal,Central Insights Team,no,understand and clean them,Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Company Developed Platform,,Git,Most of the time,1000000,TWD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Other,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Textbook",Very useful,,,,,,Very useful,,,,,,,,Very useful,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,30,0,70,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,100MB,"Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Random Forests,Other","MATLAB/Octave,R,SAS Base,SQL,Stan,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,Rarely,,,,Most of the time,Rarely,,,,,Often,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Simulation,Other,Other,Other",,,Sometimes,,,Most of the time,Most of the time,Sometimes,Often,Most of the time,,Most of the time,,Sometimes,,,,,Sometimes,,Most of the time,,Most of the time,,,,Often,,,,Sometimes,Most of the time,Most of the time,50,10,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,,Often,,,,,,Often,Sometimes,,,,,,,,Sometimes,Often,,100% of projects,Entirely internal,Other,UCI ML Repository,Transformation and cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",Subversion,Subversion,Sometimes,70000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Germany,33,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,Time Series Analysis,C/C++/C#,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,Very useful,,Somewhat useful,,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,,,,Coursera,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",2 - 10 hours,Master's degree,No,Master's degree,Electrical Engineering,,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Very Important +Female,Switzerland,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,NoSQL,Time 
Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences",,Somewhat useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Researcher",Self-taught,20,0,50,25,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A professional degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Ensemble Methods,Random Forests","R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Cross-Validation,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics",Sometimes,,,,,Often,,,Sometimes,,,Rarely,,,,,,,Rarely,,Sometimes,Sometimes,Most of the time,,,,,,Rarely,,,,,50,10,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database",Often,,Often,,Most of the time,,,,,,Sometimes,,,,Sometimes,,Sometimes,Often,,,,,51-75% of projects,More internal than external,Central Insights Team,,incomplete data of questionable quality and lack of documentation,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Subversion",Rarely,140000,CHF,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Korea,NA,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Data Analyst,Self-taught,90,10,0,0,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,IBM SPSS Modeler,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"College/University,Textbook",,,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,30,20,10,20,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,100MB,Neural Networks,"Mathematica,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,Sometimes,,,,10,50,20,20,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT",,,,,Often,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,More internal than external,Other,UCL Datasets,dirty data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,"40,000",CNY,I am not currently employed,8,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Tableau,"Ensemble Methods (e.g. boosting, bagging)",Java,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Friends network,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Operations Research Practitioner,Statistician",Self-taught,65,0,35,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Government,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Jupyter notebooks,Microsoft Excel Data Mining,Minitab,Python,R,RapidMiner (free version),SAS Base",,,,,,,,,,,,Sometimes,,,,,Often,,,,,,,,,Sometimes,,,,,Often,,Most of the time,,Sometimes,,,Sometimes,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Time Series Analysis",,Sometimes,,,,Often,,Often,Sometimes,,,,,Sometimes,,Often,,Often,,Often,Often,,Often,,,Sometimes,,Sometimes,,Often,,,,20,60,10,5,5,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. 
GraphBase/Neo4j)",Email,,,,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Kenya,27,"Not employed, but looking for work",,,,,,,,Oracle Data Mining/ Oracle R Enterprise,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Necessary,Necessary,Nice to have,,Necessary,Necessary,,,,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Statistician",Self-taught,30,20,20,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,,Very Important,Very Important,,Very Important,Very Important,,,,,Very Important,Very Important,Very Important,,Very Important +Male,France,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,45,0,0,5,0,"Supervised 
Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,MATLAB/Octave,Microsoft Azure Machine Learning,R,Other",,Rarely,,,,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Simulation,SVMs,Text Analytics",Most of the time,Often,,,,Often,Most of the time,Often,Sometimes,,,,,Sometimes,,Most of the time,,Often,Sometimes,,,,Often,,,Often,Sometimes,Sometimes,Often,,,,,50,30,3,14,3,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,Sometimes,,,,,Most of the time,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Google Search,"Blogs,Newsletters,Online courses,YouTube Videos",,Somewhat useful,,,,,,Not Useful,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 
year,,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer",University courses,20,35,15,30,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Russia,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,,,,Very useful,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,,,,"Decision Trees,Gradient Boosted Machines,Random Forests","IBM SPSS Modeler,Microsoft Azure Machine Learning,Python",,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Sometimes,Often,Often,,,Often,,Often,,Often,,,,,,,Often,,,Often,,,,,,,,30,70,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Privacy issues,Unavailability of/difficult access to data",,,,,,Often,,,,,,,,,,,Often,,,,Often,,76-99% of projects,Do not know,Standalone Team,Kaggle datasets,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Poland,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Blogs,Conferences,Friends network,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,Data Analyst,Work,35,5,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Financial,500 to 999 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Sometimes,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM Cognos,IBM SPSS Modeler,Jupyter notebooks,Python,R,SQL,Tableau",,,,,,,,,,Rarely,Often,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,Often,Most of the time,Sometimes,,,,Often,,Often,Sometimes,Often,,,,,Sometimes,,Sometimes,,,,Often,,,,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty 
data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,,,,,,,,Sometimes,Sometimes,,,,Often,Sometimes,,,,76-99% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,85000,PLN,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,,,,Somewhat useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,Less than a year,"Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important +Male,Italy,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,I don't plan on learning a new ML/DS method,R,Google Search,"Blogs,College/University,Friends network,Newsletters,Non-Kaggle online communities,Online courses,Textbook",,Not Useful,Somewhat useful,,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,10,10,0,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100MB,SVMs,"Hadoop/Hive/Pig,Jupyter notebooks,R,Other",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"Cross-Validation,Data Visualization,Decision Trees,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Often,,,,60,15,0,20,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The 
lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,Often,Sometimes,,Sometimes,Most of the time,,Often,Most of the time,,,,,Most of the time,,,,,,Most of the time,Sometimes,,100% of projects,More external than internal,Other,Weather data,Having enough data to analyze,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",USB drives,"Git,Subversion",Rarely,16000,EUR,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Female,Finland,28,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,I don't plan on learning a new tool/technology,Genetic & Evolutionary Algorithms,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,6 to 10 years,Researcher,University courses,30,0,20,50,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Text data,Other",,10MB,"Bayesian Techniques,Neural Networks","MATLAB/Octave,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,"Bayesian Techniques,Neural Networks,Simulation",,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,,,,,,,40,20,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining 
data science to others,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Often,,Often,,,,,,,,,,,,Most of the time,,,100% of projects,Entirely internal,Other,,"Getting time to work with it. Nobody in my group does machine learning and thus they tend to appreciate more traditional methods with less risk (and lower reward). As a result, the starting of my ML project has been shifted ""one month later"" again and again for already a year...","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,30000,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities",,,,,,,Very useful,,Very useful,,,,,,,,,,"Data Machina Newsletter,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,Self-taught,40,30,10,NA,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Evolutionary Approaches","C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,Sometimes,Often,,Rarely,,,,,,,,Rarely,Often,,,,,,Most of the time,,,,"Cross-Validation,kNN and Other Clustering,Simulation,Text Analytics",,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,Rarely,,,,,1,1,0,2,1,95,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Organization is small and cannot afford a data science team,Other",,,,,,,,,,,Sometimes,,,,,Often,,,,,,Most of the time,Less than 10% of projects,Entirely internal,IT Department,,Having real project in the field.,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Male,Italy,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping)","Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Very useful,,Somewhat useful,,,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Data Scientist,Self-taught,25,0,25,0,50,0,Other (please specify; separate by semi-colon),Decision Trees - Gradient Boosted Machines,A master's degree,Mix of fields,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Always,10GB,Gradient Boosted Machines,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,,Sometimes,,,Most of the time,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,Most of the time,,Often,,Often,Often,,Often,,,Often,Often,,,,30,15,15,10,30,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,26-50% of projects,More internal than external,Other,"Open-data, Kaggle-dataset",find useful data online,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git,Subversion",Sometimes,150000,EUR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Neural Nets,Scala,Google Search,Arxiv,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,"Natural Language Processing,Recommendation Engines",Logistic Regression,A master's degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Relational data,Other",Most of the time,100GB,"Regression/Logistic Regression,Other","Hadoop/Hive/Pig,Java,Jupyter notebooks,Spark / MLlib,Tableau",,,,,,,,,Often,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Recommender Systems,Segmentation,Text Analytics",Often,,,,Often,Sometimes,Sometimes,,,,,,,Sometimes,,Often,,,Sometimes,,,,,Often,,Sometimes,,,Sometimes,,,,,40,10,40,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,Size,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Israel,34,"Not employed, but looking for work",,,,,,,,Other,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,3-5 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,Yes,Doctoral degree,Physics,3 to 5 years,"Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",40,10,10,0,40,0,,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important +Male,Egypt,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,10,0,10,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data",Sometimes,10GB,"SVMs,Other","Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Spark / MLlib,SQL,Tableau",,,,,,,,,Often,,,,,Sometimes,Most of the time,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,Often,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Ensemble Methods,Naive Bayes,Natural Language Processing,Random Forests",,,Sometimes,,,Sometimes,,,Sometimes,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,,20,30,20,20,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Sometimes,,,,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,70,5,10,10,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Text data,Relational data,Other",Rarely,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,58,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that 
doesn't perform advanced analytics,Microsoft R Server (Formerly Revolution Analytics),Survival Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Researcher",Self-taught,40,20,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Other,100 to 499 employees,Stayed the same,3-5 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",Sometimes,Sometimes,,,,Sometimes,Often,Often,Sometimes,,,,,,,Often,,,,,,Often,,,,Often,Often,,,Often,,,,40,15,15,10,20,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,Often,,,Often,,,,,,,,,Often,,,26-50% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Data Scientist,,Employed by government,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Official documentation,Online courses,Personal Projects",,Very useful,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,,,,,,"Data Elixir Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,30,10,10,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs",High school,Government,"1,000 to 4,999 employees",Increased significantly,1-2 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",,,,"Amazon Web services,Jupyter notebooks,Python",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Natural Language Processing,Text Analytics",,,,,,Sometimes,Often,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,,60,0,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,,,,Most of the time,,,,,,,,,Sometimes,Often,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Other,Udata,Git,Sometimes,65000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep learning,R,Google Search,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Statistician",Work,20,35,45,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,IBM Watson / Waton Analytics,KNIME (free version),Python,R,Spark / MLlib,SQL",,Often,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,Sometimes,Sometimes,Often,,,Most of the time,Often,,,,,,Often,Sometimes,,,Sometimes,Sometimes,Often,,,Most of the time,Often,Often,Sometimes,,Most of the time,Often,,,,,20,50,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of funds to buy useful datasets 
from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,Often,,,,Sometimes,Often,,,Often,Often,Sometimes,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,"FDA, ImageNet",Dirty Data; Privacy,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,I don't typically share data",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,210000,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Portugal,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,30,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation",Very useful,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",5-10 years,Nice to have,Necessary,,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Mathematics or statistics,,"Data Analyst,Data Scientist",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Female,Spain,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,R,I collect my own data (e.g. 
web-scraping),"College/University,Company internal community,Online courses,Stack Overflow Q&A",,,Somewhat useful,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist",University courses,20,30,30,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,10 to 19 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,NoSQL,Python,R,SQL,Tableau,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Often,Sometimes,,,,,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,Sometimes,,Often,,,Often,,Sometimes,Sometimes,Often,,,,50,10,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,,Most of the time,,,,,,,,,Often,,,,Most of the time,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,30500,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Poland,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Manufacturing,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,<1MB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Microsoft R Server (Formerly Revolution Analytics),R,SQL",Sometimes,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling",,,,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,Often,Often,,,,,,,,,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,,Sometimes,,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,South Africa,30,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Python,University/Non-profit research group websites,"Blogs,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Trade book,Tutoring/mentoring",,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,,,,Somewhat useful,Not Useful,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,,Necessary,Necessary,Necessary,Necessary,,Nice to have,Necessary,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Physics,I don't write code to analyze data,Researcher,Self-taught,70,0,0,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important +Male,Czech Republic,34,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by a company that performs advanced analytics,Employed by 
government",Python,Deep learning,Python,Google Search,"Online courses,Textbook",,,,,,,,,,,Somewhat useful,,,,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,30,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",A master's degree,Government,20 to 99 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Relational data",Sometimes,10GB,"Decision Trees,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests","C/C++,Jupyter notebooks,Python,R,SQL,Other",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,Most of the time,Most of the time,Sometimes,,,,Sometimes,Sometimes,Often,,,,,,Sometimes,Often,,Often,,,Often,Most of the time,,,Often,,,,70,10,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,IT Department,"sentinel, landsat",size,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",mix,Git,Sometimes,25000,CZK,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,,,Not Useful,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,,Somewhat useful,,,,,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,20,10,50,10,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Financial,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,Rarely,,,,,,Sometimes,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Rarely,,,,,Most of the time,Often,Most of the time,Most of the time,,,Most of the time,,Sometimes,Rarely,Sometimes,,Sometimes,Most of the time,,Often,Sometimes,Most of the time,,,Sometimes,Rarely,Often,Most of the time,Sometimes,,,,60,10,10,5,15,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,Sometimes,,,,,Sometimes,Often,,,,Most of the time,,Less than 10% of projects,More 
internal than external,Other,,cleansing;labeling,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Bitbucket,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Ukraine,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Cloudera,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,Very useful,,,,,,Somewhat useful,"Data Elixir Newsletter,KDnuggets Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,60,20,10,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,I prefer not to answer,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"CNNs,Random Forests","Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,TensorFlow",,Often,,Rarely,,,,,Rarely,,,,,,Rarely,,Sometimes,,,,,Sometimes,,Rarely,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"Association 
Rules,Decision Trees,Natural Language Processing,PCA and Dimensionality Reduction,SVMs",,Sometimes,,,,,,Often,,,,,,,,,,,Most of the time,,Often,,,,,,,Sometimes,,,,,,70,10,0,0,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,Most of the time,,,,,Often,,,Sometimes,,Less than 10% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Git,Most of the time,40000,USD,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Ukraine,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,Very useful,,,,Somewhat useful,,,,,,Somewhat useful,Very useful,"O'Reilly Data Newsletter,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Data Scientist,Software Developer/Software Engineer",Work,50,20,10,15,5,0,Computer Vision,"Ensemble Methods,Neural Networks - CNNs",A master's degree,Pharmaceutical,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Image data,Sometimes,100GB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,Python",,Often,,Often,,,,Often,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs",,,,Often,,Often,Often,,Often,,,,,Often,,Often,,,,Rarely,Sometimes,,Sometimes,,,Often,Often,Rarely,,,,,,15,40,20,25,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,Often,,,,Sometimes,,,,,,,Sometimes,,,,,,,100% of projects,Entirely internal,Standalone Team,MS Coco;ImageNet,Cleaning up,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,40000,USD,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Spain,51,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Mathematica,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"FlowingData Blog,Linear Digressions Podcast,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Other,No,Doctoral degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Reinforcement learning","Bayesian Techniques,Hidden Markov Models HMMs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important 
+Male,India,28,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. boosting, bagging)",SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,10,10,20,20,0,Time Series,Logistic Regression,A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,10MB,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,Often,,,,,,,,,,"Association Rules,Time Series Analysis",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,70,10,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,Often,,,,,,,,,,Most of the time,,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,,Company Developed Platform,,,Sometimes,1400000,INR,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,France,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,0,8,2,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",,100GB,"Decision Trees,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,TensorFlow",,,,,,,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,Most of the time,,,,Most of the time,Sometimes,,,,Often,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,Often,Often,Often,Most of the time,Most of the time,Often,,Often,Often,,,Often,Most of the time,Often,Often,Most of the 
time,,,,Most of the time,Often,Sometimes,,Most of the time,Most of the time,Often,Most of the time,,,,65,20,5,5,0,5,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,,,,Often,,,Often,Sometimes,,,Sometimes,,,Often,Most of the time,Often,Often,,26-50% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,50000,EUR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,,,,,Very useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,"FastML Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,90,5,0,5,0,"Computer Vision,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Image data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,Spark / MLlib,SQL,Tableau,TensorFlow,TIBCO Spotfire",,,,,,,,,,,,,,,,,Often,,,,Sometimes,Sometimes,,,,,,,,,Often,,,,,,,,,,Sometimes,Most of the time,,,Sometimes,Sometimes,Most of the time,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science 
talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,,,,Often,Sometimes,Sometimes,,,,Often,Sometimes,,,,,Sometimes,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Never,160000,INR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,No,Yes,Business Analyst,Fine,,Hadoop/Hive/Pig,Anomaly Detection,Python,"GitHub,Google Search,I collect my own data (e.g. web-scraping)","Blogs,Friends network,Kaggle,Personal Projects,Tutoring/mentoring",,Not Useful,,,,Somewhat useful,Very useful,,,,,Very useful,,,,,Very useful,,"Becoming a Data Scientist Podcast,The Analytics Dispatch Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,,Nice to have,,,Necessary,,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,A health science,I don't write code to analyze data,Business Analyst,Self-taught,100,0,0,0,0,0,Unsupervised Learning,,,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Australia,31,I 
prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Anomaly Detection,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Kaggle,Non-Kaggle online communities,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,Laptop or Workstation and local IT supported servers,,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +Male,France,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,YouTube Videos",,Somewhat useful,,,,,Very useful,Very useful,,Very useful,Very useful,Very useful,Somewhat useful,,,,,Somewhat useful,"Data Elixir Newsletter,No Free Hunch Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",45,45,0,0,10,0,"Computer Vision,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Other,"10,000 or more employees",Increased slightly,Less than one year,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,10GB,,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,Spark / MLlib",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Rarely,,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,50,5,5,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",Most of the time,,,,,,,,Most of the time,Sometimes,,,,,Often,,,,,,Sometimes,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Never,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Ukraine,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,0,20,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,IBM SPSS Modeler,Cluster Analysis,Java,"Google Search,Government website,I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,Researcher,Self-taught,60,20,20,0,0,0,Survival Analysis,"Bayesian Techniques,Logistic Regression",A master's degree,Mix of fields,500 to 999 employees,Stayed the same,6-10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees",IBM SPSS Statistics,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Decision Trees,Logistic Regression,Text Analytics",Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,Sometimes,,,,,30,20,0,10,40,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,census,finding appropriate data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,86000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,France,41,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Java,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,Coursera,GPU accelerated Workstation,2 - 10 hours,Online Courses and Certifications,No,Professional degree,,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,, +Male,Spain,32,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A doctoral degree,Academic,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Rarely,1GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,Often,,Often,,,,Often,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"A/B Testing,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks",Most of the time,,,,,,Often,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,40,20,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly",Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,18000,EUR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Germany,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,Not Useful,,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Data Miner,DBA/Database Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,45,45,0,0,10,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,100 to 499 employees,Increased significantly,Don't know,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Don't know,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs,Other","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,,,,Sometimes,Most of the time,Most of the time,Often,Sometimes,,,,,Often,,Often,,,Sometimes,,Sometimes,,Often,Sometimes,,,Sometimes,Sometimes,Often,Often,,,,50,25,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,Often,,Most of the time,,,,,,,,Most of the time,Sometimes,Sometimes,,Rarely,,,Often,Most of the 
time,,10-25% of projects,More internal than external,IT Department,Scrapes of web pages; stopword lists; compound word lists;lemma lists,"Finding the correct event information, extracting and integrating it.","Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Other",AWS S3,"Git,Other",Rarely,72000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Unix shell / awk,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,30,0,30,10,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Naive Bayes",Often,,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,10,10,70,0,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,,IT Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,"Git,Subversion",Sometimes,,,,8,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,70,0,25,5,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,1-2 years,A tech-specific job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Always,1GB,"Evolutionary Approaches,Gradient Boosted Machines,Random Forests","Amazon Web services,C/C++,Java,Jupyter notebooks,NoSQL,Python",,Most of the 
time,,Sometimes,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Simulation,Time Series Analysis",Most of the time,Sometimes,Sometimes,,,Often,,Often,Often,,,Sometimes,,Sometimes,Sometimes,,,Rarely,,Often,,,Sometimes,Often,,,Often,,,Most of the time,,,,30,40,25,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Most of the time,,,Rarely,Often,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,39,Employed full-time,,,Yes,,Other,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",Amazon Web services,Genetic & Evolutionary Algorithms,R,Google Search,"Blogs,Friends network,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,More than 10 years,,Self-taught,85,0,10,0,5,0,Unsupervised Learning,"Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs",I prefer not to 
answer,Internet-based,I prefer not to answer,,,,Not at all important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,Evolutionary Approaches,"Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,KNIME (free version),Microsoft Excel Data Mining,NoSQL,Perl,Python,R,RapidMiner (free version),SQL,Unix shell / awk,Other",,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,Often,,,"A/B Testing,Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Naive Bayes,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,5,0,5,10,60,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Jupyter notebooks,Text Mining,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,College/University,Conferences",Very useful,Very useful,Somewhat useful,,Very useful,,,,,,,,,,,,,,The Data Skeptic Podcast,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,GPU accelerated Workstation,,PhD,Yes,Doctoral degree,Mathematics or statistics,,Statistician,Self-taught,NA,NA,NA,NA,NA,NA,"Speech Recognition,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,1 to 2 years,Data Scientist,Self-taught,60,20,0,0,0,20,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Financial,20 to 99 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Often,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,,Most of the time,Most of the time,,,,Often,Most of the time,,Most of the time,,,Most of the time,,Most of the time,,,,,,50,10,5,20,15,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,31,Employed part-time,,,No,Yes,Researcher,Perfectly,"Employed by company that makes advanced analytic software,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Decision Trees - Random Forests,Logistic Regression",High school,Telecommunications,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Rarely,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Simulation,Text Analytics",,,,,,,Often,Often,,,,,,,,Often,,,,,,,Often,,,,Sometimes,,Often,,,,,20,40,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,65000,USD,Has decreased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,Master's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,Australia,47,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Deep learning,Python,"GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses",,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Manufacturing,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service",Relational data,Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,Sometimes,,,,,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Gradient Boosted Machines,Logistic Regression,Time Series Analysis",Most of the time,,,,Sometimes,Most of the time,Most of the time,,,,,Often,,,,Often,,,,,,,,,,,,,,Often,,,,30,25,20,15,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Privacy issues,Scaling data science solution up to full database",,Sometimes,,,Often,,,,,,,,,,,,Rarely,Often,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,50000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Greece,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,15,10,15,50,10,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Time Series","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,Sometimes,1TB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression,SVMs","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Natural Language Processing,Recommender Systems,Text Analytics",Often,,Rarely,,Often,,,,,,,,,,,,,Sometimes,Sometimes,,,,,Often,,,,,Often,,,,,5,35,35,10,15,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Java,Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Non-Kaggle online communities",,,Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,< 1 year,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Traditional Workstation,0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Python,Text Mining,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Retail,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Base,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests,Segmentation",,,,,,,Most of the time,Often,,,,,,Often,,,,,,,,,Sometimes,,,Often,,,,,,,,60,15,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data",,,Sometimes,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,Self-taught,40,20,40,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Neural Networks - CNNs",,Mix of fields,"5,000 to 9,999 employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,1GB,"Decision Trees,Neural Networks,Random Forests","Microsoft Azure Machine Learning,QlikView,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,Often,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher",University courses,30,0,20,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Mix of fields,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,Often,Often,Sometimes,Sometimes,,,Often,,,,Sometimes,,,Often,,Sometimes,,Sometimes,,,,,Sometimes,Often,,,,,35,20,30,5,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,Often,,,,Sometimes,,,,,,,,,Sometimes,,Often,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Other,Rarely,,,Has decreased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,Indonesia,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Other","Tutoring/mentoring,Other",,,,,,,,,,,,,,,,,Very useful,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,,,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Ensemble Methods,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important +Male,France,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,Very useful,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",0,10,40,30,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests","Amazon Web services,Flume,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,Spark / MLlib,SQL,TensorFlow",,Often,,,,,Rarely,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Most of the time,Most of the time,Sometimes,Most of the time,,,Most of the time,,Sometimes,Often,Rarely,,,Sometimes,Sometimes,Often,Sometimes,Often,,,Sometimes,Sometimes,Rarely,Sometimes,Sometimes,,,,30,15,35,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,,,,Sometimes,,,,,Sometimes,,,,,,Rarely,Rarely,,51-75% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst",Self-taught,20,10,30,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,<1MB,"CNNs,Ensemble Methods,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,40,30,0,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,Very useful,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,Self-taught,30,10,50,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,"5,000 to 9,999 employees",Decreased slightly,Less than one year,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,,,Often,,,,,,,,Sometimes,,,,Often,Often,,Often,,,,,,,Often,,,,50,10,5,15,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Limitations in the state of the art in machine learning,Need to coordinate with IT,Privacy issues",Most of the time,Sometimes,,,,,,Often,,,,Often,,,Often,,Sometimes,,,,,,76-99% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Sometimes,45000,EUR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A,Trade book,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,"FastML Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Scientist,Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,20,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,"5,000 to 9,999 employees",Stayed the same,More than 10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Hadoop/Hive/Pig,Jupyter 
notebooks,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,SVMs",Sometimes,,,,,Often,,Often,Often,,,,,,,Often,,,,Often,,,Often,,,,,Sometimes,,,,,,70,10,10,0,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Always,93000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,"Data Elixir Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Most of the time,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,,,,,Often,Often,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,25,40,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools",,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,51-75% of projects,More internal than external,Standalone Team,N/A,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,0,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,I collect my own data (e.g. web-scraping),"Friends network,Online courses,Personal Projects,Textbook,YouTube Videos",,,,,,Somewhat useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Academic,"1,000 to 4,999 employees",Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Other",Most of the time,100GB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,Sometimes,,Most of the time,Most of the time,Often,,,,Often,,Sometimes,,Sometimes,,,,Most of the time,Often,,Sometimes,,,,,,,,,,,30,40,15,15,0,0,"Enough to code it again from scratch, albeit it may run 
slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,Most of the time,,,Most of the time,,,,,Often,,,,,,,,,100% of projects,Entirely internal,Other,public data of Earth Science institutions all over the world,unification,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,1200000,RUB,Has decreased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Other",Work,10,10,80,0,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,10GB,Regression/Logistic Regression,"Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,Most of the time,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Simulation",Often,,,,,Often,,,,,,,,,Often,Most of the time,,,,,,Often,,,,Sometimes,Most of the time,,,,,,,40,35,10,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Need to 
coordinate with IT",,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,185000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,45,"Not employed, but looking for work",,,,,,,,Julia,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Company internal community,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,,,Very useful,,,Somewhat useful,,,,Very useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),3-5 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",2 - 10 hours,Other,Yes,Master's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important 
+Male,Iran,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,DataRobot,Social Network Analysis,Python,Google Search,"Blogs,Company internal community,Conferences,Friends network,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Management information systems,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Software Developer/Software Engineer",University courses,40,10,10,30,0,10,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Image data,Text data",Sometimes,1GB,"GANs,Neural Networks,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Java,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (free version),SQL,TensorFlow",,,,Often,,,,,Sometimes,,,,,,,,,,Sometimes,,Most of the time,,Most of the time,,,,Sometimes,,,,Sometimes,,Sometimes,,Rarely,,,,,,,Often,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Neural Networks,Recommender Systems,Segmentation,Simulation,Text Analytics",,,,,Often,Often,Often,,,Sometimes,,,,Sometimes,,Sometimes,,,,Often,,,,Most of the time,,Sometimes,Most of the time,,Sometimes,,,,,50,20,5,10,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Often,Most of the time,,Most of the time,,,Often,Often,Most of the time,,,Most of the time,Often,,,,Sometimes,Often,Most of the time,Most of the time,,26-50% of projects,Approximately half 
internal and half external,IT Department,"movie lenz, epinion, enron",,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Most of the time,720000000,IRR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,50,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Researcher,Other",University courses,50,0,10,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A master's degree,Academic,"5,000 to 9,999 employees",,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,Regression/Logistic Regression,"R,Tableau,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,Rarely,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,29,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Other,,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,I don't write code to analyze data,Other,Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,29,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,43,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Link Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Very useful,Very useful,,Very useful,,,,,Very useful,,,,Very useful,"Data Machina Newsletter,KDnuggets Blog,The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Insurance,500 to 999 employees,Decreased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,Most of the time,,,,,Often,Often,Often,,Most of the time,,,,Most of the time,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Most of the time,Often,,,,,,,Most of the time,,Often,Most of the time,Most of the time,,Often,Most of the time,,,,,Most of the time,Often,Often,,,,50,25,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Most of the time,,,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,IT Department,"TransUnion, Tracking data",Dirty Data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,700000,ZAR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Netherlands,33,"Not employed, but looking for work",,,,,,,,Other,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book",,,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Other (Separate different answers with semicolon),3-5 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Other,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Researcher,Other",Work,10,20,50,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important +Male,United States,58,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Microsoft SQL Server Data Mining,Genetic & Evolutionary Algorithms,SQL,I collect my own data (e.g. 
web-scraping),"Company internal community,Online courses,Personal Projects",,,,Very useful,,,,,,,Somewhat useful,Very useful,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",University courses,25,0,0,75,0,0,Natural Language Processing,,"Some college/university study, no bachelor's degree",Mix of fields,"5,000 to 9,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,"C/C++,Perl,SQL,Other",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,Often,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,80,0,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Often,Often,,Less than 10% of projects,Entirely internal,IT Department,None,Reformatting to match the target system formats,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Subversion,Rarely,79000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Scala,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University",Somewhat useful,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,"GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,,No,Master's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,25,0,75,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Logistic Regression,A master's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Julia,Deep learning,Python,,"Company internal community,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Textbook,YouTube Videos",,,,Somewhat useful,,,Not Useful,,Very useful,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Predictive Modeler,Researcher,Other",Self-taught,20,20,30,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Increased significantly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10MB,Regression/Logistic Regression,"Amazon Web services,Java,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,Rarely,Rarely,Rarely,,,,Most of the time,,,,,,,,,,Rarely,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Text Analytics",,,,,,,Most of the time,,,,,,,,,Rarely,,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,30,20,0,20,30,NA,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Sometimes,Often,,,,,,,,Sometimes,Sometimes,,,,Most of the time,Most of the time,,,76-99% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Sometimes,185000,USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Sweden,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,Anomaly Detection,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Arxiv,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",High school,Internet-based,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"Decision Trees,Neural Networks","Microsoft Azure Machine Learning,QlikView,TensorFlow",,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,kNN and Other Clustering,Neural Networks,Recommender Systems",Often,,Sometimes,Rarely,,,Often,,,,,,,Often,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,10,30,40,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Privacy issues",,,,,Often,Sometimes,,,,,,,,,,Most of the time,Rarely,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,University courses,15,25,20,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,Very useful,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,1 to 2 years,Business Analyst,University courses,10,20,0,50,0,20,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1GB,"Decision Trees,Markov Logic Networks,Regression/Logistic Regression","Python,R,SAS Base,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Often,,,,,,,,Rarely,,,Often,,,"A/B Testing,Logistic Regression,Natural Language Processing,Simulation,Text Analytics",Often,,,,,,,,,,,,,,,Often,,,Often,,,,,,,,Sometimes,,Often,,,,,25,10,5,5,5,50,Enough to tune the parameters properly,Limitations of tools,,,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,90000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,Somewhat useful,,,Somewhat useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,0,30,0,20,20,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased significantly,1-2 years,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,Most of the time,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,Sometimes,Most of the time,Most of the time,Often,Often,,,,,Often,,Often,,,,,Often,,Often,Sometimes,,Sometimes,,,Sometimes,,,,,50,10,14,14,12,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Maintaining responsible expectations about the potential impact of data science 
projects,Privacy issues",,,,,Often,,,,,,,,,Sometimes,,,Often,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,"120,000",USD,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Female,Belgium,43,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,1 to 2 years,"Researcher,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Very useful,,Very useful,,,,Somewhat useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,Data Analyst,Work,40,0,40,20,0,0,"Natural Language Processing,Recommendation Engines","Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,Fewer than 10 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Regression/Logistic Regression,SVMs","Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,Most of the time,Often,,,,,,,,,Most of the time,,,Often,Sometimes,Sometimes,,,,,,,Sometimes,Often,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Often,,,,,,,,,,,,Often,,,,,,Often,,,10-25% of projects,More internal than external,Standalone Team,IBGE (Brazil's census),"Expanding it into various public datasets, so that insights may be easily transferred for new projects.","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Bitbucket,Rarely,48000,BRL,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,Other,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Friends network,Kaggle",,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,"KDnuggets Blog,No Free Hunch Blog",3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Programmer",University courses,30,0,0,60,10,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,Other,34,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",25,45,30,0,0,0,Supervised Machine Learning 
(Tabular Data),"Bayesian Techniques,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,42,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,49,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",Very useful,,,,,,Very useful,,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,More than 10 years,"Programmer,Software Developer/Software Engineer",University courses,10,9,0,80,1,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",High school,Mix of fields,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Rarely,10MB,"Decision Trees,Neural Networks","Amazon Web services,Java,R,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,Often,,,,,,Rarely,,,,"Decision Trees,Neural 
Networks",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,20,20,10,10,60,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,78000,GBP,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United Kingdom,21,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Neural Nets,Python,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,,University courses,0,0,100,0,0,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased significantly,Don't know,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Other,,,,"NoSQL,Python,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,"Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Sometimes,,,,80,0,0,20,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,33000,GBP,I was not employed 3 years ago,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Software Developer/Software Engineer,Statistician",Self-taught,60,0,20,20,0,0,"Machine Translation,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation",Text data,Sometimes,,"Bayesian Techniques,HMMs,Neural Networks,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Java,Perl,Python,R,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Often,,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,62,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,I don't plan on learning a new tool/technology,Deep learning,Java,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Conferences,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,,,,Very useful,,Very useful,,,,,Very useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Other,60,0,5,0,10,25,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A professional degree,Other,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Always,100MB,"Ensemble Methods,Evolutionary Approaches,Neural Networks,Regression/Logistic Regression","Java,Spark / MLlib,SQL,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,Most of the time,,,"Collaborative Filtering,Cross-Validation,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,,,Sometimes,Most of the time,,,Most of the time,Often,,,,Sometimes,,Often,,,Most of the time,Sometimes,Sometimes,,,Sometimes,,,,,Most of the time,,,,,40,40,20,0,0,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Other,,cleaning the texts,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,105000,USD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,R,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Official documentation,Online courses",,Somewhat useful,Very useful,,,,,,,Very useful,Very useful,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Online Courses and Certifications,No,Professional degree,,I don't write code to analyze data,DBA/Database Engineer,University courses,10,50,0,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Researcher",Self-taught,20,20,20,20,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,"1,000 to 4,999 employees",Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,1GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,MATLAB/Octave,Python,R",,Rarely,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Often,Often,,Sometimes,,,,,,,Sometimes,,,Sometimes,Sometimes,,,Most of the time,,,,,Sometimes,Most of the time,Sometimes,,,,30,20,20,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,55,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Researcher,Statistician",University courses,40,20,20,20,0,0,"Machine Translation,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Random 
Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Government,20 to 99 employees,Decreased slightly,More than 10 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression","Amazon Machine Learning,Google Cloud Compute,IBM SPSS Statistics,Tableau",Often,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,"Bayesian Techniques,Logistic Regression,Markov Logic Networks,Time Series Analysis",,,Often,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,Often,,,,30,20,20,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,,,,,,,,,Most of the time,,,,Sometimes,Sometimes,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"40,000",USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Other,Work,55,0,5,40,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,500 to 999 employees,Increased significantly,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,10MB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Often,,,,Most of the time,,,Often,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Text Analytics",Rarely,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,,,,,70,5,5,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,NA,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Miner,Data Scientist,Researcher,Statistician",University courses,40,5,5,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Pharmaceutical,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,29,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Business Analyst,Self-taught,10,10,60,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Greece,29,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Very useful,,Somewhat useful,Somewhat useful,,Not Useful,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,O'Reilly Data Newsletter",3-5 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,DBA/Database Engineer,Other",University courses,10,20,10,60,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Poland,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by non-profit or NGO,Julia,Genetic & Evolutionary Algorithms,Julia,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Trade book",,Somewhat useful,Somewhat useful,,Very useful,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,20,50,10,10,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Non-profit,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Most of the time,100GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Java,Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Tableau,TensorFlow",,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,Often,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,Often,Sometimes,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random 
Forests,RNNs,Segmentation,Time Series Analysis",,,,Sometimes,Sometimes,Often,Often,Often,Often,Rarely,Sometimes,Often,,Sometimes,,Sometimes,,,,Often,Sometimes,Sometimes,Often,,Sometimes,Sometimes,,,,Rarely,,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Often,Sometimes,,,Sometimes,Often,Sometimes,Often,Often,,,Sometimes,Sometimes,,Sometimes,Sometimes,Sometimes,,76-99% of projects,More external than internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,150000,PLN,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,South Africa,29,"Not employed, but looking for work",,,,,,,,Python,Regression,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,,"KDnuggets Blog,Linear Digressions Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,Less than a year,Engineer,University courses,30,40,0,20,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important +Male,United States,40,Employed full-time,,,Yes,,Statistician,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Stan,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",20,40,20,0,10,10,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,,Regression/Logistic Regression,"Jupyter notebooks,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Often,,,,15,15,15,10,5,40,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,76-99% of projects,Entirely internal,Other,Census; business points; bloomberg;,updating data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,,Sometimes,,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Unix shell / awk,Monte Carlo Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,Very useful,,,Somewhat useful,Somewhat useful,O'Reilly Data Newsletter,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,PhD,Yes,Doctoral degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Natural Language Processing,Reinforcement learning","Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important +Female,Canada,40,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that 
doesn't perform advanced analytics,Amazon Web services,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,,Less than a year,I haven't started working yet,University courses,0,15,0,85,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,28,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,25,25,5,25,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Not important,Very Important,Not important,Somewhat important,Very Important,Very Important,Very 
Important,Very Important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important +Male,Spain,43,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,R,Google Search,"Blogs,Online courses,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Very useful,,Somewhat useful,Somewhat useful,,,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",Self-taught,25,25,0,0,0,50,Computer Vision,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +,,NA,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",70,20,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",,A master's degree,Financial,,,,An external recruiter or headhunter,Not at all important,"Build and/or run the 
data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data,Relational data",Sometimes,10PB,,"C/C++,Cloudera,DataRobot,Hadoop/Hive/Pig,Impala,Java,Julia,Jupyter notebooks,NoSQL,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Spark / MLlib,SQL,Tableau,TensorFlow",,,,Often,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,Often,,,,,Most of the time,Most of the time,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,51,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by professional services/consulting firm,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Very useful,,Very useful,,,Very useful,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,35,20,0,5,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100GB,"HMMs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,KNIME (free 
version),Python,R,Other",,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,"Cross-Validation,Decision Trees,Logistic Regression",,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,50,30,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input",,Often,,,Most of the time,,,Often,,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,Kaggle,Data understanding,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,Rarely,"54,800",EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,Canada,28,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by government,Python,Anomaly Detection,Python,Government website,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,Supervised Machine Learning (Tabular Data),Ensemble Methods,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Stories Podcast,FastML Blog,The Analytics Dispatch Newsletter",1-2 years,Necessary,,Necessary,,Necessary,,,,,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,PhD,No,Bachelor's degree,Computer Science,,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University 
courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,,,Very Important,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook,YouTube Videos",,,,,,,,,,,,Very useful,,,Somewhat useful,,,Very useful,"Data Elixir Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,40,60,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A professional degree,Technology,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1TB,Bayesian Techniques,"Amazon Web services,Cloudera,Hadoop/Hive/Pig",,Rarely,,,Rarely,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,20,20,0,0,Enough to tune the parameters properly,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,75000,USD,I was not employed 3 years ago,2,,,,,,,,,,,,,,,,,, +Male,Germany,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Conferences,Kaggle,Newsletters,Online courses,Textbook",,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Researcher,Other",Self-taught,20,20,40,15,5,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,Often,,,,,,Often,,,,,,Rarely,,,,,,Often,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Time Series Analysis",,,,,,Often,Often,,,,,,,Sometimes,,Often,,,,Often,Sometimes,,Sometimes,,Sometimes,,,,,Often,,,,65,10,15,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science 
team,Dirty data,Limitations of tools,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,,,Often,,Sometimes,,,,,Most of the time,Often,,51-75% of projects,More external than internal,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Bitbucket,Sometimes,65000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,43,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Company internal community,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,Very useful,,,Very useful,,,,,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,,Less than a year,"Business Analyst,Other",Kaggle competitions,40,10,20,30,0,0,Time Series,Logistic Regression,A professional degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important +Male,United Kingdom,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Time 
Series Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Online courses,Textbook",,,,Not Useful,Very useful,,,,,,Very useful,,,,Somewhat useful,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"10,000 or more employees",Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,,,,"Java,Jupyter notebooks,NoSQL,Python,QlikView,SQL",,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,,,Often,,,,Most of the time,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,25,10,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,Most of the time,,,,,,,,Sometimes,Most of the time,Sometimes,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Bitbucket,Subversion",Rarely,63000,GBP,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,68,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,R,I don't plan on learning a new ML/DS method,,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Other",Self-taught,60,40,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,Other,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,0,10,20,0,Enough to explain the algorithm to someone non-technical,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Sometimes,,,100% of projects,Entirely internal,Business Department,,Understanding the context of entry,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,"40,000",USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,34,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Online courses,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Very useful,Very useful,,,,,,,,Very useful,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Data Analyst,Self-taught,60,30,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",High school,Other,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Neural Networks,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow",,,,,,,,,,,,,Most of the 
time,,,,Often,,,,Sometimes,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs,Text Analytics",Often,,,,Sometimes,Most of the time,Most of the time,,,,,,,,,Often,,,Most of the time,Often,Most of the time,,,,,Often,,Often,Most of the time,,,,,50,20,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Most of the time,,,,,,,,,Most of the time,Most of the time,,,,,,Often,,,Often,Most of the time,Most of the time,26-50% of projects,More internal than external,IT Department,UCL,"Finding public data, finding relevant data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,841,MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Russia,46,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by government,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects",,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,1-2 years,,,,,,,,,,,,,,Coursera,,2 - 10 hours,,Yes,Master's degree,Management information systems,Less than a year,"Data Analyst,Researcher,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Neural Nets,Python,"Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,,Very useful,,Very useful,Somewhat useful,,,,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,20,10,35,35,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,Most of the time,,,,,,Sometimes,Most of the time,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Most of the time,Often,,,,Sometimes,,Most of the time,Often,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Often,Most of the time,,Sometimes,Most of the time,Most of the time,,,,,,,Often,,Most of the time,,Most of the time,Sometimes,Sometimes,Often,Sometimes,Often,Sometimes,,Most of the time,,Often,Often,Often,,,,40,25,10,15,5,5,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations of tools,Scaling data science solution up to full database,Team 
using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,,,Sometimes,,,,,Sometimes,Most of the time,,Often,,76-99% of projects,More internal than external,Standalone Team,Prefer not to say,"Data scale is the most difficult part of all of it; accordingly, this makes data preparation a much lengthier process given the need to first iterate on a sub-sample of the data and then deal with inconsistencies as they arise when scaling to the full dataset.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",Company Developed Platform,,"Git,Other",Rarely,125000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Canada,61,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,66,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,Self-taught,90,0,0,10,0,0,Unsupervised Learning,Support Vector Machines (SVMs),A doctoral degree,Academic,20 to 99 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Traditional Workstation,Relational data,Never,<1MB,SVMs,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,30,60,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,15,15,20,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,Python,Time Series Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Kaggle,Non-Kaggle online communities,Official documentation,Stack Overflow Q&A",,,,,,,Very useful,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,1 to 2 years,Data Analyst,Self-taught,37,30,20,10,3,0,Time Series,Logistic Regression,A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,1MB,Regression/Logistic Regression,"MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,Often,,,,,Often,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression",,,,,,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Subversion,,,,,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Friends network,Non-Kaggle online communities,Personal Projects,Podcasts,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,Very useful,,,Somewhat useful,,,Somewhat useful,Not Useful,,Somewhat useful,,,Somewhat useful,"FastML Blog,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Scientist,DBA/Database Engineer,Researcher,Other",University courses,40,0,30,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Other,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,"Bayesian Techniques,Ensemble Methods,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,Rarely,,Often,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests",Sometimes,,Often,,,Most of the time,,,Sometimes,,,,,,,Sometimes,,Rarely,,,,,Often,,,,,,,,,,,85,2,6,4,3,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to 
integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",Often,Often,Sometimes,Sometimes,Often,Most of the time,,Most of the time,Often,,Most of the time,,,Often,Most of the time,Most of the time,,Sometimes,Often,Sometimes,Most of the time,Most of the time,10-25% of projects,Entirely internal,Business Department,none,"Getting access to data, manipulating / transforming data in a suitable environment","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,52000,GBP,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,5,80,0,5,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,Don't know,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,22,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,40,30,30,0,0,Time Series,Logistic Regression,Primary/elementary school,Academic,Fewer than 10 employees,,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,,Basic laptop (Macbook),Text data,Never,1MB,Regression/Logistic Regression,"IBM SPSS Statistics,Julia,R",,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,20,20,20,40,0,0,"Enough to code it again from scratch, albeit it may run slowly",Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Somewhat useful,,Very useful,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,"Coursera,DataCamp,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",11 - 39 hours,Online Courses and Certifications,No,Master's degree,,,"Business Analyst,Data Analyst,Operations Research Practitioner,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very 
Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Australia,50,"Not employed, but looking for work",,,,,,,,Jupyter notebooks,"Ensemble Methods (e.g. boosting, bagging)",Python,Google Search,"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Somewhat useful,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Jack's Import AI Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Unnecessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,Other",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Not important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,43,Employed full-time,,,No,Yes,DBA/Database Engineer,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,"FlowingData Blog,O'Reilly Data Newsletter",1-2 years,Nice to 
have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,Coursera,Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Germany,37,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Textbook",Somewhat useful,Very useful,,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Very useful,,,,"No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher",University courses,20,10,40,15,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Python,R,SQL,Stan,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,Often,,,Often,,Sometimes,,,,"A/B Testing,Bayesian Techniques,CNNs,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,RNNs,Time Series Analysis",Often,,Most of the time,Sometimes,,,Often,Sometimes,Rarely,,,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,Most of the time,Often,,,,,,,Most of the time,,,,10,40,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",,,,Often,Most of the time,,,,,,,Sometimes,,Sometimes,,,,,,,,,10-25% of projects,More internal than external,Other,imagenet; cifar; atari games for reinforcement 
learning,missing fields or erroneous annotations,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Sometimes,"90,000",EUR,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,Somewhat useful,,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Somewhat useful,,,,Very useful,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity",Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,"Computer Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,Very Important,Very Important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very 
Important,Somewhat important,Very Important,Not important,Very Important +Male,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,I don't plan on learning a new ML/DS method,R,,"Kaggle,Other",,,,,,,Somewhat useful,,,,,,,,,,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,Researcher,Other",Work,20,5,75,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,10MB,"Decision Trees,Regression/Logistic Regression","Microsoft Azure Machine Learning,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,Sometimes,,,,,,Sometimes,,,,Rarely,,,Rarely,,,,,40,25,0,20,15,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,Rarely,,,,,,,Most of the time,,,Often,,,,Sometimes,,,76-99% of projects,Entirely internal,Standalone Team,,"Easy access to cloud data - no defined API, but data extracts are available for scheduled exports.","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"90,000",USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Brazil,58,Retired,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,R,Cluster Analysis,R,"Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Online courses,Textbook",,Very useful,,,,,,,,,Very useful,,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Operations Research Practitioner,Other",Self-taught,50,0,0,0,0,50,"Outlier detection (e.g. Fraud detection),Time Series,Other (please specify; separate by semi-colon)",Logistic Regression,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,No,Yes,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Deep learning,Python,Google Search,"Kaggle,YouTube Videos,Other",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,,Nice to have,Nice to have,Necessary,,,Nice to have,,,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,Yes,I prefer not to answer,Other,Less than a year,Statistician,Self-taught,30,35,30,5,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,,,,,,,,,,,,,,, +Male,United Kingdom,27,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Very useful,,,,Somewhat useful,,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting",A master's degree,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Never,10GB,Gradient Boosted Machines,"Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Gradient Boosted Machines,Recommender Systems",Sometimes,,,,,Often,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,70,5,0,15,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,,,,,,,,,,,Often,,76-99% of projects,More internal than external,IT Department,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Rarely,250000,RUB,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Germany,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Personal Projects,Stack Overflow Q&A",Somewhat useful,Somewhat useful,Very useful,,,,,,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Other",University courses,10,5,15,70,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,10 to 19 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data,Relational data",,100MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,Most of the time,Most of the time,,,,,,,Sometimes,,Sometimes,,,Often,Often,Often,,,,,,,Sometimes,Sometimes,,,,,10,15,0,10,5,60,Enough to refine and innovate on the algorithm,"Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data,Other",,,,,,,,,,,,,,,,,,Often,,Sometimes,Sometimes,Often,76-99% of projects,Approximately half internal and half external,Standalone Team,"pretty much every public dataset i can get my hands on - though this is harder than it seems, even getting some well known challenge dataset such as the conell 2003 NER challenge is harder than it should be!","getting it in the first place; besides that, having the computing resources to work with it in RAM","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git,Other",Most of the time,46000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,C/C++,Link Analysis,C/C++/C#,Google Search,"Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,Very useful,Very useful,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Other,Less than a year,Other,University courses,20,0,0,80,0,0,,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important +Male,United States,35,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,Very useful,,,,Somewhat useful,"Data Elixir Newsletter,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",< 1 year,Necessary,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Other","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Other,3 to 5 years,"Data Analyst,Predictive Modeler,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,20,50,5,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,Researcher,Self-taught,60,20,20,0,0,0,Time Series,,A bachelor's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",,Other,Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Often,,,,35,30,20,10,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,,,Often,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,89500,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Italy,26,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Personal Projects",,,,Somewhat useful,,,,,,,,Very useful,,,,,,,,1-2 years,Necessary,Nice to have,Nice to have,,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Not important +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",0,75,0,0,25,0,Reinforcement learning,Decision Trees - Random Forests,A master's degree,Financial,"5,000 to 9,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,Text data,Sometimes,100MB,Decision Trees,"Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,0,25,25,0,50,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,,,80000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Programmer,Researcher",Other,5,20,30,40,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Other,Sometimes,1TB,"Bayesian Techniques,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Simulation,Time Series Analysis",,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,Often,,,,50,20,0,30,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,Sometimes,,,,,Often,,,100% of projects,Entirely internal,Other,,"The need to understand the business and technical environment, the need for modelling the hidden business processes well","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,Other,Sometimes,275000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Italy,23,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Stack Overflow Q&A",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Physics,3 to 5 years,I haven't started working yet,University courses,25,0,0,70,5,0,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Not important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,37,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,,Friends network,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,6 to 10 years,Other,Self-taught,50,24,25,0,1,0,Outlier detection (e.g. 
Fraud detection),Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,Sometimes,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,100,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,Other,"Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Australia,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Non-Kaggle online communities,,,,,,,,,Very useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Other,Basic laptop (Macbook),Other,Sometimes,,,"Amazon Web services,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Neural Networks,Simulation",,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,,,,0,100,0,0,0,0,Enough to tune the parameters properly,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,More internal than external,,,,,,,,,,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,South Korea,26,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,Google Search,"College/University,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,Very useful,Very useful,,Very useful,Very useful,Very useful,,,Very useful,,,,Very useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,"Coursera,DataCamp,Udacity,Other",Laptop or Workstation and local IT supported servers,40+,Other,Yes,Bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,15,15,0,20,25,25,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised 
Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,NA,Employed full-time,,,Yes,,Programmer,,Employed by non-profit or NGO,,,,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,Non-profit,,,,,,,,,,,,"Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,20,70,10,0,0,0,Enough to run the code / standard library,"Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,,,,,,,,,,Git,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Company internal community,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,Very useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,1 to 2 years,Other,University courses,10,10,10,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Sometimes,,Often,Sometimes,Rarely,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Most of the time,,,Rarely,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,Sometimes,Sometimes,Most of the time,,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,,55,5,10,5,25,0,"Enough to code it again from scratch, 
albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,Often,Often,,,,,Often,,,,,,Sometimes,Sometimes,Sometimes,,Often,,76-99% of projects,Approximately half internal and half external,Other,,Accessing it,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Git,Sometimes,110000,USD,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Other,Deep learning,Python,Google Search,"Arxiv,Official documentation,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,,,,,,,,Very useful,,,,Somewhat useful,Very useful,,,Somewhat useful,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,25,20,25,25,5,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Insurance,"10,000 or more employees",Decreased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Rarely,1GB,"Ensemble 
Methods,Gradient Boosted Machines","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk,Other,Other,Other",,Most of the time,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Often,Sometimes,,Most of the time,Sometimes,Sometimes,Most of the time,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,Sometimes,Often,Most of the time,Most of the time,,,Most of the time,,,Rarely,,,,Often,,,,,,,,,,Often,Most of the time,,,,55,25,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,Often,,,,,Most of the time,,,,,,,Often,,,100% of projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",Share Drive/SharePoint,,Git,Don't know,77000,USD,I was not employed 3 years ago,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Canada,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Self-employed",KNIME (free version),Genetic & Evolutionary Algorithms,R,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,A bachelor's degree,Other,,,,,"N/A, I did not receive any formal education","Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,60,10,20,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,Most of the time,,,,Often,,,,,,,Most of the time,,,,Most of the time,,,None,Entirely internal,IT Department,,Not treated seriously so not very good.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Most of the time,"200,000",CAD,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Australia,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,Computer Scientist,Work,60,10,30,0,0,0,Recommendation Engines,Bayesian Techniques,,Academic,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Java,Mathematica,Python,R,SQL,Tableau",,,,,,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Often,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Random Forests",,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,,,,,,,,,,,60,20,10,10,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,Other,University courses,10,70,0,20,0,0,Computer Vision,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,"Independent contractor, freelancer, 
or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Machine Learning Engineer,Researcher",University courses,20,0,20,60,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics",,,,,,Most of the time,Often,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,,Most of the time,,Sometimes,,,,Most of the time,,Often,,,,,50,30,10,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,Often,Often,,,Rarely,,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,79,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,,Online courses,,,,,,,,,,,Very useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,A social science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important +Male,United States,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Java,Deep learning,R,University/Non-profit 
research group websites,College/University,,,Very useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Jack's Import AI Newsletter",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Business Analyst,Programmer,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,Time Series,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Textbook",,,Not Useful,,,,,,,,,,,,Somewhat useful,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,23,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,31,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Unsupervised Learning,Bayesian Techniques,,Other,Fewer than 10 employees,Decreased significantly,Don't know,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business 
decisions,,Relational data,Rarely,,Bayesian Techniques,"C/C++,MATLAB/Octave,Python,R",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,"Cross-Validation,kNN and Other Clustering,Prescriptive Modeling",,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,20,30,20,10,20,0,Enough to refine and innovate on the algorithm,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Standalone Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),Company Developed Platform,,Subversion,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,"Not employed, but looking for work",,,,,,,,R,Regression,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"DataCamp,Udacity",Other,11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Other,1 to 2 years,Engineer,Kaggle competitions,30,20,0,0,50,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,South Korea,23,Employed full-time,,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or 
statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,40,0,0,0,Unsupervised Learning,Logistic Regression,,Technology,100 to 499 employees,Increased significantly,1-2 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,,,50,10,5,25,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools",,Often,,,Sometimes,,,,Sometimes,,,,Sometimes,,,,,,,,,,26-50% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Julia,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,,,,,,,,,,Very useful,,,,Somewhat useful,R Bloggers Blog Aggregator,3-5 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,More than 10 years,Programmer,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other",University courses,10,0,10,80,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,10MB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,R,Tableau",,,,,Most of the time,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Time Series Analysis",,Sometimes,,Sometimes,,Sometimes,Often,Sometimes,Often,,,Often,,Often,,Often,,,,Often,Often,,Most of the time,,,,,,,Often,,,,10,30,20,20,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,51-75% of projects,Entirely external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,1600000,,Has increased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring",,Somewhat useful,Not Useful,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Not Useful,Not Useful,Somewhat useful,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Other,Yes,Bachelor's degree,"Information technology, networking, or system administration",,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Not important,Very Important +Male,Other,24,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,27,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,50,0,25,25,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,100TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,RNNs","Amazon Machine Learning,Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,RNNs,Segmentation,Time Series Analysis",Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Researcher,Statistician",Self-taught,50,0,50,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Text data",Most of the time,100PB,"CNNs,Neural Networks,Random Forests","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,Text Analytics",Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,Employed part-time,,,Yes,,,,,Python,,,,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,,,,,,50,0,0,0,50,0,,,,,,,,,,,,,,,,"C/C++,Hadoop/Hive/Pig,Java,Python,Spark / MLlib,SQL",,,,Rarely,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"A/B 
Testing,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,0,0,50,0,,Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,R,Anomaly Detection,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,Other",,,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,,,< 1 year,,Nice to have,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,,,,Basic laptop (Macbook),40+,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Other,Less than a year,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",45,15,0,0,20,20,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,,,,,,,,,,, +Male,Canada,45,Employed full-time,,,Yes,,Other,Fine,Employed by government,Microsoft Azure Machine Learning,Anomaly Detection,R,Government website,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Gradient 
Boosting,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Government,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Not very important,Other,Workstation + Cloud service,Text data,Never,10TB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM Cognos,IBM SPSS Statistics,KNIME (free version),Mathematica,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Simulation,Time Series Analysis",,,,,,,Often,Often,,,,Sometimes,,Sometimes,,Often,,,,,,,Often,,,Sometimes,Often,,,Often,,,,10,40,0,10,40,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Privacy issues",Often,,,,Often,,,,,,,,,,,,Most of the time,,,,,,76-99% of projects,More internal than external,IT Department,,De-identification of health data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Never,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,Somewhat useful,Very useful,Very useful,,,Very useful,Somewhat useful,Somewhat useful,,Very useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,5,0,60,30,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,500 to 999 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,"Hadoop/Hive/Pig,Java,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Rarely,Sometimes,,,,,,Sometimes,,,,Prescriptive Modeling,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,0,0,10,0,0,90,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,Rarely,,,Rarely,,Rarely,,,Rarely,,,Rarely,,,,,,Rarely,,,10-25% of 
projects,More internal than external,Business Department,,,Document-oriented (e.g. MongoDB/Elasticsearch),I don't typically share data,,Git,Rarely,153000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Female,United Kingdom,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",Cloudera,Bayesian Methods,Python,Government website,Conferences,,,,,Somewhat useful,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,,Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,CRM/Marketing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,100MB,,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Rarely,,,,Most of the time,,,,,,,,,,"A/B Testing,Segmentation,Text Analytics",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,,,,80,0,10,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Most of the time,Most of the time,,,,,,,Often,Sometimes,,,,,,Sometimes,Often,,,,Often,,Less than 10% of projects,More external than internal,Central Insights Team,"Acxiom, FB",Not enough personalised data e.g. demographics,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,110000,GBP,Has increased between 6% and 19%,3,,,,,,,,,,,,,,,,,, +Female,Belgium,23,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Data Miner,Fine,Employed by government,Hadoop/Hive/Pig,Anomaly Detection,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","College/University,Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,Somewhat useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Miner,Machine Learning Engineer",Work,10,15,60,5,5,5,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Government,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Sometimes,1MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Python,R,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,Often,Most of the time,,,Rarely,Often,Most of the time,,,,,,Often,,,,Most of the time,,Sometimes,Sometimes,,,,,,,Most of the time,Often,,,,,50,30,5,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,Sometimes,,,,,,Often,,Often,Often,Sometimes,,,,,,,Most of the time,Most of the time,,51-75% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Most of the time,"120,000",ETB,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,No,Yes,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,IBM SPSS Modeler,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping)","Company internal community,Online courses,Stack Overflow Q&A,YouTube Videos",,,,Somewhat useful,,,,,,,Very useful,,,Somewhat useful,,,,Very useful,KDnuggets Blog,1-2 years,Nice to have,Nice to have,,Nice to have,Necessary,Nice to have,Nice to have,,Necessary,Nice to have,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,0 - 1 hour,Experience from work in a company related to ML,Yes,Master's degree,Other,I don't write code to analyze data,"Data Analyst,Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important +Male,United Kingdom,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Podcasts,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,Very useful,,Somewhat 
useful,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,5,90,5,0,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Telecommunications,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Text data,Never,10GB,"HMMs,Neural Networks,RNNs","IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,Sometimes,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Text Analytics,Time Series Analysis",,,,,,Often,Often,,,,,,Sometimes,Often,,Sometimes,,,Most of the time,Most of the time,,,,Often,Sometimes,,,,Often,Sometimes,,,,80,5,0,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT",Sometimes,Often,,,Often,Sometimes,,Sometimes,,Most of the time,,,,,Often,,,,,,,,10-25% of projects,More external than internal,Standalone Team,open source data sets,cleaning up the data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Rarely,50000,GBP,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Poland,23,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",35,25,25,5,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data",Sometimes,1GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,Sometimes,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,60,20,5,10,5,0,Unsupervised Learning,Logistic Regression,High school,CRM/Marketing,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Relational data,Sometimes,10GB,"Regression/Logistic Regression,SVMs,Other","Amazon Web services,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Logistic Regression,Segmentation,SVMs",Often,Sometimes,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,40,20,10,10,20,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,Sometimes,,Sometimes,Sometimes,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,10-25% of projects,Approximately half 
internal and half external,Standalone Team,,to make it proper with the dataset specification and not losing a lot of data when migrate it from one source to another source,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",I don't typically share data,,"Git,Other",Never,700000,IDR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Other,Deep learning,Python,I collect my own data (e.g. web-scraping),"College/University,Conferences,Friends network,Online courses,Personal Projects,Textbook",,,Very useful,,Somewhat useful,Somewhat useful,,,,,Somewhat useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,Work,30,10,40,20,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Internet-based,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Random Forests,RNNs","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Random Forests,Recommender Systems",Sometimes,,,,,Most of the time,Sometimes,,Most of the time,,,Most of the time,,,,,,Sometimes,Often,,,,Sometimes,Rarely,,,,,,,,,,35,30,35,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,Less than 10% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Other,Sometimes,2400000,RUB,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Singapore,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Anomaly Detection,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Somewhat useful,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,10,25,0,15,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,Sometimes,,,Sometimes,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,Sometimes,,,,,,30,20,30,5,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Sometimes,,,Sometimes,Often,Sometimes,,Sometimes,,,,,,,Often,,,Often,,Often,,,Less than 10% of projects,Entirely internal,Other,N/A,"Unbalanced, non-standard data logging ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Subversion,Sometimes,60000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Ukraine,23,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,No Free Hunch Blog,< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,"Coursera,Udacity",Basic laptop (Macbook),40+,Experience from work in a company related to ML,Yes,Bachelor's degree,,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,I collect my own data (e.g. 
web-scraping),"Arxiv,Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,,Somewhat useful,Not Useful,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Scientist,Researcher",University courses,45,5,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,1GB,"CNNs,Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,Often,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,,Often,,Most of the time,Often,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,Rarely,,Often,Often,Sometimes,Sometimes,,,,Sometimes,Often,,Often,,Sometimes,Most of the 
time,Sometimes,Sometimes,,Sometimes,,Sometimes,,,Sometimes,Often,,,,,30,30,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools",Sometimes,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,,,,26-50% of projects,More internal than external,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Other",Google drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,150000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,India,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,Microsoft Excel Data Mining,Text Mining,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Not Useful,,,,,Somewhat useful,,,,,,,Somewhat useful,,,,Very useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,,,,Most of the time,Most of the time,,,,,,,,Often,,,,Often,,,Sometimes,,,,,,Most of the time,Often,,,,10,20,10,40,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,,,,,Often,,,,,,Often,,,51-75% of projects,Entirely external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,Switzerland,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Poorly,Employed by company that makes advanced analytic software,Other,"Ensemble Methods (e.g. boosting, bagging)",Python,GitHub,"Blogs,Kaggle,Official documentation,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,Very useful,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Image data,Other",Most of the time,1TB,"CNNs,Evolutionary Approaches,SVMs","C/C++,Python,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Data Visualization,Evolutionary Approaches",,,,Often,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,20,40,20,20,0,0,Enough to run the code / standard library,"Did not instrument data useful for scientific analysis and decision-making,Dirty data",,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Bitbucket,Git,Subversion",,80000,CHF,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,40,Employed full-time,,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,36,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Textbook",,Somewhat useful,,,,,Very useful,,Very useful,,Very useful,,,,Somewhat useful,,,,"Data Elixir Newsletter,Data Machina Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",15,50,30,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,"1,000 to 4,999 employees",Increased significantly,Less than one year,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,10MB,"Random Forests,Regression/Logistic Regression","DataRobot,Python,R,SQL,Tableau",,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,Rarely,,,,,,,"Association Rules,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,Rarely,,,,,,,,,,,,Often,,Often,,,,,Often,,Often,,,Often,,,,Often,,,,20,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Often,,,Most of the time,,,,,,,,Often,,,Often,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,27000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,Very useful,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,45,25,20,5,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased significantly,1-2 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Other",Most of the time,10GB,"CNNs,Decision Trees,Neural Networks,Random Forests,SVMs","Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Often,,,,"Decision Trees,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics",,,,,,,,Often,,,,,,,,,,,Often,Sometimes,Often,,Often,,,,,Often,Often,,,,,50,30,10,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,I prefer not to 
say,Need to coordinate with IT",,,,,Sometimes,,Often,,,,,,,,Often,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,NA,Na,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Other,Rarely,1800000,INR,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Indonesia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,30,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Retail,20 to 99 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Rarely,10MB,SVMs,"C/C++,MATLAB/Octave,R",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Collaborative Filtering,Decision Trees,SVMs",,,,,Often,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,20,60,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,None,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Bitbucket,Never,45000000,IDR,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Poland,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Podcasts,Textbook",,,,,,,,,,,,Very useful,Somewhat useful,,Very useful,,,,"Data Machina Newsletter,Linear Digressions Podcast,No Free Hunch Blog",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,I prefer not to answer,Mathematics or statistics,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important +Male,Republic of China,28,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that performs advanced analytics,Python,Social Network Analysis,Python,I collect my own data (e.g. 
web-scraping),"Blogs,Online courses",,Very useful,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,FastML Blog",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,100,0,0,0,0,0,Natural Language Processing,Bayesian Techniques,High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Not important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,People 's Republic of China,22,Employed full-time,,,No,Yes,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important +Male,Italy,30,Employed full-time,,,Yes,,Engineer,Poorly,Employed by college or university,Java,Deep learning,Python,GitHub,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Scientist,Engineer,Researcher",University courses,5,5,10,80,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,A doctoral degree,Academic,Fewer than 10 employees,Stayed the same,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Image data,Rarely,10MB,"Decision Trees,Regression/Logistic Regression","C/C++,KNIME (free version),MATLAB/Octave,Python,R,SAS Enterprise Miner,SQL",,,,Rarely,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,Sometimes,,,,,,Rarely,,,Sometimes,,,,,,,,,,"Data Visualization,Decision Trees,PCA and Dimensionality 
Reduction",,,,,,,Often,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,10,50,0,20,20,0,Enough to run the code / standard library,"Dirty data,Need to coordinate with IT,Privacy issues",,,,,Often,,,,,,,,,,Often,,Sometimes,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Other,Sometimes,"20,000",EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,25,15,50,0,0,,"Bayesian Techniques,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Other,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Microsoft Azure Machine Learning,Python,R",,,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Often,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Gradient Boosted Machines,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction",,,Often,,,Often,Often,,,,,Often,,,,,,Often,Often,Often,Often,,,,,,,,,,,,,0,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Privacy issues",Often,,,,Often,,,,Often,,,,,,,,Often,,,,,,26-50% of projects,More external than 
internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,53,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,,Self-taught,70,0,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,23,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,Python,Google Search,"Conferences,Friends network,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,50,20,0,30,0,0,"Machine Translation,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,28,"Not employed, but looking for work",,,,,,,,C/C++,Association Rules,C/C++/C#,,Tutoring/mentoring,,,,,,,,,,,,,,,,,Somewhat useful,,,1-2 years,,,,,,,,,,,,,,,Traditional Workstation,,PhD,No,Master's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",40,20,0,40,0,0,Recommendation Engines,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,29,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,GitHub,"Arxiv,Blogs,Personal Projects,YouTube Videos",Very useful,Somewhat useful,,,,,,,,,,Very useful,,,,,,Somewhat useful,FastML Blog,< 1 year,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to 
have,Unnecessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),,Github Portfolio,No,Doctoral degree,Electrical Engineering,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,,,,,,,,,,,,,,,, +Male,Greece,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,Becoming a Data Scientist Podcast,1-2 years,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests",Primary/elementary school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important +Female,Australia,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't 
perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Professional degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,,,High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Other,23,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Neural Nets,Python,Google Search,"Kaggle,Non-Kaggle online communities,Online courses,Podcasts,Textbook,YouTube Videos",,,,,,,Very useful,,Not Useful,,Very useful,,Somewhat useful,,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,"Coursera,Other",Basic 
laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,People 's Republic of China,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Miner,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,39,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,Very useful,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,Less than a year,Other,Self-taught,40,25,10,0,25,0,,,A master's degree,Other,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,,1MB,,"C/C++,MATLAB/Octave,R",,,,Rarely,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,Rarely,,,Often,,,,,30,35,0,20,15,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,Often,,,,,,Sometimes,,,,,,Sometimes,,76-99% of projects,More external than internal,Other,dosimetric data,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Rarely,85000,PLN,Other,7,,,,,,,,,,,,,,,,,, +Female,Hong Kong,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,Arxiv,Very useful,,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,,,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Adversarial Learning,,,Technology,Fewer than 10 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,,,,,,,Amazon Machine Learning,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making",Rarely,,Often,,,,,,,,,,,,,,,,,,,,None,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,, +Male,Hungary,41,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Not Useful,,,Somewhat useful,Not Useful,,Somewhat useful,Somewhat useful,Somewhat useful,,,Somewhat useful,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",More than 10 years,"Data 
Miner,Engineer,Programmer",Work,10,10,50,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,,Laptop or Workstation and private datacenters,Relational data,Sometimes,<1MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Impala,Python,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Rarely,Rarely,Rarely,,Often,,,,,,Sometimes,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,Rarely,Rarely,Often,Sometimes,,,,,,Sometimes,,Often,,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,Often,Sometimes,Rarely,Sometimes,Sometimes,,,,60,10,5,20,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Rarely,Often,Often,,Sometimes,Often,,,Sometimes,,Often,,,Often,,,,Sometimes,,Often,,,76-99% of projects,More internal than external,IT 
Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,15000000,HUF,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Male,United States,61,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,95,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Internet-based,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Arxiv,Kaggle,Online courses,Personal Projects,Podcasts",Very useful,,,,,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,"Linear Digressions Podcast,No Free Hunch Blog,Partially Derivative Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,10,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Netherlands,24,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,University courses,20,0,0,40,40,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,20 to 99 employees,Increased slightly,Don't 
know,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Don't know,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Other,Time Series Analysis,R,"GitHub,Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),1-2 years,Unnecessary,Necessary,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,No,Bachelor's degree,Mathematics or statistics,,,Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Very Important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,Romania,21,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,"Ensemble Methods (e.g. boosting, bagging)",Python,"Government website,I collect my own data (e.g. 
web-scraping)","College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,1 to 2 years,"Operations Research Practitioner,Other",University courses,30,40,0,30,0,0,Time Series,Bayesian Techniques,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,QlikView,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Company internal community,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,Somewhat useful,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,Very useful,,"Jack's Import AI Newsletter,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Statistician",Work,10,10,60,5,15,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Minitab,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL,Tableau,TensorFlow",Often,,,,,,,,Often,Most of the time,Most of the time,Most of the time,Sometimes,,,,Often,,Often,,,Most of the time,Most of the time,,,,,,,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,Often,Often,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,,,Most of the time,Often,Most of the time,,Most of the time,Most of the time,Often,Most of the time,Often,Most of the time,Often,,Sometimes,Sometimes,Most of the time,Most of the time,Sometimes,,,,10,30,20,10,30,0,Enough to refine and innovate on the algorithm,"Did not instrument data useful 
for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",,,Often,,Most of the time,,,,Sometimes,,Often,,Most of the time,Often,Most of the time,,,Often,,,,,51-75% of projects,More internal than external,Central Insights Team,Practice data,Cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,100000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Stack Overflow Q&A",,,,,Somewhat useful,,Very useful,Very useful,Very useful,,,,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Data Analyst,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Mix of fields,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100MB,"Decision Trees,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),R",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,60,5,0,15,20,0,Enough to run the code / standard 
library,"Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,Often,Often,,,,Often,,Often,Often,,,,,Often,,Often,,,,100% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,,Rarely,107000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping)","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,Other,Self-taught,95,0,0,0,5,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,Primary/elementary school,Financial,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Rarely,1GB,Neural Networks,"Perl,Python,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,Rarely,,Sometimes,Often,,,"Cross-Validation,Data Visualization,Neural Networks,Text Analytics",,,,,,Rarely,Often,,,,,,,,,,,,,Rarely,,,,,,,,,Sometimes,,,,,20,5,5,50,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science 
team,Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,Often,,Most of the time,,,Sometimes,Often,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Git,Rarely,,AUD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,Japan,27,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Very useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",65,25,0,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,People 's Republic of China,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,GitHub,"Arxiv,College/University,Conferences,Tutoring/mentoring,YouTube Videos",Very useful,,Not Useful,,Somewhat useful,,,,,,,,,,,,Somewhat useful,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",11 - 39 hours,Kaggle Competitions,No,Master's degree,Electrical 
Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Genetic & Evolutionary Algorithms,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Personal Projects,Textbook",,,,,,,,,,,,Very useful,,,Somewhat useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Researcher,Other",Self-taught,40,0,40,20,0,0,,,A doctoral degree,Internet-based,500 to 999 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,,"Regression/Logistic Regression,Other","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Logistic Regression,Recommender Systems,Segmentation,Simulation,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,Most of the time,,Most of the time,Sometimes,,,Often,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,,Sometimes,Most of the time,,,Often,,,,,,,,,,,Often,,,,100% of projects,Entirely internal,Standalone Team,None,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Always,65000,,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Software Developer/Software Engineer,Other",University courses,20,35,10,20,15,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A bachelor's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,1GB,Decision Trees,"C/C++,Java,KNIME (free version),NoSQL,RapidMiner (free version),Spark / MLlib",,,,Sometimes,,,,,,,,,,,Often,,,,Often,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Deep learning,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,,,Very useful,,,Very useful,,,Somewhat useful,,"KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Gradient Boosted Machines,Logistic Regression",Often,,,,,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,50,20,0,0,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,Often,,76-99% of projects,Entirely external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Sometimes,"84,500",BRL,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,27,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,46,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Conferences,Kaggle",,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,0,0,100,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"5,000 to 9,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Image data,Rarely,10TB,"Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python,R",,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Random Forests,SVMs",,,,Sometimes,,Often,,,,,,,,Sometimes,,Often,,,,,,,Often,,,,,Often,,,,,,90,5,0,5,0,0,Enough to tune the parameters properly,"Dirty data,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Subversion,,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,Python,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Conferences,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,YouTube Videos",,Very useful,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,Very useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,DataCamp,Udacity,Other",Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",15,50,5,20,10,NA,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,11-15,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important +Male,India,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,"Business Analyst,Data Scientist,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Random Forests,Text Analytics",,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,34,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,Management information systems,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Adversarial Learning,Recommendation Engines","Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Somewhat useful,,,Very useful,,,,,,,Very useful,,Somewhat useful,,,,Very useful,"FastML Blog,KDnuggets Blog,No Free Hunch Blog",3-5 years,Nice to have,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",2 - 10 hours,PhD,No,Master's degree,Computer Science,,"Computer Scientist,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Not important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Not important +Male,United States,34,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","College/University,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,Very useful,,Very useful,Very useful,,,,KDnuggets Blog,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important +Male,Ukraine,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,25,15,0,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Most of the time,100GB,"Neural Networks,RNNs,SVMs","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Sometimes,,,,,Often,,,,,,"Cross-Validation,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics",,,,,,Most of the time,,,,,,,,,,,,,Most of the time,Often,,,,,Often,,,Sometimes,Sometimes,,,,,50,40,6,3,1,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Often,Most of the time,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,63,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,Orange,Social Network Analysis,R,Other,"Kaggle,Textbook",,,,,,,Very useful,,,,,,,,Very useful,,,,"Data Machina Newsletter,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Predictive Modeler,Kaggle competitions,20,30,20,10,20,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Pharmaceutical,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Always,10PB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Angoss,C/C++,Cloudera,DataRobot,Flume,Google Cloud Compute,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,Julia,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Stan,Statistica (Quest/Dell-formerly 
Statsoft),Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,11,18,50,20,1,0,Enough to refine and innovate on the algorithm,"Difficulties in 
deployment/scoring,Explaining data science to others,Limitations in the state of the art in machine learning",,,,Most of the time,,Sometimes,,,,,,Rarely,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,government;pokemon go; stock data; ,,Graph (e.g. GraphBase/Neo4j),Email,,Subversion,Never,120000,GBP,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,Australia,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,TIBCO Spotfire,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping)","Blogs,College/University,Kaggle,Online courses,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist",Self-taught,100,0,0,0,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Other,Traditional Workstation,Relational data,Always,10GB,"Bayesian Techniques,Regression/Logistic Regression","SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Time Series Analysis",,,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,,,0,20,20,10,50,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the 
organization",,Often,,,,Often,,,Most of the time,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,international and domestic data; administrative payments data,Often poorly documented and with large gaps in metadata,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,110000,AUD,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Germany,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,Amazon Web services,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Conferences,Friends network,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",Somewhat useful,Very useful,,,Somewhat useful,Somewhat useful,Somewhat useful,,,,,Very useful,,,,,Very useful,Very useful,"FlowingData Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,10,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,Most of the time,,,,Sometimes,Sometimes,,Often,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Often,Most of the time,Often,Rarely,,,,,Sometimes,,Sometimes,,,,Sometimes,,,Often,,,Often,,Sometimes,Rarely,Most of the time,,,,40,10,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",Sometimes,,Often,,Most of the time,,,,Often,,,,,,Most of the time,,,,,,Often,,100% of projects,Approximately half internal and half external,Business Department,,Data quality,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,58000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Operations Research Practitioner,Programmer,Statistician",University courses,0,0,0,100,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",,Insurance,20 to 99 employees,Stayed the same,6-10 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,,"Microsoft SQL Server Data Mining,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,Somewhat useful,,,Very useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,edX,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Other,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,10,0,0,20,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,Spark / MLlib,Genetic & Evolutionary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Personal Projects,Podcasts,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,Very useful,Somewhat useful,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,A health science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,80,15,5,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)",,A bachelor's degree,Other,,,,,Not very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,SVMs","Amazon Web services,C/C++,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL",,Often,,Sometimes,,,,,Sometimes,,,,Rarely,,Sometimes,,,,,,,Sometimes,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Prescriptive Modeling,Text Analytics,Time Series Analysis",Often,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,Most of the time,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,Often,Most of the time,Most of the time,,,,,,,Often,,,,,Most of the time,Most of the time,,,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,,8,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Blogs,College/University,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,,,,,,,,Very useful,Very useful,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Researcher,University courses,20,20,20,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,1GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,NoSQL,Python,QlikView,R,RapidMiner (free version),SQL",,Rarely,,,,,,,,,Most of the time,Most of 
the time,,,,,,,,,,,,,,,Rarely,,,,Sometimes,Rarely,Most of the time,,Sometimes,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,,Sometimes,Most of the time,Most of the time,,,,,,Often,Sometimes,Sometimes,,,Sometimes,Often,Often,,Sometimes,Sometimes,,Most of the time,,,Most of the time,Most of the time,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT,Scaling data science solution up to full database",,Often,,,Most of the time,,,,,,,,,,Often,,,Often,,,,,51-75% of projects,More internal than external,Standalone Team,"DMV, INE",,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,12000000,CLP,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Personal Projects",,Somewhat useful,Very useful,,Very useful,,,,,,,Somewhat useful,,,,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",University courses,20,0,50,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,Unix shell / awk",,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,"Collaborative Filtering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,,Sometimes,,,,,,,,,,,Often,,Often,Most of the time,,Sometimes,,Sometimes,Sometimes,,,,Often,,,,,,15,5,60,5,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available 
data",Often,,,,,,,,,Sometimes,Often,,,,,,,,,Most of the time,,,Less than 10% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Sometimes,110000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,Somewhat useful,Very useful,,,Very useful,Not Useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Mix of fields,100 to 499 employees,Decreased slightly,3-5 years,A general-purpose job board,Somewhat important,Other,Traditional Workstation,Other,Never,100MB,"Decision Trees,Random Forests","KNIME (free version),Python,R,SQL",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Sometimes,,,,,,Often,,Often,,,,,,,Sometimes,,,Often,,,,,,,,40,8,0,50,2,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data 
science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,Often,Often,,,Most of the time,Most of the time,,Often,,,,Often,,Most of the time,,Sometimes,Most of the time,,,100% of projects,Entirely internal,Other,iris; chondro; diamonds; mtcars,You have to acquire data in the lab before you have them!,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,35000,EUR,Has stayed about the same (has not increased or decreased more than 5%),2,,,,,,,,,,,,,,,,,, +Female,Colombia,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Researcher",University courses,10,40,20,30,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,I don't know,Increased slightly,Don't know,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Sometimes,10MB,"Decision Trees,SVMs","Google Cloud Compute,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,Tableau",,,,,,,,Sometimes,,,,,,,,,,,,,Rarely,Sometimes,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,Often,,,,,,,"A/B Testing,Association Rules,Collaborative Filtering,Cross-Validation,Data Visualization,Naive Bayes,Random Forests,Recommender Systems,Time Series Analysis",Sometimes,Sometimes,,,Sometimes,Often,Most of the time,,,,,,,,,,,Sometimes,,,,,Rarely,Sometimes,,,,,,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed part-time,,,Yes,,DBA/Database Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,75,0,0,5,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Other",Somewhat useful,Somewhat useful,,,,Very useful,Somewhat useful,,,,Somewhat useful,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Researcher,Self-taught,40,20,30,5,5,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),High school,Academic,100 to 499 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Relational data",,100GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,Often,,,,Most of the time,Sometimes,,,,,,Sometimes,,,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,25,10,5,30,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in 
deployment/scoring,Dirty data,Need to coordinate with IT",,,,Often,Most of the time,,,,,,,,,,Most of the time,,,,,,,,100% of projects,Approximately half internal and half external,Other,,Size of data (PBs),"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,United States,60,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A health science,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,30,50,10,0,0,Survival Analysis,Logistic Regression,High school,Government,100 to 499 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,26,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,NoSQL,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,,,Very useful,,Somewhat useful,,,,,"O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,10,10,60,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,Fewer than 10 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Text data,,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,RNNs,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Python,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,Often,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,Rarely,,,,,,,,,,"CNNs,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,35,45,5,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,Often,,,,,,,Often,,,,,,,Most of the time,,Most of the time,,100% of projects,More internal than external,Standalone 
Team,,,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Git,Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,United States,65,Retired,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"GitHub,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Personal Projects,Textbook",,Very useful,,,,,Very useful,,,Very useful,,Somewhat useful,,,Somewhat useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,40,10,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Text 
Mining,Java,GitHub,"Arxiv,Kaggle,Textbook",Very useful,,,,,,Very useful,,,,,,,,Very useful,,,,"Data Machina Newsletter,No Free Hunch Blog,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Programmer",Work,20,10,60,5,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Image data,Video data,Text data",,,"CNNs,GANs,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,Python,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,Often,,,"A/B Testing,CNNs,Decision Trees,GANs,HMMs,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,,,,Often,,,Often,,Often,,,,,Most of the time,Most of the time,Most of the time,,,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,20,20,20,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,,,,,,Sometimes,,,,Often,,,,,Most of the time,Most of the time,Often,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,Document-oriented (e.g. MongoDB/Elasticsearch),"Email,I don't typically share data",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,26000,CNY,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Germany,45,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,Python,Google Search,"Company internal community,Online courses,Textbook",,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,,,Sometimes,,,Often,,Most of the time,,,,,,,,,Often,,,Rarely,,,Often,,,,"Association Rules,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,Time Series Analysis",,Often,,,,Most of the time,,,,,,,,,,Most of the time,,Often,,Often,,,,,,,,,,Most of the time,,,,10,20,10,30,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,Most of the time,Often,,Most of the time,Often,,Most of the time,,,Sometimes,,,Often,,,,Often,Most of the time,Often,Sometimes,,51-75% of projects,More internal than external,Standalone Team,IDMP,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,98000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Canada,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,35,0,0,35,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Julia,Bayesian Methods,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",15,40,35,0,0,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Sometimes,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","C/C++,Hadoop/Hive/Pig,Julia,Python,R,SAS Base",,,,Rarely,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Association Rules,Decision Trees,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation",,Sometimes,,,,,,Often,,,,,,Often,Most of the time,Most of the time,,,,,Often,,Often,,,Often,Sometimes,,,,,,,10,30,30,10,20,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,Often,,,,,Sometimes,,,Most of the time,,Most of the time,,,,Less than 10% of projects,Entirely external,Central Insights Team,,,,,,,Sometimes,130000,INR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Russia,47,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,RapidMiner (free version),I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Trade book,YouTube Videos",,,,,,,Somewhat useful,,,Very useful,,,,,,Very useful,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,15,20,10,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,Other,I prefer not to answer,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,Orange,Python,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,Often,,,,Often,,,,,,,,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Naive Bayes,Segmentation,Time Series Analysis",,,,,,,Often,,,,,,,Often,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,,,25,40,10,20,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Limitations of tools",Sometimes,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,100000,RUB,Has increased 20% or more,I prefer not to share,,,,,,,,,,,,,,,,,, +Female,Finland,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,,Very useful,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,10,5,5,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",High school,Internet-based,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,10GB,"Decision Trees,Ensemble Methods,Random Forests,SVMs","NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",Rarely,,,,,Often,Often,,,,,,,,,Rarely,,,,,Rarely,,Sometimes,,,Rarely,,Sometimes,,,,,,10,10,60,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",Most of the time,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,"Bitbucket,Git,Subversion",Always,60000,EUR,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Italy,35,Employed full-time,,,Yes,,Data Scientist,Fine,,IBM Watson / Waton Analytics,Survival Analysis,,,"Arxiv,Blogs,Friends network,Online courses,Textbook,YouTube Videos",Very useful,Very useful,,,,Very useful,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,50,10,15,25,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,CNNs,Evolutionary Approaches,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Often,,,,Often,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,Often,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Evolutionary Approaches,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Time Series Analysis",,Sometimes,Often,Often,,Most of the time,Most of the time,,,Sometimes,,,Often,Sometimes,,,,,,Most of the time,Often,,,,Sometimes,,,Sometimes,,Most of the time,,,,30,20,10,10,30,0,"Enough to code it 
again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,,,,,,,Never,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Salfrod Systems CART/MARS/TreeNet/RF/SPM,"Ensemble Methods (e.g. boosting, bagging)",Python,University/Non-profit research group websites,"College/University,Online courses",,,Very useful,,,,,,,,Very useful,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Other,Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Master's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Survival Analysis,"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Ukraine,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Work,50,0,0,0,50,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,9,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,MATLAB/Octave,Proprietary Algorithms,Matlab,Google Search,"Blogs,Conferences,Other",,Not Useful,,,Not Useful,,,,,,,,,,,,,,"Data Elixir Newsletter,Talking Machines Podcast",3-5 years,Unnecessary,,Unnecessary,,,Unnecessary,Unnecessary,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Survival Analysis",Decision Trees - Random Forests,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",R,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects",,,,,,,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,1-2 years,Unnecessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,DataCamp,Traditional Workstation,0 - 1 hour,Experience from work in a company related to ML,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Researcher",University courses,5,20,0,75,0,0,Natural Language Processing,"Evolutionary Approaches,Neural Networks - CNNs",A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important,Not important +Male,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,University courses,30,20,25,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Computer Scientist,,,Microsoft Azure Machine Learning,Neural Nets,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,30,10,10,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",High school,Technology,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Sometimes,100GB,Random Forests,"Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Spark / MLlib,SQL",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,,,Often,,Often,,,,,,,,Sometimes,Sometimes,,,,,,,,,,"kNN and Other Clustering,Random Forests",,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,20,10,10,10,50,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,Sometimes,Most of the time,,,,,Sometimes,,,,Often,,,76-99% of projects,Do not know,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,"50,000",BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,53,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Google Cloud Compute,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Blogs,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,Somewhat useful,,Very useful,Very useful,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,0,38,60,0,2,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Hadoop/Hive/Pig,Java,Julia,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL,TensorFlow,Unix shell / awk",Rarely,Sometimes,,,,,,Sometimes,Sometimes,,,,,,Rarely,Rarely,Often,,,,Often,,Sometimes,,,,Sometimes,Sometimes,,,Most of the time,,Often,,,,,,,,,Often,,,,Sometimes,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Often,Often,,,,,Often,Sometimes,Most of the time,,Often,Often,Sometimes,,Often,Often,Sometimes,,,Most of the time,Often,Often,Sometimes,,,,80,1,1,8,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues",Most of the time,Often,,,Most of the 
time,,,,,Sometimes,Often,,Often,,,Sometimes,Most of the time,,,,,,100% of projects,More internal than external,Business Department,Free ones,Putting the data into a useful input format,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",,0,,Has decreased 20% or more,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,17,"Not employed, and not looking for work",Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Other,I don't write code to analyze data,Other,Other,9,5,6,7,8,65,"Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,23,"Not employed, but looking for work",,,,,,,,Weka,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Very useful,,Very useful,,,,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",,Master's degree,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,50,0,0,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed part-time,,,Yes,,Data Analyst,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst",University courses,5,35,35,20,5,0,,,A master's degree,Academic,Fewer than 10 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,70,15,15,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations in the state of the art in machine learning,Team using multiple ad-hoc development 
environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,72,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A social science,More than 10 years,"Data Scientist,Predictive Modeler,Other",Self-taught,80,0,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",15,30,0,50,5,0,"Time Series,Unsupervised Learning",Logistic Regression,A doctoral degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10TB,Regression/Logistic Regression,"Java,Jupyter notebooks,NoSQL,Python,R,SAS 
Base,SQL",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,Rarely,,,,Often,,,,,,,,,,"Data Visualization,Simulation,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,C/C++,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. web-scraping)","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects,YouTube Videos",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,"FastML Blog,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Insurance,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","DataRobot,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,,,,,Rarely,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Often,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble 
Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,Sometimes,,Often,Often,Often,Often,,,,,Sometimes,,Most of the time,,Sometimes,Most of the time,Often,,,Often,,,,,Most of the time,Most of the time,,,,,10,30,35,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Most of the time,,Most of the time,Often,,Most of the time,,,,,Often,,Often,,10-25% of projects,More internal than external,IT Department,20newsgroups; wikipedia; twitter,"Biggest challenge for me is getting enough labeled data for training, cleanup of redundant document clauses and fixing of textual artefacts (OCR errors, encoding issues).","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Rarely,25000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,36,"Not employed, but looking for work",,,,,,,,R,,R,Google Search,"Kaggle,Textbook,YouTube Videos",,,,,,,Very useful,,,,,,,,Very useful,,,Very useful,,< 1 year,,,Necessary,,Necessary,,Necessary,,,Necessary,,,,,Basic laptop (Macbook),,PhD,No,Master's degree,Computer Science,Less than a year,"Data Miner,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,,,,,,,,,,,,,,, +Female,Other,28,"Not employed, but looking for work",,,,,,,,Python,Anomaly Detection,Python,"GitHub,Google Search","Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,30,10,0,55,5,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Not important,Somewhat important +Male,Belgium,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,Very useful,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,"Becoming a Data Scientist Podcast,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,"Data Analyst,Data Scientist,Researcher",Self-taught,30,50,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Logistic Regression,Time Series 
Analysis",Often,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,30,15,25,15,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,joining datasets,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other,,Git,Never,,,,7,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed part-time,,,No,Yes,Other,Poorly,Employed by non-profit or NGO,Python,Regression,Other,University/Non-profit research group websites,"College/University,Online courses",,,Somewhat useful,,,,,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,,,,DataCamp,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Other,Less than a year,I haven't started working yet,University courses,50,40,0,10,0,0,Unsupervised Learning,Other (please specify; separate by semi-colon),High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important +Male,Russia,35,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,37,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,QlikView,Time Series Analysis,R,I collect my own data (e.g. web-scraping),"Kaggle,Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,,,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,,No,Bachelor's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Canada,27,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data 
Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,"Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Hospitality/Entertainment/Sports,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Video data,Relational data",Sometimes,100GB,"Bayesian Techniques,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,61,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,Deep learning,Python,,"Kaggle,Online courses,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,,"Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,Military/Security,"10,000 or more employees",Decreased slightly,6-10 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,,"Java,Python,R",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Evolutionary Approaches,Simulation",,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,,,,10,80,0,10,0,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files 
not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Not Useful,,Somewhat useful,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Analyst,Other",Work,33,0,50,10,0,7,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft R Server (Formerly Revolution Analytics),Python,R,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Prescriptive Modeling,Random Forests,SVMs,Time Series Analysis",,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,Often,,Often,,,,Often,Sometimes,,,,,Sometimes,,Often,,,,17,25,10,15,33,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data 
science talent in the organization,Other",Often,,,,,Often,,,Most of the time,,,,,,,,,,,,,Most of the time,51-75% of projects,Approximately half internal and half external,IT Department,N/A,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Other",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,"95,000",USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,United States,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Work,45,10,40,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,,,Not Useful,,,,,Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,20,5,30,45,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Hadoop/Hive/Pig,Python,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,Sometimes,,Sometimes,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Rarely,,Often,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,,,,Often,,,Sometimes,,Often,Sometimes,Most of the time,Often,,Sometimes,,Often,Sometimes,Sometimes,,,,40,10,5,25,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact 
of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,Sometimes,Most of the time,Sometimes,,,,Sometimes,Sometimes,,,Often,,,,,,Most of the time,Most of the time,,51-75% of projects,Entirely internal,Other,census; weather,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,107000,USD,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,Turkey,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Genetic & Evolutionary Algorithms,Python,"GitHub,Google Search","Online courses,Personal Projects,YouTube Videos",,,,,,,,,,,Very useful,Somewhat useful,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Programmer",Self-taught,50,10,30,10,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Internet-based,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Most of the time,10GB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow",,Often,,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,,,Often,,,Often,Sometimes,,,,,,Often,,Sometimes,,,Often,Most of the time,,,Most of the time,,,,,,Most of the time,Sometimes,,,,60,15,5,5,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Other",,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,76-99% of projects,More internal than external,Standalone Team,facebook analytics,Deciding what is relevant,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Commercial Data Platform,Company Developed Platform,Email",,Git,Sometimes,500000,MXN,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,Very useful,,,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,,,,"Data Elixir Newsletter,No Free Hunch Blog",1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,"Coursera,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,No,Master's degree,Physics,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,10,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Not important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important +Male,United States,43,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,Very useful,Somewhat useful,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist",Self-taught,75,0,25,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Other,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",,100GB,"Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SAS Base,Unix shell / awk",,Often,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Rarely,,,,,,,,,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,Sometimes,,,,50,10,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,Sometimes,Sometimes,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,,,100% of projects,Approximately half internal and half external,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,,Git,Sometimes,165000,USD,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,Amazon Web services,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Blogs,College/University,Kaggle,Podcasts",,Very useful,Somewhat useful,,,,Very useful,,,,,,Somewhat useful,,,,,,,1-2 years,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,40,10,5,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,62,Retired,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Regression,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","Kaggle,Newsletters,Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,,,Somewhat useful,Somewhat useful,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,The Analytics Dispatch Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",35,60,0,0,5,0,Time Series,Logistic Regression,Primary/elementary school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,IBM SPSS Statistics,Social Network Analysis,Matlab,"Government website,I collect my own data (e.g. web-scraping)",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Other,No,Bachelor's degree,Engineering (non-computer focused),,Other,University courses,NA,NA,NA,NA,NA,NA,Speech Recognition,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,South Africa,31,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Conferences,Friends network,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,Very useful,,,,,Very useful,,,Very useful,Somewhat useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher",University courses,20,20,20,40,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Java,Julia,Jupyter notebooks,MATLAB/Octave,Python",,,,Sometimes,,,,,,,,,,,Often,Sometimes,Often,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Segmentation,SVMs",,,Often,Often,,,Often,Sometimes,,,,,Sometimes,Often,,Often,,Rarely,,Often,Often,,,,,Sometimes,,Sometimes,,,,,,0,70,0,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to 
be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,,,Sometimes,Often,,,,,,,,,,,Sometimes,,,26-50% of projects,Do not know,Standalone Team,,,,Commercial Data Platform,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,,ZAR,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,DataRobot,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Company internal community,Conferences,Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",Very useful,Somewhat useful,,Somewhat useful,Not Useful,,Very useful,,Very useful,,Somewhat useful,,,Very useful,,,,,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Biology,More than 10 years,"Data Scientist,Predictive Modeler",Kaggle competitions,25,0,50,0,25,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,100MB,"Ensemble Methods,Gradient Boosted Machines","Amazon Web services,DataRobot,Hadoop/Hive/Pig,NoSQL,Python,R,Unix shell / awk",,Most of 
the time,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,,,,,Sometimes,,,,"Cross-Validation,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,Most of the time,,Rarely,,Rarely,,Most of the time,,Rarely,,Sometimes,,,Often,Sometimes,Sometimes,,Often,,,,,Rarely,Often,Rarely,,,,0,50,25,0,0,25,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,100% of projects,Entirely internal,Other,,,Document-oriented (e.g. MongoDB/Elasticsearch),Other,S3,Git,Rarely,200000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,"Ensemble Methods (e.g. boosting, bagging)",SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Personal Projects,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,,,,,,Somewhat useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Self-taught,20,0,50,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,500 to 999 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Unix shell / awk",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Lift Analysis,Logistic Regression,Time Series Analysis",Sometimes,,,,,Often,Most of the time,,,,,,,,Often,Often,,,,,,,,,,,,,,Often,,,,30,10,10,10,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,quality,Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift),I don't typically share data,,Git,Rarely,135000,USD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Very useful,Very useful,Very useful,,,,,,,Very useful,Very useful,,Very useful,Somewhat useful,,,Somewhat useful,"Data Elixir Newsletter,Data Machina Newsletter,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher",University courses,20,10,30,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,,"Bayesian Techniques,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,IBM SPSS Statistics,Java,Jupyter notebooks,Python,QlikView,R,SQL,Stan,Tableau,Unix shell / awk",,Often,,,,,,,,,,Rarely,,,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,Rarely,Most of the time,,,,,,,,,Most of the time,Sometimes,,Sometimes,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic 
Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Text Analytics",Often,,Often,,,Most of the time,Most of the time,Often,Most of the time,,,Sometimes,,,,Most of the time,,,,,Sometimes,,Sometimes,,,Sometimes,Rarely,,Rarely,,,,,40,5,0,35,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,Sometimes,,Often,Most of the time,,,,,,,Often,Often,,,,,Most of the time,Most of the time,,,76-99% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,"Git,Other",Rarely,60000,USD,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Canada,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Very useful,,,,,,,,KDnuggets Blog,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Udacity,Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Yes,Master's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important +Male,Switzerland,34,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,Amazon Machine Learning,"Ensemble Methods (e.g. boosting, bagging)",R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Blogs,College/University,Conferences,Official documentation,Stack Overflow Q&A,Textbook",Very useful,Very useful,Very useful,,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,University courses,35,0,15,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Sometimes,Often,Often,,,,Sometimes,,,,Often,,,,,Rarely,,Sometimes,,,,,,,Often,,,,30,25,15,10,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools",Rarely,Often,,,Most of the time,,,Often,Most of the time,,,,Most of the time,,,,,,,,,,Less than 10% of projects,More internal than external,Business 
Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,115200,CHF,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Deep learning,R,Google Search,"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Very useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Lift Analysis,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,Often,,,,,Most of the time,Often,,,,,,Most of the time,Often,,,,Often,,Often,,Often,,,Often,,,Often,,,,,0,40,20,10,30,0,Enough to explain the algorithm to someone non-technical,"Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Scaling data science solution up to full database",Sometimes,Often,,,,Most of the time,,Often,,,Often,Sometimes,,,,,,Sometimes,,,,,100% of projects,More external than internal,Standalone Team,,Traceability over time - We work basically with surveys. The demographics are not sufficient nor entirely relevant for our questions.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,96000000,COP,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,50,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,Very useful,,,,,15+ years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst",Work,30,10,50,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Other,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Online courses,Podcasts",,,,,Somewhat useful,,,,,,Very useful,,Somewhat useful,,,,,,"Partially Derivative Podcast,Siraj Raval YouTube Channel,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,DBA/Database Engineer,Engineer",Work,0,60,40,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,100 to 499 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,,,,Often,Most of the time,Sometimes,,,,,,Sometimes,,,,,Sometimes,,Often,Often,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,50,20,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Organization is small and cannot 
afford a data science team",Sometimes,,,,,,,Sometimes,,,Often,,,,,Often,,,,,,,51-75% of projects,Entirely internal,Other,census; location data,don't understand the methodology on how the data was collected and how it is updated,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,180000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Google Cloud Compute,Other,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Conferences,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook",Somewhat useful,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,1 to 2 years,Researcher,Self-taught,25,15,35,10,15,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Academic,I don't know,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Image data,,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian 
Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,Sometimes,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,,,,,,Often,,,,,,,Often,,,,35,20,0,20,25,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,Sometimes,,,Often,,,,,,Often,,,,Often,,,,100% of projects,More external than internal,Other,Satellite data; Climate data; Species data;,"preprocessing of satellite and climate data. dirty species data from multiple sources which requires extensive cleaning","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,26000,AUD,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,Google Cloud Compute,,,"GitHub,Google Search","Arxiv,Blogs,Friends network,Kaggle,Textbook,YouTube Videos",Very useful,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,10,50,20,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,20 to 99 
employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Image data,Text data",Most of the time,1TB,"CNNs,Neural Networks,RNNs,SVMs","C/C++,Python,TensorFlow,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Most of the time,,,"CNNs,Cross-Validation,Natural Language Processing,RNNs,SVMs",,,,Most of the time,,Sometimes,,,,,,,,,,,,,Often,,,,,,Most of the time,,,Often,,,,,,10,40,30,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,10-25% of projects,,Standalone Team,,,,,,Git,,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,DataRobot,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Conferences,Friends network,Non-Kaggle online communities,Personal Projects,YouTube Videos",,,,,Very useful,Very useful,,,Somewhat useful,,,Very useful,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,More than 10 years,,University courses,25,0,45,10,0,20,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Mix of fields,Fewer than 10 employees,Stayed the same,More than 10 years,Some other way,Somewhat important,Other,Basic laptop (Macbook),"Text data,Relational data,Other",Most of the time,10MB,"Decision Trees,Ensemble Methods,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Oracle Data Mining/ Oracle R Enterprise",,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Segmentation,Time Series Analysis",,Sometimes,,,,,Often,Often,Sometimes,,,,,Sometimes,Often,Often,,,,,,,,,,Sometimes,,,,Rarely,,,,10,8,2,10,20,50,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Often,Often,,,,,,,,Sometimes,,,,,,,Sometimes,,51-75% of projects,More internal than external,Other,Mortality data from the US Center of Disease Control (CDC),,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,,,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Support Vector Machines (SVM),R,University/Non-profit research group websites,"College/University,Online courses",,,Somewhat useful,,,,,,,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),No education,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,55,Employed full-time,,,No,Yes,Other,Fine,Self-employed,SQL,,R,"Google Search,Government website",Textbook,,,,,,,,,,,,,,,Very useful,,,,,< 1 
year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Doctoral degree,Psychology,I don't write code to analyze data,Researcher,Self-taught,98,0,0,0,2,0,Unsupervised Learning,Logistic Regression,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Not important,Not important,Very Important,Not important,Not important,Not important,,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Female,United States,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,R,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,"Researcher,Other",University courses,10,25,30,35,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,,1GB,"Neural Networks,Other","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Most of the time,,Often,,,,"Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,Often,,,,5,80,0,10,5,0,Enough to refine and innovate on the algorithm,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,Most of the time,,,,,,,,Sometimes,,,100% of projects,Do not know,Other,,Hard to fit model to it because highly non-linear,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,"65,000",USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Newsletters,Online courses,Personal Projects",Somewhat useful,Very useful,,,,,Very useful,Somewhat useful,,,Somewhat useful,Very useful,,,,,,,"FastML Blog,No Free Hunch Blog",1-2 years,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Electrical Engineering,Less than a year,,Self-taught,25,25,0,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,,Very useful,,Very useful,Very useful,,,Very useful,"KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Self-taught,40,0,10,10,40,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Often,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,Random Forests",,,,,,Often,Often,Often,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,40,10,0,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Often,,,,Often,,,,,Sometimes,,,,,,Often,,,10-25% of projects,More internal than external,IT Department,mapillary; openstreetmap,Unstructured data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,34000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,0,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,100MB,"Neural Networks,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Python",,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,,,,,Often,,,,,,,,Often,,,,Often,Often,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Very useful,The Data Skeptic Podcast,< 1 year,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Traditional Workstation",,Online Courses and Certifications,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,25,75,0,0,0,0,Time Series,Logistic Regression,,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,NA,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Textbook",,,,,,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,,,,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,edX,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,Canada,33,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,,,,,Very useful,Very useful,,,,Partially Derivative Podcast,1-2 years,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,Yes,Professional degree,,,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer",Work,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Not important,Not important,Not important +Male,Philippines,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Other,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Rarely,10MB,"Neural Networks,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Most of the time,,,,30,40,10,10,10,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",Often,,,,Most of the time,,,,Most of the time,,Often,,,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Rarely,900000,PHP,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Japan,40,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"College/University,Conferences,Friends network,Kaggle,Online courses,Textbook,YouTube Videos",,,Not Useful,,Very useful,Very useful,Very useful,,,,Very useful,,,,Very useful,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",< 1 year,Unnecessary,Nice to have,Unnecessary,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,,,,"Coursera,Udacity","Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Kaggle Competitions,No,Doctoral degree,A humanities discipline,Less than a year,"Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",10,70,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important +Male,United States,NA,Employed full-time,,,Yes,,Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Spark / MLlib,Social Network Analysis,Python,Google Search,"Arxiv,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,,,,,,,,,Somewhat useful,Somewhat 
useful,,Somewhat useful,Very useful,,,Very useful,Somewhat useful,"FastML Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Text data,Other",Most of the time,1GB,"Decision Trees,GANs","C/C++,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Often,Often,,Often,Most of the time,,,,Sometimes,Sometimes,Sometimes,Often,,Often,,Often,Sometimes,Often,Most of the time,Sometimes,Sometimes,Sometimes,Sometimes,Often,Most of the time,Often,Most of the time,Most of the time,,,,40,30,0,30,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining 
data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database",,,,,Often,Sometimes,,,Sometimes,,Most of the time,Often,,Sometimes,,,,Often,,,,,100% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Rarely,"140,000",USD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Indonesia,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed part-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Neural Nets,Python,Google Search,"Blogs,Friends network,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,Somewhat useful,,,,,Very useful,,,Somewhat useful,,,,Very useful,Other (Separate different answers with semicolon),< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp",Traditional Workstation,2 - 10 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Engineer,Software Developer/Software Engineer",Self-taught,20,30,30,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian 
Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Somewhat important,Very Important +Male,Hong Kong,27,Employed full-time,,,Yes,,Other,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,University courses,20,20,10,20,30,0,,,"Some college/university study, no bachelor's degree",Government,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,26,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,Other,Neural Nets,Python,"GitHub,Google Search","Textbook,YouTube Videos",,,,,,,,,,,,,,,Somewhat useful,,,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Nice to have,,Necessary,,Nice to have,,,,,,,,,,"Basic laptop (Macbook),Traditional Workstation",0 - 1 hour,Master's degree,Sort of (Explain more),Bachelor's degree,,Less than a year,"Programmer,Other",Self-taught,100,0,0,0,0,0,"Adversarial Learning,Other (please specify; separate by semi-colon)","Logistic Regression,Markov Logic Networks",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat 
important,,,,,,,,,,,,,,, +Male,Argentina,56,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by professional services/consulting firm,Tableau,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,Somewhat useful,,,,,,,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,No Free Hunch Blog",3-5 years,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp","Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service",2 - 10 hours,Master's degree,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important +Male,Other,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Engineer,Researcher,Statistician,Other","Online courses (coursera, udemy, edx, 
etc.)",0,60,0,0,0,40,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting",A master's degree,Other,,,,,Important,Other,"Basic laptop (Macbook),Traditional Workstation",Other,Never,10GB,Other,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Gradient Boosted Machines,Segmentation,Simulation",,,,,,Sometimes,,,,,,Rarely,,,,,,,,,,,,,,Often,Often,,,,,,,94,5,0,1,0,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,48,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses",,,,,,,Somewhat useful,,,,Very useful,,,,,,,,,1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,edX",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Data 
Scientist,Perfectly,Employed by company that makes advanced analytic software,TensorFlow,Genetic & Evolutionary Algorithms,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,,,Very useful,,,,,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Scientist,Predictive Modeler,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,Less than one year,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","R,Spark / MLlib,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Often,,Most of the time,,,,,,,Most of the time,,,,,,,Often,,,,15,10,35,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Inability 
to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,,,,,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,100% of projects,More internal than external,Central Insights Team,flightaware,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Rarely,125000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,49,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,I don't plan on learning a new tool/technology,Other,R,GitHub,"Arxiv,Company internal community,Personal Projects,Other",Very useful,,,Somewhat useful,,,,,,,,Very useful,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Technology,"10,000 or more employees",Decreased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,Other","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,MATLAB/Octave,Python,R,Spark / MLlib,Stan",,,,,,,,,Sometimes,,Often,Sometimes,Sometimes,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,,Often,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,,Most of the time,Most of the time,Sometimes,Often,,,,,Often,,Most of the time,,,,Sometimes,Most of the time,Sometimes,Often,,,,Often,Often,Sometimes,Often,,,,10,10,0,10,20,50,Enough to refine and innovate on the algorithm,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Sometimes,,,Most of the time,Often,,26-50% of projects,Entirely internal,Business Department,NA,NA,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Share Drive/SharePoint",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,JPY,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,Google Cloud Compute,Time Series Analysis,Python,Google Search,"Podcasts,YouTube Videos",,,,,,,,,,,,,Very useful,,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Insurance,"10,000 or more employees",Increased slightly,Less than one year,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Never,100MB,Regression/Logistic Regression,"C/C++,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Often,Often,,,,Often,Most of the time,Often,,,,,,,,,Sometimes,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,,,"Column-oriented 
relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Subversion,Sometimes,110000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,Python,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Conferences,Friends network,Kaggle,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Somewhat useful,,,,Very useful,Very useful,Somewhat useful,,,,,,,Very useful,,,Very useful,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,40,0,10,5,15,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Sometimes,100MB,"CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Python,R,Other",,Rarely,,,,,,Often,,,,,,,,,Often,,,,,Most of the time,,Often,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,Often,,,"CNNs,GANs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender 
Systems,RNNs",,,,Often,,,,,,,Sometimes,,,Often,,Most of the time,,,Most of the time,Most of the time,,,Often,Most of the time,Often,,,,,,,,,60,15,5,5,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Privacy issues,Other",,,,,Most of the time,Often,,,,,,,,,,,Often,,,,,Most of the time,10-25% of projects,Do not know,Other,"http://data.gov.tw, http://quandl.com",dirty data and clean data,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Sometimes,0,,Other,8,,,,,,,,,,,,,,,,,, +Male,United States,44,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SAS Base,Deep learning,Python,,Other,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog",< 1 year,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Unnecessary,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,40,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Arxiv,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,,,,,Very useful,,,Somewhat useful,,,,,,1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to 
have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Doctoral degree,Physics,1 to 2 years,"Computer Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",25,70,0,0,5,0,Supervised Machine Learning (Tabular Data),Neural Networks - RNNs,High school,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,DBA/Database Engineer,University courses,20,10,20,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Relational data,Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft SQL Server Data Mining,Python,QlikView,R,SQL,Tableau,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,Often,Most of the time,,,,,,,,,Often,,,Often,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Other",Work,40,40,10,10,0,0,,,A master's degree,Insurance,"5,000 to 9,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,27,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,QlikView,Factor Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Podcasts",,Very useful,,,,,,,,,,,Very useful,,,,,,"Becoming a Data Scientist Podcast,The Data Skeptic Podcast,Other (Separate different answers with semicolon)",3-5 years,Necessary,Necessary,,Necessary,Necessary,Necessary,,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,Necessary,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",,Github Portfolio,Sort of (Explain more),Master's degree,,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,Time Series,Logistic Regression,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,,Very Important,Somewhat important,Somewhat important +Female,Spain,41,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,FastML Blog,3-5 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Necessary,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",5,85,5,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,Company internal community,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,20,40,0,0,0,Natural Language Processing,,A bachelor's degree,Academic,10 to 19 employees,Stayed the same,3-5 years,Some other way,Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,1TB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,Limitations of 
tools,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,33,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Very useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,Less than a year,Data Analyst,Self-taught,20,10,30,0,0,40,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,10 to 19 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Never,1GB,"Random Forests,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Python,SQL",,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,50,15,15,15,5,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,26-50% of projects,More internal than external,Standalone Team,Finacial data,Build databases,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,"Bitbucket,Git",,102000,ILS,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Greece,48,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,Self-taught,98,NA,1,0,0,1,Machine Translation,Neural Networks - CNNs,Primary/elementary school,Technology,20 to 99 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Most of the time,<1MB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Cross-Validation,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,80,5,10,5,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer",Self-taught,80,5,5,0,5,5,"Machine 
Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,10 to 19 employees,Increased slightly,1-2 years,A tech-specific job board,Not very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data",Always,10GB,"Neural Networks,RNNs","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Text Analytics,Time Series Analysis",,,,,,Often,Often,,,,,,,Often,,Rarely,,Sometimes,Most of the time,Most of the time,Often,,,,Most of the time,,,,Most of the time,Sometimes,,,,20,65,5,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues",Often,,,,,Sometimes,,,Often,Most of the time,Often,Often,,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or 
university,R,Other,Python,Google Search,"Arxiv,Blogs,Kaggle,Newsletters,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,10,0,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"1,000 to 4,999 employees",Stayed the same,Less than one year,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Never,1TB,Neural Networks,"Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Sometimes,,,,"Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,40,50,0,10,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,Do not know,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,43000,EUR,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Other,Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Gradient Boosting,,Financial,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,29,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,< 1 year,,Nice to have,Necessary,,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,"DataCamp,edX",Basic laptop (Macbook),2 - 10 hours,Other,No,Bachelor's degree,,Less than a year,"Business Analyst,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Time Series,Bayesian Techniques,"Some college/university study, no bachelor's degree",Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Not important +Female,United States,23,"Not employed, but looking for 
work",,,,,,,,Mathematica,Link Analysis,R,GitHub,"Blogs,College/University,Friends network,Kaggle,Tutoring/mentoring,YouTube Videos",,Somewhat useful,Very useful,,,Very useful,Very useful,,,,,,,,,,Very useful,Very useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,Talking Machines Podcast",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Workstation + Cloud service,2 - 10 hours,Github Portfolio,Yes,Master's degree,Mathematics or statistics,3 to 5 years,I haven't started working yet,University courses,30,20,0,40,10,0,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Female,India,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,,< 1 year,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Engineer,Self-taught,80,0,0,0,20,0,,,A bachelor's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Taiwan,45,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Cluster Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,College/University,Official documentation,Online courses,YouTube Videos",Very useful,,Somewhat useful,,,,,,,Very useful,Somewhat useful,,,,,,,Somewhat useful,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,DataCamp,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Other,1 to 2 years,"Business Analyst,DBA/Database Engineer",University courses,0,20,0,80,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition","Decision Trees - Random Forests,Neural Networks - CNNs",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Text Mining,Python,GitHub,Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,O'Reilly Data Newsletter",1-2 years,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,Coursera,Basic laptop (Macbook),,Online Courses and Certifications,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Recommendation Engines,"Bayesian 
Techniques,Decision Trees - Random Forests",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Female,Belgium,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Amazon Machine Learning,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Personal Projects",,,Very useful,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,Data Analyst,University courses,40,40,20,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",A master's degree,CRM/Marketing,10 to 19 employees,Increased slightly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Regression/Logistic Regression","Amazon Web services,IBM SPSS Modeler,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL",,Rarely,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,Rarely,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Recommender Systems",Sometimes,,,,Sometimes,Often,,Often,,,,,,Often,,Often,,,,,,,,Often,,,,,,,,,,60,10,10,10,10,0,"Enough to code it again from 
scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues",Sometimes,Often,,,Often,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,55000,EUR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Amazon Web services,,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),College/University,,,Very useful,,,,,,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Programmer,Software Developer/Software Engineer",University courses,30,10,10,0,50,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Retail,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",,1PB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Cloudera,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,RapidMiner (commercial version),SQL,Tableau,TensorFlow",,Sometimes,,,Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,Sometimes,Often,,,,Often,,,,Often,,Often,Often,,,,,,,,Often,,,Often,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Often,,,Often,Often,Often,Often,Rarely,,,,Rarely,Most of the time,Most of the time,,Most of the time,,,Most of the time,,Most of the time,Most of the time,,Most of the time,Rarely,Most of the time,Most of the time,Most of the time,,,,50,25,5,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Standalone Team,UCI;kaggle,data cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,Subversion,Sometimes,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by non-profit or NGO,Employed by government",TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,Somewhat useful,,Somewhat useful,,,Very useful,"O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Data Analyst,Data Scientist",Self-taught,25,55,5,15,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Julia,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,Rarely,Most of the time,,,,Rarely,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,,Sometimes,,,Rarely,Often,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,Text Analytics,Time Series Analysis",Often,Rarely,Sometimes,Rarely,Rarely,,Most of the time,Rarely,,,,,,Often,,Sometimes,,,Often,Often,Often,,Rarely,Rarely,Sometimes,Often,Sometimes,,Often,Most of the time,,,,5,4,1,10,80,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Sometimes,,,,Rarely,Rarely,,,,,Sometimes,,,,,,Most of the time,,76-99% of projects,More internal than external,Standalone Team,,"systems built for function only, not retrospective analysis","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,32600,GBP,I was not employed 3 years ago,9,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Conferences,Kaggle,Personal Projects,Textbook",Very useful,Somewhat useful,,,Very useful,,Very useful,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Researcher",University courses,10,10,15,50,15,0,"Reinforcement learning,Time Series","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,20 to 99 employees,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,Neural Networks,"Jupyter notebooks,MATLAB/Octave,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,Often,,,,,,"Cross-Validation,Ensemble Methods,HMMs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Most of the time,,,Sometimes,,,,Often,,,Sometimes,,,,Often,Sometimes,,,,,,,Sometimes,,Often,,,,5,65,10,15,5,0,Enough to explain the algorithm to someone non-technical,"Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Rarely,45000,EUR,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Male,Italy,42,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Scientist,Software Developer/Software Engineer",Work,10,15,50,10,15,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A bachelor's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,10MB,Other,"Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,PCA and Dimensionality Reduction,Random Forests",,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,20,30,20,10,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,41,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Textbook,YouTube Videos",Somewhat useful,,,,Very useful,,Very useful,,,,Very useful,Very useful,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,FastML Blog,KDnuggets Blog",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,Udacity","Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,No,Doctoral degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Machine Learning Engineer,Researcher",Self-taught,40,20,10,20,5,5,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Not important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important +Male,Ukraine,25,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Work,10,30,50,10,0,0,"Natural Language 
Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,Fewer than 10 employees,Stayed the same,,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,33,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,R,GitHub,"Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,Very useful,,Somewhat useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,3 to 5 years,Researcher,Self-taught,80,10,10,0,0,0,"Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Perl,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,Sometimes,,,,Often,Sometimes,,,,,,,,Sometimes,,,Often,,,,Sometimes,,,,,Sometimes,Often,,,,,60,20,5,10,5,0,Enough to run the code / standard library,I 
prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,Do not know,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Other,,Other,,,,I do not want to share information about my salary/compensation,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Vietnam,32,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,Matlab,Google Search,Conferences,,,,,Very useful,,,,,,,,,,,,,,"Data Machina Newsletter,Data Stories Podcast,Jack's Import AI Newsletter",1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,GPU accelerated Workstation,40+,PhD,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",University courses,10,0,10,80,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +A different identity,Other,0,I prefer not to say,No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,45,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",5,95,0,0,0,0,Other (please specify; 
separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Official documentation,YouTube Videos",,,,,,,Very useful,,Somewhat useful,Very useful,,,,,,,,Very useful,"FastML Blog,Jack's Import AI Newsletter,KDnuggets Blog",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",45,40,0,10,5,0,Unsupervised Learning,Neural Networks - CNNs,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Male,Italy,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,3 to 5 years,"Engineer,Operations Research Practitioner,Researcher",Kaggle competitions,0,80,0,20,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Relational data,Other",Most of the time,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Time Series Analysis",Sometimes,,,,,,,,,,,Sometimes,,Often,,Often,,,,Often,Often,,,,,,Most of the time,Most of the time,,Most of the time,,,,50,30,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,Often,,,Most of the time,,,,,,,Most of the time,,,,,,,Less than 10% of projects,Do not know,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Male,Netherlands,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Software Developer/Software Engineer,Self-taught,45,20,30,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Pharmaceutical,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Text data,Sometimes,10GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs","NoSQL,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,,Often,,Sometimes,Often,,Sometimes,,,Sometimes,,,,,,,Often,Most of the time,Often,,Sometimes,,Often,,,,,,,,,20,35,20,10,15,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Time Series Analysis,Python,Google Search,"Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,Very useful,,,Very useful,Very useful,Other (Separate different answers with semicolon),< 1 year,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to 
have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),I prefer not to answer,Computer Science,,"Engineer,I haven't started working yet",Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,28,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,R,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,,Somewhat useful,"R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,6 to 10 years,"Researcher,Other",University courses,40,0,0,60,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series",Logistic Regression,A master's degree,Academic,100 to 499 employees,,,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Never,1GB,Regression/Logistic Regression,"IBM SPSS Statistics,Mathematica,MATLAB/Octave,R,Other",,,,,,,,,,,,Rarely,,,,,,,,Rarely,Rarely,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,"Data Visualization,Logistic Regression,Prescriptive Modeling,Other",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the 
time,,,,,,,,,,,Often,40,20,0,20,20,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,Most of the time,,,Sometimes,Most of the time,,100% of projects,More internal than external,Other,Bloomberg; Compustat; Bureau Van Dyke; Eurostat,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Other",Hard drive,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Female,Turkey,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Statistician",Work,0,20,70,10,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",,Technology,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Image data,Most of the time,1GB,CNNs,"Amazon Web services,Python",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series Analysis",,,,Most of the time,Sometimes,Often,Most of the time,,,,,,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,Most of the time,,,,70,20,10,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",Python,Support Vector Machines (SVM),R,Other,"Arxiv,Blogs,Company internal community,Conferences,Friends network,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,,Very useful,,Very useful,,Very useful,Very useful,,Very useful,Somewhat useful,"R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Researcher,Other",Self-taught,88,2,0,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Mix of fields,,,,,Somewhat important,Other,Laptop or Workstation and private datacenters,Other,Sometimes,10MB,"Ensemble Methods,Random Forests,Regression/Logistic Regression,Other","C/C++,Java,MATLAB/Octave,Perl,Python,R,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Often,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Other,Other",,,,,,Most of the time,Most of the time,,Often,,,,,,,Often,,,,,Most of the time,,Often,,,Most of the time,Often,Sometimes,,,Often,Most of the time,,10,20,0,15,30,25,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,Sometimes,,Often,,Often,,,,Often,Often,,100% of projects,More internal than external,Other,NASA Cuprite data,"Severely limited availability of specimen (physical samples) - limiting further data analysis options. NB: internal vs. external resources question is unclear (external tools or external experts?), and depends on the project with a range from client wished complete complete secrecy to being partner in a public research network. ","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other","nextcloud, (git lfs)","Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,,EUR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Germany,27,"Not employed, but looking for work",,,,,,,,IBM SPSS Statistics,,,,"College/University,Kaggle,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,,,,,,Very useful,Somewhat useful,,< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,I don't write code to analyze data,"Researcher,Statistician,Other",Other,0,0,0,0,0,100,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,Microsoft R Server (Formerly Revolution Analytics),Text Mining,R,I collect my own data (e.g. web-scraping),"Blogs,Conferences,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book",,Very useful,,,Somewhat useful,,,,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,"Partially Derivative Podcast,R Bloggers Blog Aggregator,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,3 to 5 years,Other,University courses,70,15,0,15,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - GANs",A master's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,6-10 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Rarely,100MB,"Neural Networks,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Rarely,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing",Sometimes,Sometimes,,,,,Often,Sometimes,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,25,10,40,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,Sometimes,,Sometimes,Most of the time,,,,,,Often,,,Sometimes,,,Rarely,,10-25% of projects,Approximately half internal and half external,Business Department,,Lack of documentation; need to conform to formats expected by older tools,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Never,35000,TRY,I was not employed 3 years ago,4,,,,,,,,,,,,,,,,,, +Male,Japan,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Programmer,Researcher",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,IBM Watson / Waton Analytics,Regression,Python,Government website,Kaggle,,,,,,,Very useful,,,,,,,,,,,,,1-2 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"Laptop or Workstation and local IT supported servers,Traditional Workstation",0 - 1 hour,Kaggle Competitions,No,Master's degree,Management information systems,6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,40,0,0,5,45,Other (please specify; separate by semi-colon),Neural Networks - CNNs,High school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Switzerland,52,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering 
(non-computer focused),More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",GPU accelerated Workstation,"Image data,Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),Other","College/University,Kaggle,Online courses,Stack Overflow Q&A,Other",,,Very useful,,,,Very useful,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",University courses,20,20,15,35,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression",High school,Government,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Traditional Workstation",Other,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Sometimes,Most of the time,Often,Often,Sometimes,,,,Sometimes,,Often,Often,,,Often,Sometimes,,Sometimes,,,,Often,,Often,Often,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,100% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,9,,,,,,,,,,,,,,,,,, +Male,Germany,30,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,"No Free Hunch Blog,Other (Separate different answers with semicolon)",1-2 years,Nice to have,,Nice to have,,Nice to have,,,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Russia,52,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,SQL,,,,"Company internal community,Kaggle",,,,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",Work,20,20,20,20,20,0,Time Series,Bayesian Techniques,A professional degree,Technology,100 to 499 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text 
data,Relational data",Sometimes,10GB,Bayesian Techniques,"C/C++,Python,SQL",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,Bayesian Techniques,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,20,0,0,0,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,1000000,RUB,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,0,50,50,0,0,0,Computer Vision,"Bayesian Techniques,Neural Networks - RNNs",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,No Free Hunch Blog",< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,,,,Udacity,Workstation + Cloud service,11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,0,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Indonesia,24,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Text Mining,C/C++/C#,I collect my own data (e.g. 
web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,"DataTau News Aggregator,FlowingData Blog,Siraj Raval YouTube Channel",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Natural Language Processing,Decision Trees - Gradient Boosted Machines,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Other,"Arxiv,Non-Kaggle online communities,Personal Projects",Very useful,,,,,,,,Very useful,,,Very useful,,,,,,,"DataTau News Aggregator,No Free Hunch Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,10,0,40,0,"Adversarial Learning,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,500 to 999 employees,Increased significantly,3-5 years,A tech-specific job board,Important,Build and/or 
run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10TB,"Random Forests,Regression/Logistic Regression","Amazon Web services,Python,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Ensemble Methods,Natural Language Processing",,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,25,25,20,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Git,Rarely,165000,USD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,More than 10 years,"Business Analyst,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,0,0,0,20,0,Natural Language Processing,Bayesian Techniques,"Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,,,,Technology,20 to 99 employees,Increased 
slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,,,"Decision Trees,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Online courses,Podcasts,Stack Overflow Q&A",,Somewhat useful,,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,,,,,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,Partially Derivative Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,Data Analyst,Work,30,50,20,0,0,0,"Time Series,Unsupervised Learning",,A master's degree,Financial,10 to 19 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,,,,"Microsoft R Server (Formerly Revolution Analytics),Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Organization is small and cannot afford a data science team",,,,,,,,Sometimes,,,,,,,,Often,,,,,,,Less than 10% of projects,Approximately half internal and half external,Other,financial data feeds,"dirty, inconsistent 
across providers, entity resolution ","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,,,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Other,30,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,GitHub,Other,,,,,,,,,,,,,,,,,,,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",1-2 years,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Statistician",Self-taught,50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Somewhat important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important +Female,Spain,40,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,QlikView,,R,GitHub,"Blogs,College/University,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,Somewhat useful,,,Very useful,Very useful,,Somewhat useful,Not Useful,,,,"KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Nice to 
have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,,No,Master's degree,,3 to 5 years,Data Miner,"Online courses (coursera, udemy, edx, etc.)",0,90,0,10,0,0,Time Series,,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,Data Scientist,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,Mathematica,Link Analysis,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle",,,Not Useful,,,,Not Useful,,,,,,,,,,,,Becoming a Data Scientist Podcast,< 1 year,,,,,,,,,,,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",2 - 10 hours,Online Courses and Certifications,No,Master's degree,,Less than a year,Data Scientist,Self-taught,100,0,0,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Ensemble Methods",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,,,,,,,,,,,,,,,, +Male,Germany,27,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,,,"KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,Coursera,Workstation + Cloud service,2 - 10 hours,Github Portfolio,No,Bachelor's degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Not 
important,Somewhat important,Not important,Very Important,Very Important,Somewhat important,Not important,Not important +Male,United States,33,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,42,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,28,Employed part-time,,,No,Yes,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,Tableau,Cluster Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Researcher,Statistician",University courses,20,20,40,20,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,500 to 999 employees,Decreased slightly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Decision Trees,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,R,SAS Base,SQL",,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,,,,,Sometimes,Often,Sometimes,,,,,,,,Often,,,,,,,,,,Often,Often,,,Often,,,,30,30,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Sometimes,Often,Most of the time,,,,,Sometimes,,,,,,,,,,,,Often,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,"540,000",CZK,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Survival Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Arxiv,Blogs,Official documentation,Personal Projects,Textbook",Very useful,Somewhat useful,,,,,,,,Somewhat useful,,Somewhat useful,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,I haven't started working yet,University courses,30,5,30,30,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",I prefer not to answer,Technology,100 to 499 employees,Increased slightly,More than 10 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation","Image data,Relational data",Rarely,10MB,"CNNs,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Python,R,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"Association Rules,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,GANs,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random 
Forests",,Rarely,,Sometimes,Sometimes,Sometimes,Rarely,Rarely,,,Rarely,Sometimes,,,,Sometimes,,,,,Rarely,,Rarely,,,,,,,,,,,10,70,0,10,10,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Rarely,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,,,,Sometimes,,,10-25% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Don't know,500,JPY,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Monte Carlo Methods,SQL,"Government website,I collect my own data (e.g. 
web-scraping)","College/University,Friends network",,,Very useful,,,Somewhat useful,,,,,,,,,,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst",Work,15,0,50,35,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,"5,000 to 9,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Microsoft Excel Data Mining,Python,R,RapidMiner (commercial version),SQL,Tableau",,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,Often,Most of the time,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Rarely,,,Rarely,Sometimes,Sometimes,Sometimes,,,,,,,Often,,,,Often,,,Sometimes,,,,Most of the time,Sometimes,Sometimes,Often,,,,40,15,5,5,5,30,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,Often,,,Often,,,,Sometimes,,76-99% of projects,Approximately half internal and half 
external,Standalone Team,Polls; GDELT; Google Trends; Wiki Page Views,Scaling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,100000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,Work,40,10,30,20,0,0,,,,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,R,"GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Company internal community,Kaggle,Online courses,Stack Overflow Q&A",,Very useful,,Very useful,,,Somewhat useful,,,,Somewhat useful,,,Very useful,,,,,O'Reilly Data Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst",University courses,40,0,30,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Russia,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Newsletters,Stack Overflow Q&A,Textbook,Trade book",,Somewhat useful,,,,,Very useful,Very useful,,,,,,Very useful,Very useful,Very useful,,,"R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,A master's degree,Retail,10 to 19 employees,Stayed the same,1-2 years,A general-purpose job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Sometimes,10GB,,"NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,"A/B Testing,Data Visualization,Natural Language Processing,Text Analytics,Time Series Analysis",Rarely,,,,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,Often,,,,70,0,10,20,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,Somewhat useful,,Somewhat useful,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer",Work,20,20,60,0,0,0,Unsupervised Learning,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Python,Spark / MLlib",,,,,,,,,Often,,Often,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Prescriptive Modeling,Text Analytics",,,Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,,,,,,Sometimes,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,26-50% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,0,60,0,40,0,0,Recommendation Engines,"Evolutionary Approaches,Logistic Regression",A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,35,Employed full-time,,,Yes,,Business Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,"Kaggle,Non-Kaggle online communities,Personal Projects,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,,Very useful,,,,,Very useful,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Machine Learning Engineer,Programmer",Work,38,20,40,0,2,0,"Machine Translation,Unsupervised Learning",Logistic Regression,A master's degree,Telecommunications,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or 
Workstation and local IT supported servers,"Text data,Relational data",,10MB,"Bayesian Techniques,Regression/Logistic Regression","IBM Cognos,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,QlikView,R,SQL,TIBCO Spotfire",,,,,,,,,,Rarely,,,,,,,,,,,,Most of the time,Often,,,,,,,,Often,Sometimes,Most of the time,,,,,,,,,Most of the time,,,,,Sometimes,,,,,"Logistic Regression,Segmentation,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,Sometimes,,Sometimes,Rarely,,,,50,40,0,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Often,,,,,,,,Often,Often,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,34,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Genetic & Evolutionary Algorithms,Python,Google Search,"Arxiv,Blogs,College/University,Online courses,Textbook,YouTube Videos",Not Useful,Very useful,Very useful,,,,,,,,Very useful,,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Github Portfolio,No,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very 
Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important +Female,United Kingdom,49,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,I don't write code to analyze data,"Data Analyst,Other",Self-taught,50,5,45,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Blogs,College/University,Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,Not Useful,,,Very useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,"Partially Derivative Podcast,Talking Machines Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,25,15,60,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,100TB,,"Amazon Web services,Java,Jupyter notebooks,Python,R,Spark / MLlib,Other",,Most of the time,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,Most of the time,,,,,,,,Most of the time,,,"kNN and Other Clustering,Markov Logic Networks,Segmentation",,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,10,10,70,5,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Limitations of tools,Need to coordinate with IT",,,,,Sometimes,,,,,,,,Often,,Sometimes,,,,,,,,Less than 10% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,75000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,India,55,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,More than 10 years,"Data Analyst,Statistician",Work,40,30,10,0,0,20,Computer Vision,Decision Trees - Random Forests,High school,Government,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Very useful,FastML Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",5,80,0,5,10,0,"Computer Vision,Time Series","Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Manufacturing,20 to 99 employees,Stayed the same,Less than one year,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Never,100GB,"CNNs,Neural Networks,RNNs","C/C++,Python,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"CNNs,Neural Networks,RNNs",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,0,90,0,10,0,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,,,Often,,,,Sometimes,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,,finding the relation/patten,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,65000,TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,R,"Ensemble Methods (e.g. 
boosting, bagging)",R,Google Search,"Blogs,Kaggle,Official documentation,Online courses,Podcasts,Textbook",,Somewhat useful,,,,,Somewhat useful,,,Somewhat useful,Very useful,,Very useful,,Somewhat useful,,,,"KDnuggets Blog,Linear Digressions Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Psychology,More than 10 years,"Data Analyst,Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,20 to 99 employees,Increased slightly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,1MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Python,R",,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests,Text Analytics",Often,,,,,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,,Often,Sometimes,,,,,,Sometimes,,,,,20,30,10,20,20,0,Enough to run the code / standard library,"Dirty data,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,Often,,,Often,,,,,Often,,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,185000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,Brazil,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,Somewhat useful,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Not important,Somewhat important +Male,Mexico,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,Hadoop/Hive/Pig,Random Forests,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,50,5,5,0,0,Time Series,,A doctoral degree,Government,"5,000 to 9,999 employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,,,,"Microsoft SQL Server Data Mining,Python,R,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,"Data Visualization,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Often,Rarely,,,,86,1,2,10,1,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,Sometimes,,Most of the time,Most of the time,,,Often,,Most of the time,,,,Most of the time,,,,Sometimes,,,,76-99% of projects,More internal than external,Central Insights Team,,Being able to clean it and data governance,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Never,"270,000",MXN,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Bayesian Methods,R,Google Search,"Blogs,Kaggle,Online courses,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,R Bloggers Blog Aggregator,1-2 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,High school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Not important +Male,Sweden,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Time Series Analysis,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Company internal community,Online courses,Podcasts,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,Very useful,Very useful,,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,"KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,5,15,10,70,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,Internet-based,"1,000 to 4,999 employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,10GB,,"Hadoop/Hive/Pig,Impala,Jupyter notebooks,QlikView,R,SQL,Tableau",,,,,,,,,Often,,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,Often,,,Rarely,,,,,,,"A/B Testing,Time Series Analysis",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,40,0,20,20,20,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,31000,SEK,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,39,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,Conferences,Kaggle,Newsletters,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,Very useful,,Very useful,Somewhat useful,,,Very useful,Somewhat useful,,Somewhat useful,,,,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,edX",Other,11 - 39 hours,Online Courses and Certifications,Yes,Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,44,"Not employed, but looking for work",,,,,,,,R,Deep learning,R,Google Search,"Conferences,Kaggle,Personal Projects,Textbook,YouTube Videos",,,,,Somewhat useful,,Somewhat useful,,,,,Very useful,,,Very useful,,,Somewhat useful,,5-10 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Mathematics or statistics,More than 
10 years,Other,Work,25,0,50,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Not important,Not important +Male,Italy,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Personal Projects,Stack Overflow Q&A",Somewhat useful,,,,,,Somewhat useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Data Analyst,Self-taught,75,5,15,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,Random Forests,"Google Cloud Compute,Jupyter notebooks,Mathematica,Python,R,SQL,TIBCO Spotfire,Unix shell / awk",,,,,,,,Rarely,,,,,,,,,Most of the time,,,Rarely,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,Rarely,Most of the time,,,,"A/B Testing,Random Forests,Segmentation,Simulation,Time Series Analysis",Rarely,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,Rarely,,,,60,15,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,Sometimes,,,Often,,Most of the time,,Most of the time,Often,,,,Often,,Most of the time,Sometimes,,10-25% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Git",Never,38000,,,7,,,,,,,,,,,,,,,,,, +Female,United States,52,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,Very useful,,,,Very useful,Very useful,,Very useful,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,Sort of (Explain more),Master's degree,Biology,3 to 5 years,Researcher,Self-taught,10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,3-5,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Very Important,Not important,Not important,Somewhat important,Not important,Very Important,Not important,Not important,Not important,Somewhat important +Male,Colombia,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,TensorFlow,Deep learning,R,Google Search,"College/University,Official documentation,Stack Overflow Q&A,Tutoring/mentoring",,,Very useful,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Researcher",University courses,25,0,35,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or 
former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,100MB,"Decision Trees,Neural Networks,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,Rarely,,,,Sometimes,,,,,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,kNN and Other Clustering,Natural Language Processing,SVMs,Text Analytics",,,,,,,Often,,,,,,,Rarely,,,,,Most of the time,,,,,,,,,Sometimes,Most of the time,,,,,55,20,10,8,7,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,Sometimes,,,,,Most of the time,,,,,,,,,,Sometimes,,76-99% of projects,More internal than external,Standalone Team,Twitter,"Cleaning, categorical data","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Git,Sometimes,45000000,COP,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,Poland,48,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that performs advanced analytics,Stan,Time Series Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Newsletters,Online courses,YouTube Videos",Somewhat useful,Somewhat useful,,,,,Somewhat useful,Very useful,,,Very useful,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Important,Other,Laptop or Workstation and local IT supported servers,Relational data,,,"Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Excel Data Mining,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,,"Cross-Validation,Gradient Boosted Machines,Logistic Regression,Simulation",,,,,,Rarely,,,,,,Rarely,,,,Often,,,,,,,,,,,Often,,,,,,,20,20,10,40,10,0,Enough to explain the algorithm to someone non-technical,"Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",,,,,,,,Most of the time,,,,,,,,,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,120000,PLN,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Female,United States,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Amazon Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Kaggle,Online courses,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,No,Bachelor's degree,Engineering (non-computer focused),,Data Analyst,University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important +A different identity,Israel,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Java,Genetic & Evolutionary Algorithms,Python,Other,"Arxiv,Blogs,College/University,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,Very useful,Very useful,,Not Useful,,,,,Very useful,Very useful,Somewhat useful,,Very useful,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",70,15,15,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 
years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100GB,"CNNs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Time Series Analysis",,,,Sometimes,,,,,,,,,,Sometimes,,Often,,,,Most of the time,Rarely,,,,Most of the time,Sometimes,,,,Most of the time,,,,50,50,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT",,,,,Most of the time,Sometimes,,,Often,Most of the time,Most of the time,,Often,,Most of the time,,,,,,,,None,More internal than external,Other,,it's badly orginized and dirty as hell. 
,Other,Commercial Data Platform,,Git,Rarely,264000,ILS,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Female,Spain,32,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Genetic & Evolutionary Algorithms,Python,Government website,"Conferences,Kaggle,Online courses,Stack Overflow Q&A",,,,,Very useful,,Very useful,,,,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,DataCamp",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Engineering (non-computer focused),,"Data Scientist,Engineer",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important +Male,Iran,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Neural Networks - RNNs,A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning 
service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,100GB,Bayesian Techniques,"Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,Tableau,Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,10,10,10,10,20,Natural Language Processing,Bayesian Techniques,A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Anomaly Detection,R,Google Search,"Blogs,Personal Projects,Textbook",,Very useful,,,,,,,,,,Very useful,,,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,Less than a year,Business Analyst,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted 
Machines,Gradient Boosting,Logistic Regression",A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Not important,Not important,Not important +Male,United States,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Other,University courses,35,10,5,45,5,0,,"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",25,40,25,5,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Time Series Analysis",Sometimes,,,,Sometimes,Sometimes,Sometimes,Sometimes,Sometimes,,,Sometimes,,,Sometimes,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,,,,Sometimes,,Sometimes,,,,30,10,5,30,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects",,,,,Sometimes,Sometimes,,,Sometimes,,,,,Sometimes,,,,,,,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Always,120000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Australia,28,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Self-employed,Java,Time Series Analysis,Python,"Government website,University/Non-profit research group websites","Kaggle,Online courses,Podcasts,Textbook,Trade book,YouTube Videos",,,,,,,Very useful,,,,Very useful,,Somewhat useful,,Very useful,Somewhat useful,,Very useful,"KDnuggets Blog,Siraj Raval YouTube Channel",1-2 years,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity","GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Somewhat important,Very Important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,54,"Not employed, but looking for work",,,,,,,,Google Cloud Compute,Neural Nets,C/C++/C#,Google Search,"Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,,5-10 
years,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,0,20,0,40,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Neural Networks - CNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,39,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,3 to 5 years,,Self-taught,30,70,0,0,0,0,,,,Academic,20 to 99 employees,Stayed the same,Don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,More than 10 years,"Business Analyst,Data Miner,Machine Learning Engineer,Predictive Modeler",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Technology,"10,000 or more employees",Stayed the same,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Don't know,1GB,Regression/Logistic Regression,"R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Time Series Analysis,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Often,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Explaining data science to others,Limitations in the state of the art in machine learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook",Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,,,"Partially Derivative Podcast,Talking Machines Podcast,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,University courses,0,5,0,95,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Military/Security,"5,000 to 9,999 employees",Stayed the same,Less than one year,A career fair or on-campus recruiting event,Not very important,Other,Traditional Workstation,Relational data,Sometimes,1GB,Decision Trees,"Amazon Web services,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",,Rarely,,,Rarely,,,,Sometimes,,,,,,Sometimes,,Sometimes,,,,Sometimes,,Often,,,,Sometimes,,,,Often,,Often,,,,,,,,Often,Often,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Text Analytics",,,Sometimes,Sometimes,,Sometimes,Often,Often,,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,Sometimes,Sometimes,,,,Often,,,,,75,5,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Often,,,,Often,,,,Often,,,,Often,Sometimes,,Often,,,100% of projects,Entirely internal,IT Department,None,Privacy issues,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,150000,USD,Has stayed about the same (has not increased or decreased more than 5%),3,,,,,,,,,,,,,,,,,, +Female,Mexico,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,10,10,60,0,5,15,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Decision Trees - Gradient Boosted Machines,High school,Insurance,"1,000 to 4,999 employees",Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees","C/C++,Python,R,RapidMiner (commercial version),SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,Rarely,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Decision Trees,Segmentation",,Often,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Often,,,Often,Most of the time,,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,28,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,No,Yes,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, 
etc.)",25,50,25,0,0,0,Unsupervised Learning,Evolutionary Approaches,A bachelor's degree,Retail,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,10GB,,TIBCO Spotfire,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Friends network,Online courses,Personal Projects,Trade book,YouTube Videos",,Somewhat useful,,,,Somewhat useful,,,,,Very useful,Very useful,,,,Very useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Psychology,3 to 5 years,"Data Analyst,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,20,10,0,0,50,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,,,"Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,Unix shell / 
awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,113000,USD,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Female,United States,20,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,Data Scientist,Self-taught,20,50,0,10,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,22,Employed part-time,,,No,Yes,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,23,"Not employed, but looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Machine Learning Engineer,Operations Research Practitioner,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Fine arts or performing arts,More than 10 years,"Business Analyst,Software Developer/Software Engineer,Other",University courses,20,10,20,50,0,0,,,,Non-profit,100 to 499 employees,,,An external recruiter or headhunter,Somewhat important,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,I don't write code to analyze data,"Business Analyst,Data Miner,Researcher",Self-taught,30,60,10,0,0,0,Time Series,Logistic Regression,,Technology,20 to 99 employees,Decreased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Video data,Never,100MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,Less than a year,"Business Analyst,Researcher",Self-taught,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Time Series Analysis,Python,GitHub,Podcasts,,,,,,,,,,,,,Very useful,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Business Analyst,University courses,30,0,20,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,SVMs","Amazon Web services,Python",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian 
Techniques,Cross-Validation,Ensemble Methods,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Random Forests,Segmentation,Text Analytics",,,Most of the time,,,Most of the time,,,Sometimes,,,,,,,Sometimes,Sometimes,Sometimes,Most of the time,,,,Most of the time,,,Sometimes,,,Most of the time,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",,85000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Australia,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,6 to 10 years,"Business Analyst,Other",Self-taught,70,0,0,30,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Government,I don't know,Increased significantly,1-2 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,10MB,Random Forests,"R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Rarely,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,0,100,0,0,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,21,Employed full-time,,,Yes,,Software Developer/Software 
Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer",University courses,0,0,100,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Retail,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Image data,Most of the time,1TB,CNNs,"C/C++,Jupyter notebooks,TensorFlow",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,SVMs",,Often,Often,Often,,Often,,Often,,,,,,Often,,,,Often,Often,Often,Often,,,Often,Often,,,,,,,,,20,20,20,10,30,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,90,5,0,5,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Logistic Regression",A master's degree,Government,"10,000 or more employees",Decreased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,100MB,"Bayesian Techniques,Markov Logic Networks","Amazon Web services,C/C++,Java,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Perl,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,Often,Most of the time,,Sometimes,,,,,,,,,Often,,,Often,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Segmentation,Text Analytics",Sometimes,Often,Sometimes,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,10,10,0,20,60,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,"Predictive Modeler,Statistician",University courses,0,15,30,55,0,0,"Supervised 
Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,More than 10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SAS Base,SQL,Tableau,Unix shell / awk",,Rarely,,,,,,,Rarely,,,Sometimes,,,,,Often,,,,,Rarely,,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,,Sometimes,,,Rarely,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,Rarely,,,Most of the time,Most of the time,Sometimes,Often,,,Sometimes,,Often,,Often,,Sometimes,,Often,Most of the time,Most of the time,Sometimes,,,Most of the time,Rarely,Most of the time,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,22,Employed part-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,70,0,0,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,Fewer than 10 employees,Stayed the same,Less than one year,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,47,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,80,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A professional degree,Telecommunications,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Text data,Never,1GB,Decision Trees,"Python,QlikView,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,Most of the time,,,Often,,,Most of the time,,,,"Data Visualization,Prescriptive Modeling",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Often,,,,,,,,Often,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,60,0,20,0,20,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,29,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,Neural Networks - CNNs,No education,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,18,Employed full-time,,,Yes,,Programmer,Fine,Employed by 
company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,95,2,1,2,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Financial,10 to 19 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Traditional Workstation,Workstation + Cloud service",Relational data,Rarely,100MB,Evolutionary Approaches,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal 
education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher,Other",Self-taught,40,30,0,0,30,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",I prefer not to answer,Other,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data,Other",Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft SQL Server Data Mining,NoSQL,Python,Spark / MLlib,SQL,Unix shell / awk",Sometimes,Often,,,Often,,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,,,,Often,,Often,,,,Often,,,,,,,,,,Often,Often,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,50,15,5,25,5,0,Computer Vision,Bayesian Techniques,A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Decision Trees,kNN and Other Clustering,Naive Bayes,SVMs",,,Most of the time,,,,,Often,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,"College/University,Personal Projects",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,Engineer,University courses,50,0,0,50,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Support Vector Machines (SVMs)",,Technology,I prefer not to answer,Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new 
areas,Laptop or Workstation and private datacenters,"Image data,Relational data",,<1MB,SVMs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Random Forests,Simulation",,,,,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,SQL,Google Search,"Blogs,Conferences",,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",Work,15,15,50,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,Sometimes,,,,"Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic 
Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,,,,Sometimes,Often,Often,,,,,,Often,Most of the time,,,,Most of the time,Often,,Most of the time,,,Most of the time,,Most of the time,,,,,,40,20,15,10,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Often,,Sometimes,,,,Sometimes,,,,,Often,,,,Often,Sometimes,,Often,,Less than 10% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,Subversion,Rarely,35000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Philippines,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,30,20,10,0,"Adversarial Learning,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Bayesian Methods,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,Python,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Podcasts,Textbook,Tutoring/mentoring",,Very useful,Very useful,,,,Very useful,,,,Very useful,,Very useful,,Very useful,,Very useful,,"Data Machina Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,20,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Java,Social Network Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Official documentation,Stack Overflow Q&A",,,,,,,Very useful,,,Very useful,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,50,0,0,50,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Other,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,<1MB,"Bayesian Techniques,Decision Trees","Hadoop/Hive/Pig,Jupyter notebooks,Python,R",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Decision Trees,Neural Networks,Random Forests",,,,,,,,Most of the time,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,20,10,40,20,10,0,Enough to run the code / standard library,"Lack of data science talent in the 
organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,,,Often,,,Less than 10% of projects,Do not know,IT Department,No,Data cleaning,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","I don't typically share data,Share Drive/SharePoint",,Git,Sometimes,12000000,IDR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Male,Brazil,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,30,0,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs,Text Analytics",,,Rarely,,,Most of the time,Most of the time,Often,,,,,,,,Often,,Sometimes,,Most of the time,Most of the time,,Often,,,,Often,Often,Most of the time,,,,,40,10,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Most of the time,,,Most of the 
time,,,,,,,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Always,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,,,A doctoral degree,Financial,20 to 99 employees,Increased significantly,Less than one year,A general-purpose job board,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Computer Scientist,Programmer",Self-taught,80,0,20,0,0,0,Time Series,Bayesian Techniques,High school,Technology,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,Bayesian 
Techniques,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Text Analytics,Time Series Analysis",,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,30,30,20,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,Other,50,0,40,0,0,10,Recommendation Engines,Neural Networks - 
CNNs,,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,Employed part-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,40,20,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,60,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,20,25,15,15,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Markov Logic Networks",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Sometimes,,,"C/C++,Java,Mathematica,Python,SQL",,,,Rarely,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Decision Trees",,,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,I prefer not to 
say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,Statistician,University courses,5,10,10,75,0,0,Time Series,"Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,48,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,"Data Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,29,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,1 to 2 years,"Data Miner,Programmer",Self-taught,40,50,0,0,10,0,"Natural Language Processing,Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,Rarely,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,,Often,,,,Sometimes,,,,,,"Bayesian 
Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,,,Often,Most of the time,Often,,,,,,,,Sometimes,,,,,Sometimes,,Often,,,,,,,Often,,,,40,20,10,20,10,0,Enough to explain the algorithm to someone non-technical,"I prefer not to say,Inability to integrate findings into organization's decision-making process",,,,,,,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,6 to 10 years,Researcher,Work,60,10,10,20,0,0,"Natural Language Processing,Time Series",Neural Networks - RNNs,High school,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Other",Never,1TB,"Neural Networks,RNNs,Other","Jupyter notebooks,MATLAB/Octave,Python,Other,Other",,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,"Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,,,,,,,,,,,,,,,,Often,Often,Sometimes,,,,Often,,,,,,,,,25,30,0,25,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,51-75% of projects,More internal than external,Standalone Team,,,,,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","Blogs,Kaggle,Online courses,Podcasts,Textbook,YouTube Videos",,Somewhat useful,,,,,Somewhat useful,,,,Very useful,,Very useful,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,5,0,0,5,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,24,Employed part-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,25,15,5,55,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Academic,I prefer not to answer,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",I prefer not to answer,Other,"10,000 or more employees",,,An external recruiter or headhunter,Important,Other,Other,"Text data,Relational data",,,,"MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,32,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed part-time,,,Yes,,Data Miner,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete 
any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,,"Business Analyst,Data Analyst,Data Miner,Engineer,Programmer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,University courses,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,6 to 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,DBA/Database Engineer,Programmer",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Deep learning,Python,,Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube 
Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,Supervised Machine Learning (Tabular Data),,A master's degree,Government,100 to 499 employees,Increased slightly,Less than one year,Some other way,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,5,5,50,20,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Often,,,,,,,Often,,,,,,,,,,,,Sometimes,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Other",,,Rarely,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,University courses,70,10,0,10,10,0,"Computer Vision,Reinforcement learning","Ensemble Methods,Neural Networks - CNNs",High school,Internet-based,"5,000 to 9,999 employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Always,10GB,Gradient Boosted Machines,"C/C++,NoSQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,,University courses,0,50,0,0,50,0,"Reinforcement learning,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,500 to 999 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Insurance,"1,000 to 4,999 employees",Increased significantly,More than 10 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,10MB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Analyst,Researcher",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines 
(SVMs)",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Other,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Gradient Boosting,Logistic Regression",A master's degree,Government,"5,000 to 9,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,52,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,Becoming a Data Scientist Podcast,< 1 year,Unnecessary,,,,,,,,,,,,,,Basic laptop (Macbook),,Master's degree,No,Master's degree,Computer Science,,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)",Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)",Kaggle,,,,,,,Very useful,,,,,,,,,,,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,20,20,5,40,5,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Mix of fields,100 to 499 employees,Decreased significantly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Python,R,SAS Base",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Text Analytics",,,,,,,Most of the time,,,,,,,,,Often,,,Rarely,Often,,Often,Most of the time,,,,,,Sometimes,,,,,30,15,15,20,15,5,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,Often,,,,,Most of the time,,,,Often,,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Most of the time,53000,INR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,More than 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",Work,50,0,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,100GB,"Bayesian Techniques,Neural Networks,Regression/Logistic Regression,SVMs,Other","Amazon Web services,C/C++,Java,Jupyter notebooks,NoSQL,Perl,Python,Spark / MLlib,SQL,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,Often,,Often,,,,,,,,,,Sometimes,,,Sometimes,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,Often,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Simulation,SVMs,Time Series Analysis",,,Rarely,,,,Often,,,,,,,Rarely,,,,,,Sometimes,Sometimes,Often,,,,Sometimes,Rarely,Sometimes,,Often,,,,20,30,10,10,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert 
input,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed part-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,Other",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,Jupyter notebooks,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,Personal Projects,Stack Overflow Q&A",,Very useful,,,,,,,,,,Very useful,,Very useful,,,,,KDnuggets Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,90,0,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Rarely,1GB,"Neural Networks,Regression/Logistic Regression","Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,,,,Sometimes,,,,"Logistic Regression,Neural Networks,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,Sometimes,,,,,,Often,,,,Sometimes,,,,30,40,0,20,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Often,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,South Korea,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,40,30,10,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,Manufacturing,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,30,30,40,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Other,500 to 999 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10MB,"Bayesian Techniques,CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs","Java,Python,R,SQL",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses",,,,,,,Somewhat useful,,Very useful,,Somewhat useful,,,,,,,,,< 1 year,Necessary,,Nice to have,,Necessary,,Nice to have,,,Nice to have,,,,"Coursera,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,Sort of (Explain more),,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,25,20,50,0,0,5,Computer Vision,,A bachelor's degree,Technology,100 to 499 employees,Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Video data",Always,10TB,"CNNs,Neural Networks,Regression/Logistic Regression",C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Neural Nets,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Unsupervised Learning,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,27,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,48,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without 
earning a bachelor's degree,,More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",Self-taught,70,0,0,0,30,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Technology,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Never,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests","Java,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Sometimes,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Naive Bayes,Random Forests",Often,,Often,,,Often,Often,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,R,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Conferences,Kaggle,Online courses,YouTube Videos",,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia)",< 1 year,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,Sort of (Explain more),Master's degree,,I 
don't write code to analyze data,Other,University courses,0,10,0,90,0,0,,,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Not important,Somewhat important,Somewhat important,Not important +Female,People 's Republic of China,23,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Engineer,,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Technology,20 to 99 
employees,Increased slightly,6-10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Other,50,25,0,0,0,25,,,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,3 to 5 years,"Business Analyst,Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Insurance,"5,000 to 9,999 employees",Stayed the same,6-10 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the 
time,100GB,"Bayesian Techniques,Evolutionary Approaches,Random Forests,Regression/Logistic Regression","Java,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,QlikView,R,Tableau",,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,Sometimes,,,Rarely,Rarely,Sometimes,,,,,,,,,,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Logistic Regression,Naive Bayes,Random Forests,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,GitHub,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Data Machina Newsletter,Emergent/Future Newsletter (Algorithmia),Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,Recommendation Engines,"Logistic Regression,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,45,Employed part-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,Spark / MLlib,Time Series Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Conferences,Kaggle,Online courses,Personal Projects,Podcasts",,,,,Somewhat useful,,Very useful,,,,Somewhat useful,Very useful,Somewhat useful,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Researcher,University courses,30,0,0,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Academic,"10,000 or more employees",,More than 10 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Vietnam,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,30,50,10,10,0,Natural Language 
Processing,"Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,100MB,"Gradient Boosted Machines,Neural Networks,SVMs","Amazon Web services,Microsoft SQL Server Data Mining,NoSQL,Python,Spark / MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"kNN and Other Clustering,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,,,,,40,0,0,40,20,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business 
decisions,Laptop or Workstation and local IT supported servers,Text data,Sometimes,100GB,"CNNs,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Other,University courses,30,30,30,10,0,0,"Computer Vision,Natural Language Processing","Neural Networks - CNNs,Neural Networks - RNNs",,Technology,100 to 499 employees,Increased slightly,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,51,Employed full-time,,,Yes,,Data Analyst,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Government,100 to 499 employees,Increased slightly,3-5 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep 
learning,Python,Government website,"Official documentation,Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,Very useful,Very useful,,,Somewhat useful,Somewhat useful,,,,"Siraj Raval YouTube Channel,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Other,30,5,30,35,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Rarely,1TB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,Spark / MLlib,Other",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,,"Cross-Validation,Data Visualization,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks,Time Series Analysis,Other",,,,,,Most of the time,Most of the time,,Most of the time,Often,,,,,,Often,,,,Most of the time,,,,,,,,,,Most of the time,Often,,,25,25,10,15,25,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,I prefer not to say,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible 
expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,,Most of the time,Often,,,Often,,,Often,,,,,,Most of the time,,,76-99% of projects,Entirely internal,Other,None,Compute cluster falls apart,"Column-oriented relational (e.g. KDB/MariaDB),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,225000,,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",40,30,20,10,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,Employed part-time,,,Yes,,Data Scientist,Poorly,"Employed by professional services/consulting firm,Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,People 's Republic of China,25,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,GitHub,Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Researcher,Self-taught,20,10,50,10,10,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,13,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,0,20,10,0,"Adversarial Learning,Recommendation Engines,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,KNIME (commercial 
version),Text Mining,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,YouTube Videos,Other",,Somewhat useful,,,,,Very useful,,,,Very useful,,,,,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",15+ years,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Necessary,,,Udacity,"Laptop or Workstation and local IT supported servers,Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,,I don't write code to analyze data,"Data Analyst,Programmer,Other",Other,25,25,25,0,0,25,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)",Other (please specify; separate by semi-colon),A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Very Important,Not important,Not important,Somewhat important,Not important,Not important,Not important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important +Male,United States,19,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Other,University courses,0,0,0,100,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",High school,Financial,"10,000 or more 
employees",Increased significantly,6-10 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,,Neural Networks,"Amazon Machine Learning,Amazon Web services,NoSQL,Python",Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,30,10,0,40,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,Git,Sometimes,,,Has increased 20% or more,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,Less than a year,Other,Other,NA,10,0,0,0,90,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 
employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Never,1GB,"SVMs,Other","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,Often,,,,"Data Visualization,Natural Language Processing,SVMs",,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,40,30,0,30,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Machine Learning Engineer,University courses,30,40,10,15,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,13,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Government website,Blogs,,Very useful,,,,,,,,,,,,,,,,,Linear Digressions Podcast,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 
years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,30,30,20,20,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,24,"Not employed, but looking for work",,,,,,,,DataRobot,Text Mining,R,Google Search,"Blogs,College/University,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Very useful,,,,Very useful,,,,,,,,Very useful,,,Very useful,"Data Machina Newsletter,Data Stories Podcast,R Bloggers Blog Aggregator",3-5 years,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Laptop or Workstation and local IT supported servers,40+,Master's degree,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,"Independent contractor, freelancer, or 
self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Supervised Machine Learning (Tabular Data),Hidden Markov Models HMMs,Primary/elementary school,Mix of fields,500 to 999 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,1 to 2 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",High school,Internet-based,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,100MB,"CNNs,Ensemble Methods,Neural Networks","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,SQL,TensorFlow",,Often,,,,,,Sometimes,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics",,,Sometimes,Often,,Most of the time,Most of the time,Often,Often,,,,,Sometimes,,Sometimes,,Sometimes,Often,,,,Often,,,,,,Often,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,,,Often,,,,Often,,,,,,,Often,Often,,,,,,10-25% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Bitbucket,Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,No,Yes,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,70,0,0,10,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,"GitHub,Google Search,University/Non-profit research group websites","College/University,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,50,0,10,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble 
Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,"Business Analyst,Data Analyst,Researcher",University courses,25,25,20,25,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,30,0,20,0,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation","Image data,Video data,Text data",Sometimes,,,"C/C++,Python,TensorFlow",,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Neural Networks",,,,Sometimes,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,39,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",Work,20,60,20,0,0,0,Adversarial Learning,Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,,Neural Networks,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"Data Visualization,Logistic Regression,Neural Networks",,,,,,,Often,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,20,30,5,20,25,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,31,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Computer Scientist,Programmer,Statistician",Self-taught,80,0,20,0,0,0,"Adversarial Learning,Computer Vision,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Internet-based,500 to 999 employees,Increased 
slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Engineer,Fine,Employed by government,Microsoft Excel Data Mining,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,Engineer,Kaggle competitions,30,10,10,0,50,0,,,A master's degree,Military/Security,20 to 99 employees,Decreased slightly,Don't know,I visited the company's Web site and found a job listing there,Not very important,,Traditional Workstation,Image data,Rarely,100MB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,0,0,0,10,0,90,Enough to run the code / standard library,Need to coordinate with IT,,,,,,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Never,130000,USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Turkey,25,"Not employed, but looking for work",,,,,,,,C/C++,Neural Nets,C/C++/C#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Other,50,50,0,0,0,0,Survival Analysis,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business Analyst,University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Other,"10,000 or more employees",,Don't know,I visited the company's Web site and found a job listing there,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Don't know,,,"IBM Cognos,SQL",,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,10,0,0,20,30,40,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed 
full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician,Other","Online courses (coursera, udemy, edx, etc.)",20,30,10,5,35,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,NA,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by 
government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Other,University courses,60,0,0,30,10,0,"Machine Translation,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Government,10 to 19 employees,Decreased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,100MB,"Bayesian Techniques,HMMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,28,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,6 to 10 years,Researcher,University courses,25,0,25,50,0,0,Natural Language Processing,"Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Other (please specify; separate by semi-colon)",A doctoral degree,Academic,Fewer than 10 employees,Stayed the same,More than 10 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,21,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,50,20,0,25,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,30,Employed 
full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,,"Hadoop/Hive/Pig,NoSQL,Python,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,Often,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,5,10,5,0,Enough to tune the parameters properly,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Software Developer/Software Engineer,Other,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,46,I prefer not to say,No,"Yes, but data science is a small 
part of what I'm focused on learning",,,,,,TensorFlow,Time Series Analysis,Python,"Google Search,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,Somewhat useful,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer,Other",Self-taught,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Neural Nets,R,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,I never declared a major,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Unsupervised Learning,"Bayesian Techniques,Evolutionary Approaches,Logistic Regression",A master's degree,Financial,500 to 999 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Not at all important,Other,Workstation + Cloud service,Other,Don't know,,,"Amazon Web services,Java",,Often,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,60,0,0,0,40,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,Often,,,,,,,Often,,,,,,,Often,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,Graph (e.g. GraphBase/Neo4j),Other,,Git,Sometimes,35000,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,Less than a year,"Data Analyst,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Most of the time,10GB,"Ensemble Methods,Other","Java,KNIME (free version),Microsoft SQL Server Data Mining,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Text 
Analytics,Other",Most of the time,,,,,Sometimes,Most of the time,Often,Most of the time,,,,,Often,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,,,,Most of the time,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,Python,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Statistician",Work,30,0,70,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,"10,000 or more employees",Decreased slightly,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Python,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,Rarely,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Segmentation,Text Analytics",,,Rarely,,,,Often,Sometimes,,,,,,Rarely,,Often,,Sometimes,Rarely,,Often,,,,,Often,,,Rarely,,,,,50,30,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of funds to buy useful datasets from external 
sources,Limitations in the state of the art in machine learning",Most of the time,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,,,Less than 10% of projects,Entirely internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,45000,,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Female,Australia,38,Employed part-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer",University courses,20,10,0,60,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,46,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Russia,27,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,DataRobot,,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,Very useful,Very useful,,,,,,,Very useful,,,,Very useful,"Data Stories Podcast,R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler",Self-taught,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,Employed 
part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,20,20,50,0,10,0,Supervised Machine Learning (Tabular Data),,A master's degree,Technology,100 to 499 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Prescriptive Modeling",,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,52,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Analyst,University courses,30,0,40,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support 
Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,Some other way,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Image data,Text data",Never,,,"IBM SPSS Statistics,Python,R,SAS Base",,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Often,,,,,,,,,,,,,,"Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Often,Often,,,,,10,50,25,5,10,0,Enough to refine and innovate on the algorithm,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,30,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,SAS Enterprise Miner,Social Network Analysis,R,"Dataset aggregator/platform 
(i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Textbook",,,Not Useful,,,,Very useful,,,,,,,,Somewhat useful,,,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",1-2 years,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),40+,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,70,0,0,30,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,United States,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,40,5,20,30,5,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Text data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,Sometimes,,Often,,,,,Often,Often,,,Most of the time,,,Most of the time,,,Sometimes,,,,"A/B Testing,Data Visualization,kNN and Other Clustering,Natural Language Processing,Simulation,Text Analytics,Time Series Analysis",Most of the time,,,,,,Most of the time,,,,,,,Sometimes,,,,,Often,,,,,,,,Most of the time,,Most of the time,Most of the time,,,,25,10,20,20,25,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting 
event,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Neural Networks,SVMs","Java,Jupyter notebooks,Python",,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Logistic Regression",Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,44,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,,Self-taught,80,0,20,0,0,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines 
(SVMs),Other (please specify; separate by semi-colon)",A professional degree,Academic,500 to 999 employees,Stayed the same,Don't know,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Rarely,1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","MATLAB/Octave,R,Other",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,SVMs",,Often,Most of the time,,,,Often,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,,,,30,20,10,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,Unavailability of/difficult access to data",Often,Most of the time,,,,,,Most of the time,,Most of the time,,Often,,,,,Often,,,,Often,,51-75% of projects,More external than internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,35,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,37,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,"Not employed, but looking for work",,,,,,,,RapidMiner (free version),,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",Blogs,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,43,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher",Self-taught,50,30,10,0,10,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,500 to 999 employees,Decreased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,10MB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Naive Bayes,Random Forests",,Sometimes,Sometimes,Sometimes,,Sometimes,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,,,20,20,20,0,20,20,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Business Department,,,,Share Drive/SharePoint,,Mercurial,,,,Has increased between 6% and 19%,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +A different identity,India,30,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering 
(non-computer focused),I don't write code to analyze data,Other,Kaggle competitions,20,50,0,0,30,0,Computer Vision,Other (please specify; separate by semi-colon),A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,"Not employed, but looking for work",,,,,,,,Tableau,Cluster Analysis,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,Python,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,"DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,0,0,80,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,Other,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,SQL",,Rarely,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,0,0,100,0,0,0,Computer Vision,"Logistic Regression,Neural 
Networks - CNNs",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Self-taught,30,20,30,10,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",I prefer not to answer,Technology,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,25,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Business Analyst,Self-taught,50,40,0,0,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",,Financial,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",High school,Other,500 to 999 employees,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,SQL,Tableau",,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,Sometimes,Most of the time,,,,,,,,Often,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",Sometimes,Sometimes,,,,Most of the time,Most of the time,Often,Often,,,Often,,,,Often,,Sometimes,,,Often,,Often,,,Often,,,,Often,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Most of the time,Sometimes,,Often,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,I don't write code to analyze data,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed part-time,,,No,Yes,Computer Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,45,25,10,20,0,0,"Adversarial Learning,Natural Language Processing,Speech Recognition,Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,CRM/Marketing,10 to 19 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",15,85,0,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,Computer Scientist,"Online courses (coursera, udemy, edx, 
etc.)",50,0,0,50,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Always,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Google Cloud Compute,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,KNIME (commercial version),MATLAB/Octave,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAP BusinessObjects Predictive Analytics,SAS Enterprise Miner,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,NA,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Programmer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,,1GB,"CNNs,GANs,Neural Networks,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,Self-taught,95,0,5,0,0,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,Often,,,,,,,,Rarely,,,,Rarely,,,,,,,,,,Sometimes,,Often,,,,,,Sometimes,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,,Often,Often,Often,,,,,,,Often,Often,,,,,Sometimes,Often,Often,,,Often,,,Often,Often,,,,25,25,25,10,15,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Often,,,Sometimes,Sometimes,,,Rarely,,,,,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,Recommendation Engines,Logistic Regression,"Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,6-10 
years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,,Self-taught,40,20,20,20,0,0,"Computer Vision,Speech Recognition","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,Deep learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,,"Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,0,80,20,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Reinforcement learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10GB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,Rarely,,,,,,,,,,,,,Often,,Sometimes,,,,,,Sometimes,,Often,,Often,,,,Often,,,,,,,,,,,Often,,,Sometimes,Often,,Often,Sometimes,,,"Logistic Regression,Neural Networks",,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,46,Employed 
full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Data Analyst,Software Developer/Software Engineer",University courses,50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Microsoft R Server (Formerly Revolution Analytics),NoSQL,QlikView,R,Spark / MLlib,SQL,TensorFlow",Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,Most of the time,Most of the time,,,,Most of the time,,,,,,"Association Rules,kNN and Other Clustering,Lift Analysis,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,Most of the time,,,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Most of the time,,Most of the time,,,Most of the time,,,,,,Most of the time,,,51-75% of projects,More internal than 
external,Central Insights Team,"Multiple data such as weather, events ","Granularity, Format & quality ","Column-oriented relational (e.g. KDB/MariaDB),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,30,20,40,0,0,Supervised Machine Learning (Tabular Data),,A doctoral degree,Internet-based,20 to 99 employees,Decreased significantly,1-2 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Prescriptive Modeling,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning 
Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,30,10,10,0,0,"Computer Vision,Machine Translation,Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,Programmer,University courses,20,0,0,80,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Other,100 to 499 employees,Stayed the same,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Bayesian Methods,Python,,Kaggle,,,,,,,Not Useful,,,,,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,,"Data Analyst,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Google Cloud Compute,Deep 
learning,Python,"GitHub,Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Engineer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data,Other",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Often,Most of the time,,,,,,,,,,,Often,,,,Often,,Most of the time,,,,"A/B Testing,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,Often,Sometimes,,Most of the time,Most of the time,Often,Sometimes,Sometimes,,Sometimes,Sometimes,Often,,Often,,Often,Often,Often,Most of the time,,Often,,Sometimes,Sometimes,,Often,Sometimes,Often,,,,30,30,10,20,10,0,Enough to explain the 
algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,44,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Business Analyst,Self-taught,50,10,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,1GB,Regression/Logistic Regression,"Jupyter notebooks,R",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,48,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,DBA/Database Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer 
focused),6 to 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,6 to 10 years,Other,Work,0,0,0,100,0,0,Time Series,Logistic Regression,High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,Researcher,University courses,10,10,30,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Self-taught,60,10,20,0,10,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",,Financial,,,,,Important,,Laptop or Workstation and private datacenters,"Image data,Text data",,,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","C/C++,R,SAS Base,SQL,Tableau",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,Often,,,Sometimes,,,,,,,"Data Visualization,Decision Trees,Prescriptive Modeling,Random Forests",,,,,,,Sometimes,Often,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,60,20,10,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Other",Self-taught,40,40,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Random 
Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Microsoft SQL Server Data Mining,Python,SQL",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",Most of the time,Often,,,,,Most of the time,Sometimes,,,,,,,,Often,,,,,,,Sometimes,,,,,,,Often,,,,20,30,10,20,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,33,Employed full-time,,,No,Yes,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,Computer Science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Telecommunications,"10,000 or more employees",,,,,,Laptop or Workstation and local IT supported servers,Text data,Never,,"Bayesian Techniques,Decision Trees,Random Forests","IBM Cognos,QlikView,R,SQL,Tableau",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,30,10,10,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,25,Employed full-time,,,Yes,,Data Scientist,,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",Work,30,10,10,30,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural 
Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Telecommunications,"10,000 or more employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,10GB,"CNNs,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,RNNs",Most of the time,,,Sometimes,,Most of the time,Most of the time,,,,,,,,,Often,,Often,Sometimes,,,,Sometimes,,Sometimes,,,,,,,,,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,40,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,High school,Other,100 to 499 employees,Stayed the same,,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Scientist,Kaggle 
competitions,0,30,20,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Python,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs",A master's degree,Mix of fields,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R",,Rarely,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,,,,,,"Logistic Regression,Neural Networks,Random Forests",,,,,,,,,,,,,,,,Often,,,,Often,,,Often,,,,,,,,,,,20,30,0,10,40,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database",Sometimes,,,Sometimes,Often,,,,,,,,,,,,,Often,,,,,26-50% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",,220000,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,Singapore,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Conferences,Friends network,Kaggle,Online courses,Personal Projects,YouTube Videos",Very useful,,,,Very useful,Somewhat useful,Somewhat useful,,,,Very useful,Very useful,,,,,,Very useful,"Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,40,40,10,10,0,0,"Computer Vision,Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,I don't know,Increased significantly,Don't know,An external recruiter or headhunter,Somewhat important,Research that advances the state of the art of machine learning,"Laptop or Workstation and local IT supported servers,Other",Image data,Sometimes,10GB,CNNs,"C/C++,Java,Mathematica,MATLAB/Octave,Python,SQL",,,,Sometimes,,,,,,,,,,,Rarely,,,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Most of the time,,,Often,,,,,,,Often,,,,,,Often,Sometimes,,,,,,,,,,,,,0,85,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Limitations of tools,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,Often,,,,,,Sometimes,Most of the time,,,100% of projects,More external than internal,Standalone Team,ImageNet; VOC ,Annotation,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,40,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,6 to 10 years,Researcher,Self-taught,50,30,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Logistic Regression,High school,Academic,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Image data,Video data",Rarely,1GB,"CNNs,Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,"Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,20,40,10,10,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,Fewer than 10 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,100MB,CNNs,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,"CNNs,Random Forests",,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,0,0,0,0,0,0,,Other,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Very useful,,,,Somewhat useful,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Programmer,Researcher,Statistician",Work,0,0,50,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher",Self-taught,50,0,30,0,10,10,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Support Vector Machines (SVMs)",A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Most of the time,10GB,"Decision Trees,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,Self-taught,NA,NA,NA,NA,NA,NA,Reinforcement learning,Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",University courses,30,0,30,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,"10,000 or more employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Other,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Relational data,Sometimes,10GB,"Decision Trees,Ensemble Methods,Random Forests","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Prescriptive Modeling,Random Forests,Simulation",Rarely,,,,,Rarely,Often,Rarely,Rarely,,,,,,,,,,,,,Rarely,Rarely,,,,Rarely,,,,,,,80,0,0,20,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,50,50,0,0,0,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",CRM/Marketing,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,29,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Predictive Modeler",Self-taught,30,30,30,0,0,10,Recommendation Engines,Logistic Regression,,Financial,20 to 99 employees,Increased slightly,6-10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,Python,University/Non-profit research group websites,"College/University,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,,,,,,,,,Very useful,Very useful,,,Very useful,,< 1 year,,,,,,,,,,,,,,,,,PhD,No,I did not complete any formal education past high school,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Other,Text data,,,,"C/C++,Java,R,SAS Base,SQL,Tableau",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,Most of the time,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,0,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,19,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Poland,42,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",R,Deep learning,C/C++/C#,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Adversarial Learning,Bayesian Techniques,,Academic,"1,000 to 4,999 employees",Decreased significantly,More than 10 years,"A friend, family member, 
or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,,,,Markov Logic Networks,Amazon Machine Learning,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Rarely,,,,,,,,,,,,,,,,,,,,,,None,Entirely internal,IT Department,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Bitbucket,,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,South Korea,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",Matlab,University/Non-profit research group websites,"Arxiv,Conferences,Stack Overflow Q&A",Very useful,,,,Somewhat useful,,,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,"GPU accelerated Workstation,Workstation + Cloud service",2 - 10 hours,Github Portfolio,No,Master's degree,Engineering (non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important 
+Male,India,24,Employed full-time,,,No,Yes,Business Analyst,,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Statistician",University courses,30,0,0,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,Operations Research Practitioner,Self-taught,50,0,10,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Decreased slightly,6-10 years,A tech-specific job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Random Forests,Regression/Logistic 
Regression,Other","Oracle Data Mining/ Oracle R Enterprise,Python,R,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,Sometimes,,Rarely,,,,,,,,,,,,Often,,,Most of the time,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,30,20,20,20,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,Sometimes,,,Often,,Most of the time,Most of the time,,,,Often,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,23,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,,,0,30,NA,30,0,40,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,15,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Arxiv,Blogs,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Somewhat useful,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,FastML Blog,1-2 years,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Kaggle Competitions,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Reinforcement learning","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Stack Overflow Q&A,YouTube Videos",Very useful,,,,,,,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,Computer Vision,"Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Never,1TB,"Bayesian Techniques,CNNs,Neural Networks,SVMs",C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,30,30,40,0,0,0,Enough to run the code / standard library,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,Less than 10% of projects,More internal than external,Central Insights Team,imagenet,labeling,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Rarely,"150,000",TWD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,"Data Analyst,Data Scientist",University courses,100,0,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A doctoral degree,Academic,,,,,Somewhat important,Analyze and 
understand data to influence product or business decisions,GPU accelerated Workstation,Text data,Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,TensorFlow",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests",,,,,,Sometimes,Most of the time,Often,,,,Often,,Most of the time,,,,,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,R,Neural Nets,SAS,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Friends network,Kaggle,Online courses,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,Very useful,,,Somewhat useful,,< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Unsupervised Learning,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's 
degree,Mathematics or statistics,3 to 5 years,Statistician,Self-taught,50,30,20,0,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A doctoral degree,Other,"10,000 or more employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Don't know,1GB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Logistic Regression,Random Forests",,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,,,0,90,10,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",High school,Internet-based,10 to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Never,10GB,CNNs,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Neural Networks",,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,Explaining data science to 
others,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Kaggle,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,20,20,0,20,0,Computer Vision,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,20,Employed part-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,Statistician,University courses,25,25,25,25,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,CRM/Marketing,20 to 99 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,46,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Machine Learning Engineer,University courses,30,5,10,55,0,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,1 to 2 years,Data Analyst,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,Hadoop/Hive/Pig,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Business Analyst,Data Miner,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,Retail,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,,,"Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,20,20,10,25,25,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Do not know,Business Department,,,,,,,Sometimes,76000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Poorly,Employed by a 
company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),,,Technology,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Other,Workstation + Cloud service,Text data,Never,,"Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,NoSQL,Perl,Python,R,SQL,Tableau,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,Sometimes,Sometimes,,Often,,,,,,,,,Sometimes,,,Sometimes,,,Often,,,,"kNN and Other Clustering,Random Forests,Time Series Analysis",,,,,,,,,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,Rarely,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,52,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,R,Time Series Analysis,SQL,University/Non-profit research group websites,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,More than 10 years,Business Analyst,Self-taught,100,0,0,0,0,0,Survival Analysis,Bayesian Techniques,,Financial,Fewer than 10 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text 
data,Most of the time,1TB,Bayesian Techniques,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Random Forests",Often,,,,,Often,Often,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,50,10,5,25,10,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Most of the time,,,,,,,,,,,,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Business Department,oracle,labelling,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,250000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Biology,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,6-10 years,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,Neural 
Networks,"Java,NoSQL,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,55,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Fine arts or performing arts,More than 10 years,"Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,50,20,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Internet-based,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,"Data Analyst,Data Scientist",Work,60,NA,20,10,10,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,Natural Language Processing,Bayesian Techniques,A professional degree,Telecommunications,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,53,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional 
services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,0,10,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Other,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Never,1GB,Decision Trees,"Hadoop/Hive/Pig,Java,MATLAB/Octave,Python,QlikView,SQL",,,,,,,,,Rarely,,,,,,Sometimes,,,,,,Often,,,,,,,,,,Often,Often,,,,,,,,,,Most of the time,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,60,0,40,0,0,0,,Company politics / Lack of management/financial support for a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",20,25,10,35,10,0,"Machine Translation,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Financial,100 to 499 employees,Stayed the same,6-10 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,NoSQL,Social Network Analysis,Python,Google Search,"Tutoring/mentoring,YouTube Videos",,,,,,,,,,,,,,,,,Not Useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,10,35,0,10,0,45,Natural Language Processing,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,50,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Logistic Regression",,Financial,I don't know,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Always,1TB,Regression/Logistic Regression,"Hadoop/Hive/Pig,SQL,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,35,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,"GitHub,Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,70,0,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Retail,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,10MB,"Random Forests,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,Sometimes,,Often,,,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,I don't write code to analyze data,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Predictive Modeler,Statistician",University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,,,,,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Sometimes,10GB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Minitab,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,Rarely,,Rarely,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Logistic Regression,Prescriptive Modeling,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,50,0,0,0,50,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,0,0,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Software Developer/Software Engineer,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,64,Employed full-time,,,Yes,,Researcher,Perfectly,"Employed by college or university,Employed by 
government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A social science,More than 10 years,Researcher,Work,30,0,70,0,0,0,"Survival Analysis,Other (please specify; separate by semi-colon)",Logistic Regression,A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,32,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Regression,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Online courses,Textbook",,Very useful,,,,,Very useful,,,,Very useful,,,,,,,,Other (Separate different answers with semicolon),< 1 year,,Nice to have,,,,Necessary,Necessary,Nice to have,Necessary,Necessary,,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Electrical Engineering,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Spark / MLlib,Link Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Non-Kaggle online communities,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,Very useful,,Somewhat useful,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"FlowingData Blog,KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher,Software Developer/Software Engineer,Other",Self-taught,60,10,30,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Ensemble Methods,Logistic Regression",A doctoral degree,Technology,"10,000 or more employees",Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Don't know,1TB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Python,R,SQL,Tableau,Unix shell / awk,Other",,Often,,,,,,,,,,,,,,,Often,,Sometimes,,,,Often,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Sometimes,,,Often,Often,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Sometimes,,Sometimes,,,Often,,Often,,,,,,,,,,,60,20,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Maintaining responsible expectations about the 
potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,Most of the time,Often,,Less than 10% of projects,Entirely internal,Other,none,"governance, access, and completeness","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Git,Rarely,"140,000",USD,Has increased 20% or more,3,,,,,,,,,,,,,,,,,, +Male,India,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Miner,Engineer,Software Developer/Software Engineer",Self-taught,60,30,5,0,0,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,41,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,19,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,32,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Personal Projects",,,,,,,,,,,Very useful,Very useful,,,,,,,,< 1 year,Unnecessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,Other,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,No,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,50,0,40,0,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,10GB,,"Amazon Web services,C/C++,Google Cloud Compute,Java,MATLAB/Octave,Python",,Rarely,,Most of the time,,,,Rarely,,,,,,,Sometimes,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,SVMs",Often,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Often,,,,,,10,50,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in 
the organization,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,YouTube Videos",,,,,,,Very useful,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,R Bloggers Blog Aggregator,The Analytics Dispatch Newsletter",< 1 year,,,Nice to have,,,Necessary,Necessary,,Nice to have,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Machine Translation,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,3-5,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,45,"Independent contractor, freelancer, or 
self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Business Analyst,Statistician",Self-taught,50,30,20,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning",Decision Trees - Random Forests,A bachelor's degree,Financial,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),"N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,10MB,,"Microsoft Excel Data Mining,NoSQL,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,27,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Other,I haven't started working yet",University courses,20,10,20,50,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,,Kaggle competitions,20,0,0,0,80,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's 
degree,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,100MB,Other,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Business Analyst,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Official documentation,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,Very useful,,Very useful,,,Somewhat useful,,,,No Free Hunch Blog,< 1 year,,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Necessary,Necessary,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Statistician,Other",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,20,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,35,Employed 
full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Non-Kaggle online communities,Podcasts,Tutoring/mentoring,YouTube Videos",,,,,Very useful,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,Very useful,Somewhat useful,Data Elixir Newsletter,1-2 years,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,,,,,Basic laptop (Macbook),,Master's degree,No,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,80,10,0,0,10,0,Adversarial Learning,"Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,60,25,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,"Some college/university study, no bachelor's degree",Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,65,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any 
formal education past high school,,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,0,0,0,100,0,Computer Vision,Gradient Boosting,,Academic,20 to 99 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Image data,Never,10MB,Gradient Boosted Machines,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A bachelor's degree,Academic,100 to 499 employees,Decreased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,NA,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,C/C++/C#,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,Data Machina Newsletter,1-2 years,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,,,,,Traditional Workstation,,Experience from work in a company related to ML,Yes,Bachelor's degree,Other,3 to 5 years,I haven't started working yet,Self-taught,30,30,10,0,30,0,Survival Analysis,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines 
(SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,33,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,Partially Derivative Podcast,< 1 year,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Africa,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,80,0,10,10,0,Time Series,Logistic Regression,,Technology,500 to 999 employees,Decreased slightly,Don't know,Some other way,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,,Bayesian Techniques,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Other,Fine,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Recommendation Engines,Other (please specify; separate by semi-colon),A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,NoSQL,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Kaggle,YouTube Videos",,,Very useful,,,,Very useful,,,,,,,,,,,Somewhat useful,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,Basic laptop (Macbook),11 - 39 hours,PhD,Yes,Master's degree,Mathematics or statistics,,"Computer Scientist,Data Scientist,Programmer",University courses,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Researcher,University courses,80,20,0,0,0,0,Computer Vision,"Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,20,20,0,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Logistic Regression",A bachelor's degree,Internet-based,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),R,GitHub,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Software 
Developer/Software Engineer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Computer Vision,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Pakistan,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,39,1,10,10,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,"Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,80,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working 
yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,60,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist,Engineer,Operations Research 
Practitioner",University courses,10,30,45,10,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",0,60,0,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Technology,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Rarely,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,Tableau,TensorFlow",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,"Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks",,,,,,,Most of the time,,Sometimes,,,Most of the time,,,,Sometimes,,,,Often,,,,,,,,,,,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Engineer,Operations Research Practitioner,Researcher",Self-taught,50,10,0,0,40,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University courses,50,0,25,25,0,0,Computer Vision,"Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,20 to 99 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated 
Workstation,Image data,Never,10GB,"CNNs,Neural Networks","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,"CNNs,Data Visualization,Segmentation",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,10,60,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,,,,,,,,Often,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Time Series,"Gradient Boosting,Logistic Regression",A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Scientist",Self-taught,30,70,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,"1,000 to 4,999 employees",Decreased slightly,Less than one year,"A friend, family member, or former 
colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Never,<1MB,,"C/C++,Jupyter notebooks,Python,SQL",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Decision Trees,PCA and Dimensionality Reduction,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Business Analyst,Researcher",Self-taught,25,15,60,0,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Mix of fields,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Always,10MB,"Bayesian Techniques,Decision Trees,Random Forests","IBM SPSS Statistics,Python,QlikView,R,SQL,Tableau,Other,Other",,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,Often,,,,,,,,,Sometimes,,,Rarely,,,,Most of the time,Sometimes,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,Sometimes,,Often,,,,,,,Rarely,,,Sometimes,,,Sometimes,,,,,20,5,30,20,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,27,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Programmer,Software Developer/Software Engineer",Self-taught,70,15,0,0,0,15,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"10,000 or more employees",,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Genetic & Evolutionary Algorithms,C/C++/C#,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,Linear Digressions Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Basic laptop (Macbook),0 - 1 hour,Online Courses and Certifications,No,Master's degree,Electrical Engineering,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,18,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Other (please specify; separate by semi-colon),A doctoral 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Other",University courses,70,5,5,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Video data,Text data,Relational data",Always,100MB,Regression/Logistic Regression,"IBM SPSS Modeler,Python,R,SQL",,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,PCA and Dimensionality Reduction,Text Analytics,Time Series Analysis",,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,Sometimes,Sometimes,,,,10,30,40,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the 
potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Unavailability of/difficult access to data,Other",Sometimes,Sometimes,,,Often,Often,,,Often,Often,Sometimes,,,Sometimes,,Sometimes,Sometimes,Sometimes,,,Sometimes,Sometimes,76-99% of projects,More internal than external,Standalone Team,,lack of keys for merging datasets,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,"Git,Other",Rarely,88000,GBP,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,Netherlands,31,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software 
Developer/Software Engineer",University courses,15,15,0,65,5,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed part-time,,,Yes,,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,Computer Scientist,Work,25,25,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Recommendation Engines,,,Technology,500 to 999 employees,,,I was contacted directly by someone at the 
company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,49,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer",Self-taught,50,30,20,0,0,0,"Machine Translation,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,1GB,"CNNs,Decision Trees,Neural Networks","Amazon Web services,C/C++,Google Cloud Compute,Jupyter notebooks,Perl,Python,R,Unix shell / awk",,Often,,Sometimes,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,Often,,Often,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",60,20,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,I prefer not to 
answer,Stayed the same,Don't know,,Important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Image data,,,,"C/C++,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,Rarely,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Decision Trees,GANs,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,SVMs",,,,Most of the time,,,,Sometimes,,,Sometimes,,,,,Rarely,,,,Often,Sometimes,,,,,,,Often,,,,,,10,80,0,10,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Other,,,Subversion,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,31,Employed part-time,,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",University courses,10,10,40,40,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Internet-based,Fewer than 10 employees,Increased slightly,3-5 years,A tech-specific job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"HMMs,Neural Networks,Regression/Logistic Regression,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,30,20,10,10,0,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,1MB,"Decision Trees,Random Forests,SVMs","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Often,,,,"Natural Language Processing,SVMs",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Often,,,,,,20,30,10,10,30,0,Enough to run the code / standard library,Lack of significant domain expert input,,,,,,,,,,,Often,,,,,,,,,,,,26-50% of projects,More internal than external,Central Insights Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Other,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,38,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,Other,University courses,50,50,0,0,0,0,Computer Vision,Neural Networks - CNNs,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,Researcher,University courses,5,20,65,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Greece,25,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Programmer,Researcher",University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Stayed the same,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,"1,000 to 4,999 employees",Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,45,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Computer 
Scientist,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer",Self-taught,50,20,0,30,0,0,Natural Language Processing,"Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",High school,Technology,"5,000 to 9,999 employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,80,0,0,20,0,0,Supervised Machine Learning (Tabular Data),,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,37,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Other,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,,Self-taught,72,20,5,2,1,0,Other (please specify; separate by semi-colon),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,Academic,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Programmer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Engineer,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",Less than a year,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Decision Trees - Gradient Boosted Machines,Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,36,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,6 to 10 years,"Engineer,Programmer",University courses,50,10,5,5,30,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased slightly,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Time Series,"Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,26,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,1 to 2 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,Natural Language Processing,"Decision Trees - 
Gradient Boosted Machines,Gradient Boosting",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,29,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Financial,Fewer than 10 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,SVMs",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed part-time,,,Yes,,Data Miner,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,No,Yes,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A health science,Less than a year,Researcher,Self-taught,100,0,0,0,0,0,Natural Language Processing,Logistic 
Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,DBA/Database Engineer,Self-taught,100,0,0,0,0,0,"Recommendation Engines,Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Scientist,Statistician",Work,20,0,70,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Internet-based,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,Python,R,Spark / 
MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,100 to 499 employees,Decreased significantly,More than 10 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Most of the time,10MB,"Decision Trees,GANs,HMMs,Neural Networks,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Fine arts or performing arts,Less than a year,"Other,I haven't started working yet",,25,0,0,0,25,50,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,Data Analyst,"Online courses (coursera, 
udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,50,30,10,5,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,20,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,34,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,52,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,22,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,80,5,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,17,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not 
complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Neural Nets,Python,,"Stack Overflow Q&A,Textbook",,,,,,,,,,,,,,Somewhat useful,Somewhat useful,,,,O'Reilly Data Newsletter,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,,0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,32,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,No,Yes,Programmer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer,Other",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,Engineer,"Online courses (coursera, udemy, edx, 
etc.)",20,70,10,0,0,0,Unsupervised Learning,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher",University courses,80,10,0,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,,10MB,"Neural Networks,Random Forests,SVMs","MATLAB/Octave,R",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation,SVMs",,,,,,Most of the time,,,,Often,,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,Often,Most of the time,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",0,60,20,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,,,A master's degree,Insurance,"10,000 or more employees",Decreased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,40,0,10,50,0,0,Natural Language Processing,"Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,Always,1PB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Python",,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Logistic Regression,Natural Language Processing,RNNs,SVMs",,,,,,,,,,,,,,,,Often,,,Most of the time,,,,,,Sometimes,,,Often,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Computer Scientist,University courses,10,20,20,45,5,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Increased slightly,Don't know,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data,Text data",Never,100GB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,RNNs,SVMs","C/C++,Python",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,HMMs,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs,Text Analytics",,,,,,Often,Sometimes,,Often,,,,Rarely,,,,,Sometimes,Sometimes,Often,Often,,Sometimes,,,Often,,Most of the time,Often,,,,,15,25,35,10,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of data 
science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Sometimes,,,,Most of the time,,Often,,,,,,,Often,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Speech Recognition,,No education,Technology,"10,000 or more employees",Decreased slightly,1-2 years,"A friend, family member, or former colleague told me",Not at all important,,,,,,,"Python,QlikView",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,3 to 5 years,"Engineer,Predictive Modeler,Researcher",Self-taught,50,0,10,30,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,500 to 999 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data,Other",Sometimes,1GB,,"C/C++,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Rarely,,Sometimes,,,Rarely,Rarely,,,,,,Rarely,,,,Sometimes,,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Evolutionary Approaches,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,,Self-taught,60,30,0,0,10,0,"Machine Translation,Time Series",,,Internet-based,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Workstation + Cloud service",Relational data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,22,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,Work,70,0,20,0,10,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,49,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,43,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed 
by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A bachelor's degree,Other,"10,000 or more employees",,Don't know,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,1GB,Other,"KNIME (free version),Python,R,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,,Often,,,Rarely,,,Rarely,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,50,0,0,50,0,0,,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Subversion,Sometimes,28000,USD,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Predictive Modeler,Fine,Employed by professional services/consulting firm,SAS Enterprise Miner,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Non-Kaggle online communities,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,Very useful,,Very useful,,,,,Somewhat useful,,,Somewhat useful,,"R Bloggers Blog Aggregator,Siraj Raval YouTube Channel",< 1 year,Necessary,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),,,Yes,Master's degree,,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,5,15,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,20 to 99 employees,,,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Japan,28,Employed full-time,,,Yes,,Machine Learning Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,Work,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",15,40,15,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Support Vector Machines (SVMs)",A professional degree,Mix of fields,"5,000 to 9,999 employees",Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,47,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",10,80,0,0,10,0,Reinforcement learning,Other (please specify; separate by semi-colon),I prefer not to answer,Financial,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Never,,Other,"SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,0,0,0,0,0,100,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,25,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",0,40,50,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,100TB,"Bayesian Techniques,CNNs,Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter 
notebooks,MATLAB/Octave,Python,R",,Rarely,,,,,,,,,,,,,,,Often,,,,Rarely,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,60,0,5,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Other,Self-taught,45,45,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,50,0,35,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,CRM/Marketing,"1,000 to 4,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,I never declared a major,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,39,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,"Natural Language Processing,Recommendation Engines","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Technology,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at 
the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,80,0,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist",Work,30,40,10,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting",A bachelor's degree,Financial,I prefer not to answer,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Traditional Workstation","Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Minitab,NoSQL,Python,R,RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,Sometimes,Often,,,,,Most of the time,,,,,,,,,Rarely,Sometimes,,,,Often,,Most of the time,,Often,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling,Random Forests",,,,,,,Most of the time,Often,,,,,,,,Often,,,,,,Often,Often,,,,,,,,,,,80,2,2,6,10,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,Often,,,Most of the time,Often,,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,25,10,35,20,10,0,Computer Vision,Decision Trees - Gradient Boosted Machines,Primary/elementary school,Academic,100 to 
499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Rarely,10MB,Decision Trees,"Amazon Web services,C/C++,Hadoop/Hive/Pig",,Sometimes,,Most of the time,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to refine and innovate on the algorithm,"Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",,,,,,,,,,Often,,,Often,,,Most of the time,,,,,,,10-25% of projects,More internal than external,IT Department,,,"Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,23,Employed 
full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,53,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,40,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",10,70,10,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",Evolutionary Approaches,I prefer not to answer,Academic,100 to 499 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,,10GB,"Decision Trees,Evolutionary Approaches,Neural Networks,Random Forests,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Python,Spark / MLlib",,,,,Sometimes,,,,Sometimes,,,,,,Most of the time,,Often,,Rarely,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"Data Visualization,Decision Trees,Evolutionary Approaches,Random Forests,SVMs",,,,,,,Often,Sometimes,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,5,50,10,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Scaling data science solution up to full database",,,,,,,,,Most of the time,,Most of the time,,,,,,,Often,,,,,100% of projects,Entirely external,IT Department,,,"Flat files not in a database 
or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,35,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),Less than a year,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,25,0,0,0,25,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Evolutionary Approaches",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Video data",Don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Manufacturing,"1,000 to 4,999 employees",Stayed the same,Don't know,I was 
contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,100MB,,QlikView,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Retail,"1,000 to 4,999 employees",Decreased slightly,Less than one year,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,100MB,,"Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python",,,,,,,,,,,,,,,,,Sometimes,,,,,,Most of the time,,Most of the time,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,0,0,80,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Column-oriented relational (e.g. KDB/MariaDB),Commercial Data Platform,,Other,Don't know,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",Self-taught,20,20,40,0,20,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,Yes,,Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Retail,500 to 999 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,10GB,Other,"IBM Cognos,SQL",,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Evolutionary Approaches,Segmentation",,,,,,,Often,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Other",Work,70,10,5,0,15,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Nigeria,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",5,90,5,0,0,0,,,A master's degree,Internet-based,,,,,,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,,,"Jupyter notebooks,Microsoft Azure Machine Learning,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,10,10,20,0,Enough to run the code / 
standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Central Insights Team,Government data,Questionable,,Commercial Data Platform,,Bitbucket,Most of the time,36000,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,No,Yes,Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,Google Search,Kaggle,,,,,,,Very useful,,,,,,,,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,20,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Mix of fields,100 to 499 employees,Increased significantly,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Don't know,1GB,"Bayesian Techniques,CNNs,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Flume,Hadoop/Hive/Pig,Java,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL",,,,Sometimes,,,Sometimes,,Often,,,,,,Often,,Often,,,,,,,,,,,Often,,,Most of the time,,Often,,,,,,,,Most of the time,Often,,,,,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Often,Often,Often,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,Most of the time,Often,Often,,Often,Most of the time,,Often,,Often,Often,,,,,40,10,20,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,24,Employed part-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,0,20,10,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,26,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,30,10,60,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,45,5,20,15,0,"Computer Vision,Natural Language Processing,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Text data",,1GB,,"Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,30,30,5,10,25,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,26,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)",Logistic 
Regression,A master's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,41,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,R,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Becoming a Data Scientist Podcast,Data Stories Podcast,R Bloggers Blog Aggregator",1-2 years,Nice to have,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Experience from work in a company related to ML,No,Doctoral degree,Computer Science,,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,48,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",10,70,10,5,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",Primary/elementary school,Technology,Fewer than 10 employees,Increased 
slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10GB,"Bayesian Techniques,Markov Logic Networks","Amazon Machine Learning,C/C++,Google Cloud Compute,IBM Watson / Waton Analytics,Julia,Perl,Python",Sometimes,,,Often,,,,Rarely,,,,,Rarely,,,Sometimes,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Logistic Regression,Markov Logic Networks,Naive Bayes",Sometimes,,,,,,Often,,,,,,,,,Rarely,Rarely,Sometimes,,,,,,,,,,,,,,,,30,20,5,25,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Often,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,"60,000,000",KRW,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Spain,35,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,33,Employed full-time,,,Yes,,Data Analyst,,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,A social science,3 to 5 years,Data Analyst,Work,70,20,10,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Reinforcement learning,Survival 
Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Evolutionary Approaches,Markov Logic Networks",,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,30,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",30,40,15,15,0,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,Computer Vision,"Bayesian Techniques,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Other,100 to 499 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Adversarial Learning,Evolutionary Approaches,A master's degree,Manufacturing,I prefer not to answer,Decreased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I did not complete any formal education past high school,,3 to 5 
years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,20,60,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,25,Employed part-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Data Analyst,University courses,10,30,10,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,10,5,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing 
there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,Self-taught,30,40,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Computer Vision,,Primary/elementary school,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Important,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Hong Kong,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",Work,10,10,40,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",,Academic,20 to 99 employees,Stayed the same,More than 10 years,I visited the company's Web site and found a job listing 
there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,1,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,Self-taught,50,20,10,20,0,0,Natural Language Processing,"Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,,,,,,,C/C++,,,,Most 
of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,0,0,0,0,0,0,,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Time Series,Neural Networks - CNNs,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,1GB,Neural Networks,C/C++,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer",Self-taught,20,10,30,40,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,31,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by college or university,C/C++,Time Series Analysis,Python,Other,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Data 
Elixir Newsletter",< 1 year,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,Basic laptop (Macbook),,Github Portfolio,Sort of (Explain more),Master's degree,,Less than a year,Business Analyst,Self-taught,60,0,20,20,0,0,Time Series,Neural Networks - GANs,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,kNN and Other Clustering,Prescriptive Modeling,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,Sometimes,,,,,,,,Rarely,,,,,,,,Most of the time,,,,40,20,10,10,20,0,Enough to run the code / standard library,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,,,,,,,,,Often,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,6 to 10 
years,Researcher,University courses,10,0,0,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Hidden Markov Models HMMs",A master's degree,Academic,I don't know,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Researcher",University courses,10,40,0,50,0,0,"Computer Vision,Machine Translation,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Sometimes,100GB,"CNNs,Ensemble Methods,Random Forests,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Decision Trees,Natural Language Processing,Random Forests,SVMs",,,,Most of the time,,,,Often,,,,,,,,,,,Sometimes,,,,Often,,,,,Often,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Explaining data science to others,Limitations of tools",,,,Often,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,20,20,40,10,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,"Online courses,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,,No,Master's degree,I never declared a major,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,41,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,DBA/Database Engineer,Work,0,70,20,4,6,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Online courses,Textbook",,,Somewhat useful,,,,,,,,Somewhat useful,,,,Very useful,,,,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,,Necessary,,Necessary,Nice to have,Necessary,Necessary,Necessary,,,Coursera,Basic laptop (Macbook),2 - 10 hours,Other,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,40,0,15,5,0,,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer",University courses,20,40,0,40,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Increased significantly,Less than one year,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Always,10MB,"Neural Networks,RNNs","Amazon Web services,Python,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,Often,,,,"Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series 
Analysis",,,,,,,Often,,,,,,,Most of the time,,,,,,Most of the time,Sometimes,,,,Most of the time,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,30,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,I don't write code to analyze data,Researcher,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,DBA/Database Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,34,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,Other,30,50,10,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Increased slightly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,28,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,DBA/Database Engineer,Kaggle competitions,25,0,0,0,75,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Retail,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Not very important,Other,Other,Relational data,Never,,,"C/C++,Jupyter notebooks,Python",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,28,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,20,30,30,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Retail,"1,000 to 4,999 employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1TB,Regression/Logistic Regression,"Python,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"A/B Testing,Cross-Validation,Lift Analysis,Logistic Regression,Time Series Analysis",Most of the time,,,,,Most of the time,,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,Often,,,,70,10,2,9,9,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,Often,,,,,,,,Often,,,,,,Often,,,10-25% of projects,Approximately half internal and half external,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,36,Employed full-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,,3 to 5 
years,"Researcher,Statistician",University courses,30,30,10,30,0,0,,Logistic Regression,A master's degree,Internet-based,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,Amazon Web services,Decision Trees,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Very useful,,,,Very useful,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Time Series,Logistic Regression,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Biology,More than 10 years,Other,Work,25,25,40,8,2,0,"Computer Vision,Time Series","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Image data,Sometimes,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods","C/C++,Jupyter notebooks,Python",,,,Often,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Random Forests,Time Series Analysis",,,Often,,,Often,Most of the time,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,,35,30,5,15,15,0,Enough to refine and innovate on the algorithm,"Dirty data,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,35,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,64,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,Python,Google Search,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,Other (Separate different answers with semicolon),1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Engineering (non-computer focused),,DBA/Database Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - 
RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Regression,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,Less than a year,Other,Self-taught,40,0,35,0,25,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A professional degree,Other,10 to 19 employees,Stayed the same,Don't know,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,,Other,"Amazon Web services,Java,Jupyter notebooks,Python,SQL",,Sometimes,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,50,0,0,50,0,0,Enough to run the code / standard library,"Dirty data,Organization is small and cannot afford a data science team",,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Rarely,7200000,RUB,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Female,India,NA,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Researcher,Statistician",University courses,10,0,30,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",,More than 10 years,Some other way,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data,Other",,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,SQL,Unix shell / awk",,,,Often,,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,Rarely,Most of the time,,,,,,,,,Often,,,,,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,Segmentation,Simulation,Time Series Analysis",,Rarely,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Often,Most of the time,Most of the time,,,,Sometimes,Often,,Most of the time,,,Sometimes,Most of the time,,,Sometimes,,,,10,50,10,10,20,0,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Privacy issues",,,,,Sometimes,,,,,Rarely,Often,,,,,Sometimes,Sometimes,,,,,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Email,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Programmer,Fine,"Employed by non-profit or NGO,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,40,10,50,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Academic,100 to 499 employees,Decreased slightly,1-2 years,Some other way,Not very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text 
data,Relational data",Rarely,1MB,"Bayesian Techniques,Decision Trees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,25,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Machine Learning Engineer",University courses,60,0,10,20,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,50,0,0,50,0,0,Computer Vision,,Primary/elementary school,Government,"1,000 to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,,,,,,"Java,NoSQL,Orange,SQL,Other",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,35,Employed full-time,,,Yes,,Programmer,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,Programmer,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Poland,34,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,27,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,0,40,30,10,20,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection)","Ensemble Methods,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Other,Always,1TB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,Unix shell / awk",,,,,Sometimes,,,,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",,,Sometimes,,Often,,Most of the time,,Most of the time,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,,,Often,,,,70,20,5,4,1,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,DBA/Database Engineer",University courses,0,0,0,100,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Markov Logic Networks,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Workstation + Cloud service","Text data,Relational data,Other",Sometimes,100GB,"Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Python,R,SQL",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Often,,Often,,Sometimes,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Markov Logic Networks,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,Often,Sometimes,,Often,Most of the time,Often,,,,Often,,,,Most of the time,Often,Often,,,Sometimes,,Often,,,,,,,Often,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,35,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software 
Engineer",Self-taught,60,40,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Self-taught,25,65,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,36,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,"Predictive Modeler,Researcher",Self-taught,85,15,0,0,0,0,Computer Vision,Support Vector Machines (SVMs),A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,R,Deep learning,R,Google Search,"Blogs,Non-Kaggle online communities,Online courses,YouTube Videos",,Somewhat useful,,,,,,,Somewhat useful,,Very useful,,,,,,,Very useful,,< 1 year,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,"Coursera,DataCamp,edX",Laptop or Workstation and local IT supported servers,2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,,,A bachelor's 
degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Taiwan,32,Employed full-time,,,Yes,,Researcher,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,0,30,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Survival 
Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data,Other",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","Amazon Machine Learning,Amazon Web services,Angoss,C/C++,Cloudera,DataRobot,Flume,Google Cloud Compute,Hadoop/Hive/Pig,IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Impala,Java,Julia,Jupyter notebooks,KNIME (commercial version),KNIME (free version),Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Minitab,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Orange,Perl,Python,QlikView,R,RapidMiner (commercial version),RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,Spark / MLlib,SQL,Stan,Statistica (Quest/Dell-formerly Statsoft),Tableau,TensorFlow,TIBCO Spotfire,Unix shell / awk,Other",Rarely,Rarely,Rarely,Rarely,Often,Rarely,Rarely,Often,Often,Often,Often,Often,Often,Rarely,Most of the time,,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Often,Often,Often,Rarely,Rarely,Often,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Often,Often,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,Rarely,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision 
Trees,Evolutionary Approaches,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Text Analytics,Time Series Analysis",,Often,Often,,,Often,Often,Often,,Often,,Often,,,,Often,,Often,,,,,Often,,,,,,Often,Often,,,,20,50,15,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,26-50% of projects,Do not know,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Other","Commercial Data Platform,Company Developed Platform,Email,Other",,Other,Sometimes,340000,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,More than 10 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,10,0,0,50,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,I don't write code to analyze data,I haven't started working yet,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,25,Employed part-time,,,No,Yes,Machine Learning Engineer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,24,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Kaggle competitions,0,50,30,0,20,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,Financial,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,42,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,Self-taught,85,0,10,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Programmer,Self-taught,70,30,0,0,0,0,"Computer Vision,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,20 to 99 employees,,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's 
degree,Computer Science,3 to 5 years,"Data Analyst,Programmer",University courses,50,10,10,30,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,20 to 99 employees,Increased slightly,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,,Bayesian Techniques,A bachelor's degree,Technology,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed part-time,,,Yes,,Data Scientist,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Business Analyst,University courses,0,0,0,100,0,0,Time Series,Logistic Regression,A bachelor's degree,Insurance,500 to 999 employees,Increased slightly,More than 10 years,A general-purpose job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,27,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,United States,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Other,University courses,10,10,0,80,0,0,Supervised Machine Learning (Tabular Data),Logistic 
Regression,High school,Academic,100 to 499 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,25,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Researcher,University courses,10,0,30,60,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,33,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,21,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,20,35,35,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Researcher",University courses,25,30,10,30,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job 
board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Traditional Workstation,Workstation + Cloud service",Relational data,Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","Amazon Web services,Julia,Jupyter notebooks,KNIME (free version),Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,R,SAS Enterprise Miner,SQL",,Rarely,,,,,,,,,,,,,,Rarely,Sometimes,,Rarely,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,,,Sometimes,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",High school,Technology,"5,000 to 9,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,100GB,Neural Networks,"IBM Cognos,IBM SPSS Statistics,Java,NoSQL,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,Sometimes,,Sometimes,,,Often,,,,,,,,,,,,Often,,,,Most of the time,,Often,,,,,,,,,Often,,,Often,Most of the time,,,,,,"Data Visualization,Logistic Regression,Markov Logic Networks,Neural Networks,Text Analytics,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Often,Sometimes,,,Often,,,,,,,,,Sometimes,Often,,,,20,35,10,20,15,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,15,15,20,0,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Internet-based,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Image data,Text data",Rarely,100MB,"Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,IBM Watson / Waton Analytics,Java,MATLAB/Octave,NoSQL,Python,R,SQL,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,Rarely,,Sometimes,,,,,,Sometimes,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Rarely,,Sometimes,,,,"CNNs,Cross-Validation,Data 
Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests",,,,Sometimes,,Often,Often,Often,,,,,,Often,,Often,,,,Sometimes,,,Often,,,,,,,,,,,15,20,15,15,10,25,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,New Zealand,35,Employed part-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,10,50,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Engineer,Other,0,50,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician","Online courses (coursera, 
udemy, edx, etc.)",0,70,0,20,10,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Insurance,"5,000 to 9,999 employees",Increased slightly,6-10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Python,R,SQL,Tableau",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Often,,Sometimes,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,Sometimes,Often,Sometimes,Often,,Often,Sometimes,,Often,,Most of the time,Sometimes,,Often,,Often,Often,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Researcher,University 
courses,10,0,10,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",High school,Other,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Data Analyst,Self-taught,40,30,20,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,Primary/elementary school,Insurance,500 to 999 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Gradient Boosted Machines,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,37,Employed 
full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",70,20,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Technology,500 to 999 employees,Increased significantly,More than 10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,47,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Researcher,Self-taught,50,20,10,10,0,10,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs",A doctoral degree,Academic,500 to 999 employees,Increased slightly,Less than one year,Some other way,Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,Rarely,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks","Microsoft Excel Data Mining,Minitab,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Lift Analysis,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Work,20,60,20,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,26,Employed full-time,,,Yes,,Statistician,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Researcher",University courses,30,0,30,20,10,10,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,30,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,6 to 10 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Adversarial Learning,Time Series",,A bachelor's degree,Internet-based,500 to 999 employees,Increased slightly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Always,1TB,,"Flume,Hadoop/Hive/Pig,Java,R,Spark / MLlib,SQL",,,,,,,Sometimes,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Most of the time,Most of the time,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,29,"Not employed, but looking for work",,,,,,,,Microsoft R Server (Formerly Revolution Analytics),Neural Nets,SAS,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,"O'Reilly Data Newsletter,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,Yes,Master's degree,Engineering (non-computer focused),Less than a year,"Business Analyst,Engineer,Researcher",Self-taught,50,5,20,25,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Researcher",University 
courses,20,0,10,70,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A bachelor's degree,Internet-based,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Data Scientist,Self-taught,60,10,30,0,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,Primary/elementary school,Non-profit,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Relational data,Other",Sometimes,10GB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed full-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,82,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",1,99,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,38,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Researcher,Work,60,0,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic 
Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Neural Networks,SVMs,Time Series Analysis",,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Often,,Most of the time,,,,50,0,0,30,20,0,"Enough to code it again from scratch, albeit it may run slowly","Explaining data science to others,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,Often,,,,,,Often,,,,,,,,,Often,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Graph (e.g. GraphBase/Neo4j),Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Business Analyst,Work,70,0,30,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,<1MB,Decision Trees,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,0,0,0,50,50,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,Sometimes,,,,,,,,Often,,76-99% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,"Git,Other",Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed part-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,10,20,40,30,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / 
MLlib,SQL",,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,Often,,,,,,,,,,Rarely,Often,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,SVMs",,Sometimes,,,,Often,Most of the time,Sometimes,Often,,,,,Often,,Sometimes,,,Sometimes,,Most of the time,,,Often,,Sometimes,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,32,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Data Analyst,Data Scientist,Researcher,Statistician",University courses,30,10,30,30,0,0,"Survival Analysis,Time Series",Logistic Regression,,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed part-time,,,Yes,,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Financial,100 to 499 employees,Stayed the same,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,33,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A humanities discipline,3 to 5 years,Operations Research 
Practitioner,Self-taught,50,25,0,10,15,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United Kingdom,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,I don't write code to analyze data,"Business Analyst,Data Analyst,Researcher,Other",Work,40,10,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,10,0,0,10,20,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,Fewer than 10 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Always,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,Often,,,Most of the time,Often,Most of the time,,,,,,,,Sometimes,,Most of the time,Often,Most of the time,Often,,,,,,,Most of the time,Most of the time,,,,,30,10,5,15,40,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of funds to buy useful datasets from external sources,Limitations of tools,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,Sometimes,,,Sometimes,,,,Often,,,Rarely,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,0,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,44,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",Self-taught,25,25,25,25,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Angoss,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SAS Base,Spark / MLlib,SQL,Tableau,TensorFlow",,,Most of the time,,,,,,Often,,,,,,Rarely,,Most of the time,,Rarely,,,Sometimes,Rarely,Most of the time,Most of the time,,,Rarely,,,Often,,Most of 
the time,,Most of the time,,,Rarely,,,Often,Most of the time,,,Rarely,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)",,High school,Manufacturing,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Most of the time,10MB,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,0,4,6,0,0,90,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,None,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Other,Rarely,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Mexico,36,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,0,10,10,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",,Technology,Fewer than 10 employees,Stayed the same,Don't know,A tech-specific job board,Important,Other,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",,100MB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Java,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R 
Enterprise,Orange,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Programmer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Other,Self-taught,50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,Fewer than 10 employees,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,"Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer",Self-taught,90,0,10,0,0,0,Survival Analysis,Decision Trees - Random Forests,A master's degree,Retail,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Don't know,<1MB,"Decision Trees,Gradient Boosted Machines,Random Forests","Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Decision Trees,Random Forests",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,30,30,10,0,30,0,Enough to run the code / standard library,The lack of a clear question to be answering or a clear direction to go in with the available data,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,31,"Independent contractor, freelancer, or 
self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,60,30,10,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Financial,100 to 499 employees,,Less than one year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software 
Developer/Software Engineer",Self-taught,0,80,0,0,0,20,"Machine Translation,Natural Language Processing,Speech Recognition","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,,,University courses,80,15,0,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Decision Trees,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","NoSQL,Python,R,Tableau,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Most of the time,,Most of the time,,,,,,,,,,,,Most of the time,Sometimes,,Most of the time,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Often,Often,Sometimes,,,,,Sometimes,Sometimes,,Often,,Rarely,Often,Sometimes,,,Sometimes,,,,,,Often,Sometimes,,,,75,10,0,10,5,0,"Enough to code it again from scratch, 
albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,Often,Often,,,Often,,,Often,Often,Often,Sometimes,,,Often,,Sometimes,Often,,Less than 10% of projects,Entirely internal,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,Data Scientist,University courses,20,20,15,30,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",Primary/elementary school,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,100MB,Random Forests,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Random Forests",,,,,,Often,Sometimes,Often,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,20,20,30,20,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science 
talent in the organization,Lack of funds to buy useful datasets from external sources,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,,,,,Sometimes,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,27,Employed full-time,,,Yes,,Scientist/Researcher,,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",,Technology,"10,000 or more employees",,,A tech-specific job board,Important,,,"Image data,Video data",,,"CNNs,Decision Trees","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,Rarely,,,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,40,0,40,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,,,,,,,,,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,University courses,10,30,10,40,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Don't know,Some other way,Important,Build and/or run 
a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Random Forests,Time Series Analysis",Often,,,,,,Sometimes,Sometimes,,,,,,,Rarely,Often,,Rarely,,,,,Rarely,,,,,,,Sometimes,,,,50,10,20,10,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Psychology,1 to 2 years,,Self-taught,100,0,0,0,0,0,,,High school,Academic,,,,,Somewhat 
important,,,,,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,37,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Scientist,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",5,70,5,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Work,0,10,60,30,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Always,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Rarely,Most of the time,,Often,Often,,,Sometimes,,,,Sometimes,,,Most of the time,,Often,,Most of the time,Rarely,,,,Most of the time,Most of the time,,,,,30,25,37,1,7,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team",,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,,Less than 10% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Other,common servers,Git,Rarely,130000,RUB,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,Data Scientist,,10,20,30,40,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Recommendation Engines,Survival Analysis",,,Academic,,,,,,,,,,,,Amazon Machine Learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,23,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,40,30,30,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation",Bayesian Techniques,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,29,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,"Natural Language Processing,Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",High school,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that 
performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Psychology,3 to 5 years,"Computer Scientist,Data Analyst,Programmer,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",35,20,40,5,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,High school,CRM/Marketing,Fewer than 10 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United Kingdom,25,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,30,30,0,10,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,Fewer 
than 10 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Sometimes,10GB,"CNNs,Neural Networks,RNNs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Time Series Analysis",,,,Most of the time,,Often,Often,,,,,,,Sometimes,,Sometimes,,,,Most of the time,Rarely,,,,Most of the time,Most of the time,,,,Most of the time,,,,60,20,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,Often,,,,,Most of the time,,,,,Most of the time,,Less than 10% of projects,Entirely internal,IT Department,None,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Commercial Data Platform,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer",Kaggle competitions,60,0,10,0,30,0,"Natural Language Processing,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A professional degree,Manufacturing,"10,000 or more employees",,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,50,40,0,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,39,Employed full-time,,,Yes,,Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Engineer,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,50,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,More than 10 years,"Computer Scientist,DBA/Database Engineer,Machine Learning Engineer,Programmer,Software Developer/Software 
Engineer,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Machine Learning Engineer",Work,20,0,80,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Angoss,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,QlikView,R,RapidMiner (free version),Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SQL,Tableau,TensorFlow",Often,Often,Sometimes,,Often,,,,Often,,,,,Often,,,Often,,,,Sometimes,Sometimes,Often,Often,Often,,,,,,Often,Sometimes,Often,,Sometimes,Often,,Often,,,,Often,,,Often,Often,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Lift Analysis,Logistic Regression,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics,Time Series Analysis",Often,,,Often,Often,Often,Often,Often,Often,,,,,,Often,Often,,,,Often,,Often,Often,,,Often,,,Often,Often,,,,50,30,0,20,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Privacy issues",Often,Often,,,Sometimes,Often,,,,,,,,,,,Often,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,AUD,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),,High school,Mix of fields,10 to 19 employees,Increased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Social Network Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Very useful,,3-5 years,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Necessary,,,,Coursera,"Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,,Self-taught,0,0,100,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1TB,,"Hadoop/Hive/Pig,R,Spark / MLlib,SQL",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,Most of the time,,,,,,,,,,"A/B Testing,Decision Trees",Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Data Scientist,University courses,60,5,10,10,15,0,Natural Language Processing,Support Vector Machines (SVMs),"Some college/university study, no bachelor's degree",Technology,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1EB,Bayesian Techniques,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,20,20,20,20,15,5,Enough to explain the algorithm to someone non-technical,Privacy issues,,,,,,,,,,,,,,,,,Sometimes,,,,,,51-75% of projects,Entirely external,Central Insights Team,,,Graph (e.g. 
GraphBase/Neo4j),Email,,Git,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,22,Employed part-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,25,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Manufacturing,10 to 19 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,20,20,20,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,26,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",20,10,20,30,20,0,Computer Vision,Neural Networks - CNNs,Primary/elementary school,Academic,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,46,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher",Work,10,0,90,NA,0,0,Outlier detection (e.g. Fraud detection),Neural Networks - CNNs,I prefer not to answer,Non-profit,,,,,,,,,,,,"C/C++,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,NA,Employed full-time,,,No,Yes,DBA/Database Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Data Analyst,DBA/Database 
Engineer,Researcher,Statistician,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,65,10,5,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Random Forests,Logistic Regression",High school,Academic,100 to 499 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,24,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,"Data Analyst,Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University 
courses,20,10,35,35,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,I prefer not to answer,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,40,0,0,10,0,Other (please specify; separate by semi-colon),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,Less than a year,I haven't started working yet,University courses,10,15,0,70,5,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Miner,Engineer,Operations Research Practitioner,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",10,15,70,0,5,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Don't know,1GB,"Ensemble Methods,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,NA,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by a company that performs advanced analytics,SQL,Bayesian Methods,Python,University/Non-profit research group websites,"Online courses,Personal Projects,Podcasts,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,,,,,Somewhat useful,Somewhat useful,Not Useful,,Very useful,,Very useful,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Business Analyst,University courses,20,75,0,0,5,0,,,Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,30,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed 
part-time,,,No,Yes,Business Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",1,5,1,2,0,91,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,21,Employed part-time,,,No,Yes,Statistician,Fine,Employed by college or university,Amazon Web services,Monte Carlo Methods,Python,Google Search,"Non-Kaggle online communities,Online courses,Personal Projects,Textbook",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Machine Learning Engineer,Programmer,Statistician",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,32,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,Other,Traditional Workstation,"Text data,Relational data",Never,100MB,,"Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the 
time,,,,,,,,,,,Often,,,,,,,,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",45,45,10,0,0,0,Recommendation Engines,Support Vector Machines (SVMs),A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,45,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,19,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,Unsupervised Learning,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",I don't know/not sure,Technology,I don't know,Stayed the same,Don't know,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",University courses,25,0,50,25,0,0,"Computer Vision,Speech Recognition","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Traditional Workstation","Image data,Video data",Most of the time,1GB,"CNNs,Neural Networks,Random 
Forests","C/C++,Python,SQL,TensorFlow",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Neural Networks,Random Forests,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",30,30,10,30,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Increased slightly,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Business Analyst,University courses,0,80,0,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Manufacturing,"5,000 to 9,999 employees",Decreased significantly,,A general-purpose job board,Not very 
important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Belgium,44,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,6 to 10 years,Other,Work,50,15,35,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Academic,"5,000 to 9,999 employees",,Don't know,Some other way,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Other","Image data,Video data,Other",Rarely,,"Evolutionary Approaches,Neural Networks,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Evolutionary Approaches,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,Most of the time,,,Most of the time,,,Most of the time,,,,,,Often,,,,Most of the time,Often,,,,Most of the time,,Most of the time,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,45,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Denmark,30,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Computer Science,,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,SQL,I don't plan on learning a new ML/DS method,Python,"GitHub,Google Search","Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,Somewhat useful,,Somewhat useful,,,,Somewhat useful,"Data Machina Newsletter,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Operations Research Practitioner,Predictive Modeler",Self-taught,70,10,20,0,0,0,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,I haven't started working yet,University courses,20,20,0,40,20,0,"Outlier detection (e.g. 
Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,"Not employed, but looking for work",,,,,,,,TensorFlow,Support Vector Machines (SVM),Java,"GitHub,Google Search","Blogs,Kaggle,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,,,,,Very useful,"FlowingData Blog,No Free Hunch Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,0,70,10,10,0,10,Computer Vision,Neural Networks - CNNs,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,Python,Deep learning,R,,"Friends network,Kaggle,Personal Projects,Stack Overflow Q&A",,,,,,Somewhat useful,Very useful,,,,,Very useful,,Very useful,,,,,"KDnuggets Blog,No Free Hunch Blog",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Kaggle Competitions,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,40,0,20,0,Time Series,Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",20+,Somewhat important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very 
Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important +Male,Iran,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Engineer,Programmer",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,Microsoft Azure Machine Learning,Deep learning,Python,University/Non-profit research group websites,"Kaggle,Stack Overflow Q&A",,,,,,,Very useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Data Miner,Machine Learning Engineer,Researcher",University courses,0,60,20,10,10,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Ensemble Methods,Neural Networks,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,Spark / MLlib,SQL,TensorFlow",,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,Often,,,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Often,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Neural Networks,SVMs,Time Series 
Analysis",,,,,,Often,Often,,Often,,,,,,,Sometimes,,,,Sometimes,,,,,,,,Often,,Often,,,,40,20,30,10,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Organization is small and cannot afford a data science team",,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,100% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,350000,CNY,Has decreased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Germany,35,Employed part-time,,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,10,90,0,0,"Computer Vision,Speech Recognition","Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,Fewer than 10 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,23,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Programmer",Self-taught,5,0,0,0,90,5,Recommendation Engines,Decision Trees - Gradient Boosted 
Machines,High school,Internet-based,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Rarely,100GB,Gradient Boosted Machines,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R",,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines",Sometimes,,,,,Often,Often,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,40,10,0,5,5,40,Enough to refine and innovate on the algorithm,Privacy issues,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,"Business Analyst,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests",Primary/elementary school,Other,"10,000 or more employees",Decreased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Cloudera,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Somewhat useful,,,,Very useful,,,,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,0,25,25,0,0,Natural Language Processing,"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,Fewer than 10 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Always,10GB,"Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Perl,Python,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Segmentation",,,,,,Sometimes,Sometimes,,,,,,,,,Often,,,Most of the time,Most of the time,,,,,,Often,,,,,,,,30,20,20,15,15,0,Enough to run the code / standard library,"Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,,,,Often,,,,,,,,,,Most of the time,,,,10-25% of projects,More internal than external,Central Insights Team,"MNIST CIFAR EUROPARL",find the best parmeter,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Sometimes,35000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Netherlands,33,Employed full-time,,,Yes,,Data Miner,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Engineer,Machine Learning Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Programmer,Poorly,Employed by government,,,Julia,GitHub,YouTube Videos,,,,,,,,,,,,,,,,,,Very useful,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,A social science,More than 10 years,Researcher,University courses,50,45,5,0,0,0,Machine Translation,Markov Logic Networks,,Academic,20 to 99 employees,Decreased slightly,3-5 years,I visited the company's Web site and found a job listing there,Not at all important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Video data,Rarely,10MB,Bayesian Techniques,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Evolutionary Approaches,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,20,80,0,0,0,0,Enough to tune the parameters properly,Limitations in the state of the art in machine learning,,,,,,,,,,,,Often,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Subversion,Never,,,I am not currently employed,2,,,,,,,,,,,,,,,,,, +Female,Spain,26,Employed part-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,10,50,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Military/Security,500 to 999 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,32,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,60,5,10,0,0,Computer Vision,"Decision Trees - Random Forests,Neural Networks - CNNs",,Financial,"10,000 or more employees",Decreased slightly,6-10 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text 
data,Sometimes,100MB,Other,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,20,30,0,30,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Researcher,Other",Self-taught,100,0,NA,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Government,"10,000 or more employees",Increased slightly,Don't know,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Relational 
data,Other",Never,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,Often,,Sometimes,,,,Often,,,,,,,Rarely,,Rarely,,,,,,,,,,Sometimes,,,,Often,,Most of the time,,,,,,,,,Sometimes,,,,Rarely,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Random Forests",,,,,,,Often,Often,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,70,5,0,10,15,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,34,Employed full-time,,,Yes,,Data Analyst,,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Other,Self-taught,80,10,0,5,5,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Excel Data Mining,R,RapidMiner (free version)",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Gradient Boosted Machines",,,Often,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,50,10,10,5,25,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Netherlands,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by non-profit or 
NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,50,30,10,10,0,0,Recommendation Engines,"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Academic,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Other","Image data,Text data,Relational data,Other",Most of the time,,,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,NoSQL,Perl,Python,R,SQL,Unix shell / awk",,Rarely,,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,Rarely,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Most of the time,,,,"A/B Testing,Data Visualization,Recommender Systems,Other",Often,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,20,20,5,5,50,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Inability to integrate findings into organization's decision-making process,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,,Self-taught,50,10,20,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A professional degree,Academic,I don't know,,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Portugal,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 years,"Data Scientist,Predictive Modeler,Researcher",University courses,50,10,10,30,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Neural Networks - GANs",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Evolutionary Approaches,Neural Networks","Java,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,RapidMiner (free version),SQL",,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Evolutionary Approaches,Logistic Regression,Neural Networks,Prescriptive Modeling,Segmentation,Simulation,Time Series Analysis",Sometimes,Sometimes,,,,,Sometimes,,,Sometimes,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,Sometimes,Sometimes,,,Sometimes,,,,50,10,0,30,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Programmer,Software Developer/Software 
Engineer",Other,10,10,1,0,0,79,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Internet-based,100 to 499 employees,Decreased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,19,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,37,Employed full-time,,,No,Yes,Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,31,Employed full-time,,,Yes,,Machine Learning Engineer,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,31,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",35,40,20,5,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A doctoral degree,Academic,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation","Image data,Text data",,10MB,CNNs,"C/C++,Python,TensorFlow",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,65,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Spark / MLlib,Deep learning,Python,Google Search,"Conferences,Tutoring/mentoring,YouTube Videos",,,,,Somewhat useful,,,,,,,,,,,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Software Developer/Software Engineer",Self-taught,50,10,20,10,10,0,"Adversarial Learning,Machine Translation,Natural Language Processing","Decision Trees - Gradient Boosted Machines,Neural Networks - 
CNNs,Neural Networks - GANs",,Retail,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,Researcher,University courses,50,10,20,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Non-profit,"1,000 to 4,999 employees",Increased slightly,Don't know,Some other way,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Text data,Sometimes,10MB,"Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,Often,,,,"Cross-Validation,Logistic Regression,Natural Language Processing,SVMs,Text Analytics",,,,,,Most of the time,,,,,,,,,,Often,,,Most of the time,,,,,,,,,Most of the time,Most of the time,,,,,20,10,30,10,30,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects",Often,,,,,,,,Sometimes,,Sometimes,Sometimes,,Sometimes,,,,,,,,,51-75% of projects,More external than internal,Other,,Ethics,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Bitbucket,Git",Sometimes,,,Has increased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,Russia,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,Software Developer/Software Engineer,University courses,0,10,60,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,27,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Programmer,Work,0,0,100,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Unsupervised Learning",Decision Trees - Random Forests,,Technology,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),,,,Neural Networks,NoSQL,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,Decision Trees,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,0,0,10,0,,,A bachelor's degree,Technology,"10,000 or more employees",,,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,32,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Computer Scientist,Programmer,Researcher",Self-taught,70,0,30,0,0,0,"Adversarial Learning,Natural Language Processing,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",High school,Academic,100 to 499 employees,Increased slightly,6-10 years,A tech-specific job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",25,25,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Academic,10 to 19 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Image data,Video data",Never,10GB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks,Segmentation",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Dirty data,Lack of significant domain expert input",,,Most of the time,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Canada,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,20,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Deep learning,Matlab,"Google Search,Government website","College/University,Conferences",,,Very useful,,Very useful,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,5-10 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,"Traditional Workstation,Workstation + Cloud service",40+,PhD,Yes,Doctoral degree,Computer Science,,Computer Scientist,University courses,NA,NA,NA,NA,NA,NA,Speech Recognition,"Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,,,,,,,,,,Very Important,,,,,, +Male,Other,22,"Not employed, but looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Predictive Modeler,Researcher",Self-taught,40,10,40,10,0,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,I don't know,Increased slightly,More than 10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,26,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,40,40,0,15,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Other,"10,000 or more employees",,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Rarely,1GB,Neural Networks,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but 
looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,IBM SPSS Modeler,I don't plan on learning a new ML/DS method,Python,Other,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,Data Stories Podcast,R Bloggers Blog Aggregator",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Nice to have,Necessary,,,,,Basic laptop (Macbook),40+,Kaggle Competitions,Yes,Some college/university study without earning a bachelor's degree,Other,,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A master's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,40,Employed full-time,,,Yes,,Programmer,Fine,Employed by 
government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Engineer,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,45,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Predictive Modeler,Programmer",University courses,90,0,0,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,20 to 99 employees,Decreased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Data Analyst,Data Miner,Engineer,Machine Learning Engineer,Operations Research Practitioner,Programmer,Researcher",University courses,50,5,40,5,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Hospitality/Entertainment/Sports,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Image data,Don't know,10GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,56,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,Friends network,Kaggle,Online courses,Stack Overflow Q&A,Tutoring/mentoring",Very useful,Somewhat useful,,,,Very useful,Very useful,,,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,3-5 years,Nice to have,Unnecessary,Nice to have,,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,edX,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,10,0,0,60,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,,,,,,,,,,,,,,,, +Male,Germany,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Physics,3 to 5 years,"Engineer,Software Developer/Software Engineer",Self-taught,60,10,30,0,0,0,Natural Language Processing,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,,,,,Not at all important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","NoSQL,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",15,80,0,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Genetic & Evolutionary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle",,,Very useful,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Necessary,Nice to have,Nice to have,Unnecessary,Unnecessary,,,,,Traditional Workstation,2 - 10 hours,PhD,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,,"Statistician,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,33,Employed full-time,,,No,Yes,Other,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,24,Employed part-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,30,40,10,0,0,,,A master's degree,Pharmaceutical,10 to 19 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers",Text data,,100GB,,"MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series 
Analysis",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,Often,Often,,,,,,,,Rarely,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",0,70,10,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Technology,20 to 99 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,100MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",Often,,,,,Often,Most of the time,Often,,,,,,,,,,Often,Most of the time,Most of the time,Most of the time,Often,Often,Often,,Often,,Often,Often,,,,,60,10,10,0,20,0,Enough to explain the algorithm to someone non-technical,"Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of funds to buy useful datasets from external sources,Organization is small and 
cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed part-time,,,No,Yes,Other,Fine,,R,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Very useful,,,,,,,,,,,,,3-5 years,Necessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Kaggle Competitions,Yes,Bachelor's degree,Management information systems,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",35,35,0,0,30,0,,,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,United States,42,"Not employed, and not looking for work",No,"No, I am not focused on learning data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,22,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,25,0,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,3 to 5 years,Programmer,Work,20,0,70,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,0,40,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Hospitality/Entertainment/Sports,,,,,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100MB,"CNNs,GANs,Neural Networks,RNNs,SVMs","Amazon Machine Learning,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,TensorFlow",Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Decision Trees,GANs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,Sometimes,Sometimes,,Sometimes,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,Less than a year,"Business Analyst,Other",Self-taught,50,20,25,0,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,Engineer,Self-taught,80,0,20,0,0,0,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning",Ensemble Methods,A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Anomaly Detection,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,25,0,0,25,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Gradient Boosting,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Internet-based,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Rarely,10GB,"CNNs,Gradient Boosted Machines,Neural Networks","MATLAB/Octave,Python,SQL,Statistica (Quest/Dell-formerly Statsoft)",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,,,,,,,,,,Often,,Rarely,,,,,,,,"CNNs,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction",,,,Sometimes,,,,,,,,Sometimes,,Often,,Sometimes,,,,Sometimes,Sometimes,,,,,,,,,,,,,10,3,1,2,4,80,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",Often,,,,,,,,Most of the time,Sometimes,,,,,,Most of the time,,,,,,,100% of projects,Entirely internal,IT Department,All proprietary,"Although all data comes formatted, each supplier uses a different format that is not well specified and sometimes buggy",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Git,Sometimes,,,I do not want to share information about my salary/compensation,4,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,45,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,20,0,0,60,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs",A bachelor's degree,Financial,,,,,Very important,Other,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"Neural Networks,Random Forests",,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 
years,"Researcher,Statistician",University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",A bachelor's degree,Academic,20 to 99 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",5,20,5,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs",A bachelor's degree,Technology,Fewer than 10 employees,Decreased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,10GB,"Bayesian Techniques,HMMs,Regression/Logistic Regression","Amazon Web services,Julia,Jupyter notebooks,Python,QlikView,R,Spark / MLlib,TensorFlow",,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Often,Sometimes,Often,,,,,,,,Often,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,kNN and Other Clustering,Time Series Analysis",,,Sometimes,,,Most of the time,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,20,30,10,25,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Organization is small and cannot afford a data science 
team",,,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A health science,6 to 10 years,Other,Self-taught,50,50,0,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Other,"10,000 or more employees",Increased slightly,1-2 years,Some other way,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Most of the time,10TB,"CNNs,GANs,Neural Networks,RNNs","Google Cloud Compute,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70,20,5,4,1,0,Enough to tune the parameters properly,"Dirty data,Need to coordinate with IT",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Business Analyst,Data Analyst",Self-taught,40,10,30,15,5,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Technology,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,NA,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,University/Non-profit research group websites,"Arxiv,Blogs,Conferences,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,Very useful,,,,,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,1 to 2 years,Computer Scientist,University courses,10,20,60,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Support Vector Machines (SVMs)",A doctoral degree,Academic,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Never,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,IBM SPSS Statistics,Java,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,Often,,,,,,,,Rarely,,,Sometimes,,,,,,Sometimes,,,,Rarely,,,,,,Most of the time,,Often,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests",,Sometimes,Often,,,Often,,Often,Often,,,,,Often,,,,Often,Often,Often,,,Often,,,,,,,,,,,30,50,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful 
datasets from external sources,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,,,,Most of the time,,Most of the time,,,,Most of the time,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,40000,USD,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,DataRobot,Neural Nets,Python,"GitHub,Google Search",Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,"Data Elixir Newsletter,O'Reilly Data Newsletter,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Adversarial Learning,Bayesian Techniques,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,C/C++,Other,C/C++/C#,Other,"Non-Kaggle online communities,Official documentation,Online courses,YouTube Videos",,,,,,,,,Very useful,Very useful,Very useful,,,,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Markov Logic 
Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler",Self-taught,90,0,0,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping)","Arxiv,Kaggle,Personal Projects,YouTube Videos",Very useful,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,KDnuggets Blog,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Machine Learning Engineer,Software Developer/Software Engineer,Other",Work,70,0,20,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs",A master's degree,Technology,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,100MB,"CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks","Amazon Web services,NoSQL,Python,Tableau,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,"CNNs,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,Often,,,Most of the time,Often,,,,Most of the time,,,,,,,,Often,Most of the time,,Sometimes,,,,,,,Most of the time,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,Most of the time,,,,,,,,,Often,,,100% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,6000000,INR,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Canada,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,35,15,30,15,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A professional degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Regression/Logistic Regression,Other","Java,R,SQL",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Random Forests",Sometimes,Often,Often,,Often,,,Rarely,Sometimes,,,,,,,Often,,Rarely,,,,,Rarely,,,,,,,,,,,10,10,10,10,10,50,Enough to refine and innovate on the algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,"Software Developer/Software Engineer,Other",Self-taught,50,20,30,0,0,0,Supervised Machine Learning (Tabular Data),Markov Logic Networks,A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Textbook,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",More than 10 years,"Computer Scientist,Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,0,20,0,0,0,,,"Some college/university study, no bachelor's degree",Telecommunications,20 to 99 employees,Stayed the same,Less 
than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Always,1GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,50,0,10,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer,Statistician",Work,50,20,30,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Relational data",Most of the time,100GB,"CNNs,Ensemble Methods,Neural Networks,Random Forests,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",35,45,0,0,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Most of the time,,Decision Trees,"Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,kNN and Other Clustering,Random Forests",,,,,,,,Rarely,,,,,,Rarely,,,,,,,,,Rarely,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Business Analyst,Self-taught,30,10,60,0,0,0,Recommendation Engines,Logistic Regression,High school,Technology,500 to 999 employees,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Always,10TB,Regression/Logistic Regression,"Amazon Web services,Java,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,Often,,,,"Collaborative Filtering,Data Visualization,Recommender Systems,Segmentation",,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,20,30,30,0,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,Less than 10% of projects,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",I don't typically share data,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Researcher,University courses,20,30,0,50,0,0,,,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, 
etc.)",50,25,NA,25,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,40,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Other,University courses,30,40,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,I haven't started working yet,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 
employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Neural Networks,Random Forests,SVMs","Python,R,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Sometimes,,Most of the time,Sometimes,,,"Bayesian Techniques,Neural Networks,Random Forests,SVMs",,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,Rarely,,,,,,100,0,0,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,I prefer not to say,Lack of funds to buy useful datasets from external sources",,,,Often,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Computer Scientist,Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,30,0,10,50,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed 
full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Ensemble Methods,Logistic Regression",A bachelor's degree,Non-profit,100 to 499 employees,,,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,75,5,5,10,0,"Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Regression/Logistic Regression,"C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,35,10,5,45,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Brazil,24,Employed part-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,R,Genetic & Evolutionary Algorithms,R,GitHub,"Blogs,College/University,Friends network,Kaggle,Textbook,YouTube Videos",,Very useful,Very useful,,,Very useful,Very useful,,,,,,,,Very useful,,,Very useful,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,Programmer,University courses,0,20,0,78,2,0,Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,Self-taught,60,0,0,40,0,0,Machine Translation,Neural Networks - CNNs,A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Kaggle,Personal Projects",,,,,,,Very useful,,,,,Very useful,,,,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,I haven't started working yet","Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,1 to 2 years,"Data Analyst,Data Scientist,Other","Online courses (coursera, udemy, 
edx, etc.)",30,40,20,0,10,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,10TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,Often,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Most of the time,,Often,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Text Analytics,Time Series Analysis",Often,,,,Often,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Rarely,,,Often,Sometimes,Most of the time,,,,,,Sometimes,Often,,,,30,20,10,30,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Entirely internal,Standalone Team,,,,,,,,80000,RON,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Engineer,University courses,0,0,0,100,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Neural Networks - CNNs",A professional degree,Telecommunications,500 to 999 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10MB,Decision Trees,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Decision 
Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,35,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,"Engineer,Programmer",Self-taught,40,20,0,15,25,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Computer Scientist,Engineer,Programmer,Researcher",Self-taught,50,30,10,10,0,0,,,"Some college/university study, no bachelor's degree",Financial,500 to 999 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,,,,,,,"Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,40,20,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,Often,Often,,,,Most of the time,,,,,,,Often,,Often,,,,,76-99% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Git,Other",Never,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,27,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,"Recommendation Engines,Reinforcement learning,Time Series",,,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,33,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,15,5,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,23,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",University courses,15,5,0,80,0,0,"Recommendation Engines,Reinforcement learning","Bayesian Techniques,Logistic Regression",,Technology,Fewer than 10 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Perfectly,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",60,30,10,0,0,0,,,A professional 
degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,31,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",University courses,20,20,10,30,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"10,000 or more employees",Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,1GB,Decision Trees,"IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining",,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Text Analytics",,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,25,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Researcher,University courses,78,5,10,5,2,0,Other (please specify; separate by semi-colon),Neural Networks - CNNs,A doctoral degree,Academic,100 to 499 employees,Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported 
servers","Image data,Text data",Don't know,100GB,"CNNs,Ensemble Methods,Neural Networks","C/C++,IBM SPSS Statistics,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow",,,,Rarely,,,,,,,,Rarely,,,,,Rarely,,,Rarely,Sometimes,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,"CNNs,Data Visualization,Ensemble Methods,Neural Networks,SVMs",,,,Most of the time,,,Often,,Often,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,10,70,0,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Sometimes,,,Often,,,,,,Rarely,Sometimes,,Often,,Sometimes,,,Sometimes,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,49,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Management information systems,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,22,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,0,45,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased slightly,Don't know,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,1GB,"CNNs,Decision Trees,Neural Networks","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,SQL,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural 
Networks",,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,,,,,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Scaling data science solution up to full database",,Sometimes,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,76-99% of projects,Approximately half internal and half external,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Reinforcement learning,Neural Networks - RNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,40,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,10,0,20,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",,3-5 years,An external recruiter or headhunter,Important,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,,,"Java,NoSQL,Perl,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,Often,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Ensemble Methods,Evolutionary Approaches,Lift Analysis,Naive Bayes",,,,,,Often,,,Sometimes,Sometimes,,,,,Often,,,Sometimes,,,,,,,,,,,,,,,,0,0,100,0,0,0,Enough to 
explain the algorithm to someone non-technical,"Need to coordinate with IT,Privacy issues",,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Self-employed,Python,Neural Nets,Python,I collect my own data (e.g. web-scraping),Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Engineer,Other",Self-taught,80,20,0,0,0,0,Survival Analysis,Logistic Regression,A bachelor's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed part-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,40,10,15,35,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,I don't know,Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Kaggle",,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,"Engineer,Other,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,24,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by college or university,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Podcasts,Stack Overflow Q&A",,,,,,,Somewhat useful,,,,,,Not Useful,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",30,25,40,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Financial,I prefer not to answer,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,1GB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SAS Base,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,,Sometimes,,,,Often,,,,,,Sometimes,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,Sometimes,,,Sometimes,Often,,,,Rarely,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",,Rarely,Rarely,,Sometimes,Most of the time,,Often,Most of the time,,,Most of the time,,Often,Rarely,Often,,Sometimes,Sometimes,,Sometimes,,Most of the time,Sometimes,,,,,Often,Often,,,,45,35,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",Rarely,,,,,,,,Rarely,,,,,,,,,,,,,,Less than 10% 
of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Statistician",Self-taught,50,20,10,0,20,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,C/C++,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Textbook",,Somewhat useful,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,"Researcher,Statistician",University courses,0,15,50,30,5,0,"Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,42,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data 
Scientist,Engineer",Work,30,20,30,10,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Financial,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Amazon Machine Learning,Jupyter notebooks,R,SAS Base,SAS Enterprise Miner,TensorFlow,Other,Other",Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,,,,,Often,,,Often,Often,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs",,,,,,Often,Often,Often,Sometimes,,,Often,,,,Often,,,,Often,Often,,Often,Often,Sometimes,Often,Often,Often,,,,,,60,10,20,10,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Limitations of tools",,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler",Work,20,20,40,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,27,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,25,15,0,40,20,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Other,I don't know,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Always,1GB,Regression/Logistic Regression,"Java,NoSQL,Python,QlikView,SQL,Tableau",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Often,Sometimes,,,,,,,,,,Often,,,Often,,,,,,,"Data Visualization,Decision Trees,Naive Bayes,Segmentation",,,,,,,Most of the time,Sometimes,,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,60,10,20,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,37,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,26,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,30,40,0,10,20,0,Natural Language Processing,,A master's degree,Technology,100 to 499 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),Relational data,Never,100MB,,"Cloudera,Hadoop/Hive/Pig,Impala,Java,NoSQL,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,Sometimes,,,,Most of the time,,,,,Sometimes,Most of the time,,,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,Sometimes,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,30,0,10,10,50,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,60,0,10,10,0,Natural Language Processing,"Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting 
event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,24,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,46,Employed full-time,,,Yes,,Programmer,Fine,"Employed by college or university,Employed by non-profit or NGO,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",20,30,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Decreased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Jupyter notebooks,NoSQL,Python,R,SAS Enterprise Miner,SQL,Tableau",Rarely,Often,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,Sometimes,,,Often,,,Most of the time,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,40,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,41,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,Java,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,,,A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Text data,,,,"Java,Microsoft Azure Machine Learning,Python,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Recommender Systems",,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,100% of projects,,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",,,Subversion,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,20,40,10,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",,Technology,100 to 499 employees,Increased significantly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Machine Translation,,A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased significantly,3-5 years,An external recruiter or headhunter,Important,,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,,,"Java,SQL,Unix shell / awk",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,"Simulation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Data 
Scientist,Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,30,20,0,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,CRM/Marketing,500 to 999 employees,Decreased significantly,1-2 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,Less than a year,"Business Analyst,Data Analyst,Researcher",Kaggle competitions,40,15,10,15,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze 
data,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,10,50,0,40,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Time Series,Unsupervised Learning",Decision Trees - Random Forests,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Google Search,"Kaggle,Newsletters,Online courses,Podcasts,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,Very useful,,,Very useful,,Somewhat useful,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression",High school,Manufacturing,"10,000 or more employees",,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Traditional Workstation",Text data,Rarely,100MB,"Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Python,QlikView,SQL,Tableau",,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Often,,,Rarely,,,,,,,Data Visualization,,,,,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,0,5,5,90,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,,,,,,,,,,Often,,,100% of projects,Entirely external,IT Department,,Cleaning and Transformation,Other,I don't typically share data,,Other,Never,,INR,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,NoSQL,Python,R,Spark / MLlib,SQL,TensorFlow",Rarely,Most of the time,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,Sometimes,,,,,,"A/B Testing,Association Rules,Collaborative 
Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs",Sometimes,Sometimes,,,Sometimes,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,,,Most of the time,Sometimes,,Often,Most of the time,Most of the time,,,,,,30,50,10,5,5,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Most of the time,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,Predictive Modeler,"Online courses (coursera, udemy, edx, etc.)",70,5,0,20,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Workstation + Cloud service",Relational data,Rarely,100MB,"Ensemble Methods,Gradient Boosted Machines","Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),R,RapidMiner (free version),Other",,,,,,,,,,,,,,,,,Sometimes,,,,,,,Most of the time,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,Sometimes,"A/B Testing,Cross-Validation,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Rarely,,,,,Most of the time,,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,Often,,,,,,,Often,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,,,Sometimes,,,,,,,,,,,,,Sometimes,,51-75% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,46,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,28,Employed part-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,50,20,10,5,5,10,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Markov Logic Networks,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Deep learning,Python,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,25,70,5,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that 
performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Researcher,Other",University courses,50,5,0,10,0,35,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Perl,Python,QlikView,R,SQL",,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Rarely,Rarely,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Simulation,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Most of the time,Rarely,Often,,,Often,,,,Most of the time,,,,,,,Most of the time,,,,Most of the time,,Sometimes,Often,,,,60,15,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by company that makes advanced analytic software,TensorFlow,,,,"Personal Projects,Podcasts",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Data Analyst,Data Miner,Machine Learning Engineer,Programmer,Other",Work,25,25,50,0,0,0,Computer Vision,Bayesian Techniques,I prefer not to 
answer,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,45,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,10,0,20,0,,,,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",University courses,0,0,25,75,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,CRM/Marketing,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,54,Employed full-time,,,Yes,,Data Miner,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,More than 10 years,"Predictive Modeler,Programmer","Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",,Government,Fewer than 10 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Sometimes,1GB,"Bayesian Techniques,Gradient Boosted Machines,Neural Networks","C/C++,Julia,Jupyter notebooks,MATLAB/Octave,Stan",,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,70,20,10,0,0,0,"Recommendation Engines,Survival Analysis,Time Series",,A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Engineer,Machine Learning Engineer,Programmer",Work,20,20,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector 
Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Always,1GB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Java,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Predictive Modeler",University courses,10,10,40,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,100 to 499 employees,Increased slightly,1-2 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Python,R,SQL,Statistica (Quest/Dell-formerly Statsoft),TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,Often,,,Sometimes,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random 
Forests,SVMs",,Sometimes,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,,,,Sometimes,,Most of the time,,,,,Most of the time,,,,,,25,10,25,15,25,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,Often,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Russia,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",60,40,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,3 to 5 years,"Researcher,Other",University courses,40,20,20,20,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,34,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,26,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests","Cloudera,Hadoop/Hive/Pig,Python,R,SQL",,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Text Analytics",,,,,,Often,Sometimes,Sometimes,,,,,,Sometimes,,Often,,,Rarely,Often,Sometimes,,Sometimes,,,Sometimes,,,Sometimes,,,,,70,12,12,1,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Limitations of tools",Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",50,40,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,I prefer not to answer,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,"Not employed, but looking for work",,,,,,,,R,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,,,,,Somewhat useful,,Somewhat useful,Very useful,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,I haven't started working yet,University courses,30,5,0,40,25,0,"Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,17,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Mathematica,I don't plan on learning a new ML/DS method,C/C++/C#,Google Search,Non-Kaggle online communities,,,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Talking Machines Podcast,The Data Skeptic Podcast",< 1 year,,,,,,,,,,,,,,,,,Online Courses and Certifications,No,I prefer not to answer,Engineering 
(non-computer focused),,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Speech Recognition,Other (please specify; separate by semi-colon)",,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",Somewhat useful,Somewhat useful,Somewhat useful,Somewhat useful,,,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,,Very useful,Very useful,,,,"No Free Hunch Blog,R Bloggers Blog Aggregator,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,,University courses,15,5,25,50,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,100 to 499 employees,Increased significantly,6-10 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Other",Rarely,10MB,"Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,Often,,Often,,,,"Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,Often,,Often,,,,,,Often,Sometimes,,Often,,Often,,Sometimes,Often,,Most of the time,,,,40,30,0,20,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,,,,,,Sometimes,,Sometimes,,,,,,,Often,Often,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Rarely,24000,EUR,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Female,Singapore,54,Retired,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Business Analyst,Researcher,Software Developer/Software Engineer,Other",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Other (please specify; separate by semi-colon)",Decision Trees - Random Forests,A master's degree,Other,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data,Other",Sometimes,10GB,"Decision Trees,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Miner,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,30,10,60,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed full-time,,,Yes,,Other,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,10,10,20,20,40,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Predictive Modeler,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,Programmer,Self-taught,100,0,0,0,0,0,,Logistic Regression,A master's degree,Other,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,Regression/Logistic Regression,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,50,30,0,0,20,0,Enough to refine and innovate on the algorithm,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,Spark / MLlib,Proprietary Algorithms,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Podcasts,YouTube Videos",,,,,,,,,,,Very useful,Very useful,Somewhat useful,,,,,Very useful,"Becoming a Data Scientist Podcast,FastML Blog,Talking Machines Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 years,"Computer Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Work,60,0,30,10,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Random Forests",A master's degree,Other,,,,,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Rarely,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Machine Learning,Amazon Web services,NoSQL,Python,R",Rarely,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Natural Language Processing,Random Forests,Recommender Systems,Text Analytics",,,Sometimes,,,,Often,Often,,,,,,Sometimes,,,,,Often,,,,Sometimes,Often,,,,,Often,,,,,30,20,10,30,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Git,Subversion",Most of the time,0,INR,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Miner",University courses,0,50,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,I don't write code to analyze data,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",80,20,0,0,0,0,Outlier detection (e.g. Fraud detection),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Technology,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,"Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Random Forests,Time Series Analysis",,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Rarely,,,,80,0,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team",,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,20,0,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Financial,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Rarely,10MB,"Decision Trees,Random Forests","Microsoft Excel Data Mining,Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Sometimes,,,,,Often,,,,Often,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Segmentation",,,,,,,Often,Sometimes,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,,,40,20,5,20,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,Unavailability of/difficult access to data",,,,,Often,,,,,,Often,,,,,,,,,,Often,,51-75% of projects,Entirely internal,IT Department,CIPC;census,We use client data and struggle to get good quality client data often,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,635000,ZAR,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,India,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,Other,Self-taught,50,25,0,0,0,25,Time Series,"Bayesian Techniques,Logistic Regression",A master's degree,Other,,,,,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Other,,<1MB,Bayesian Techniques,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,0,0,0,0,0,100,Enough to explain the algorithm to someone non-technical,Other,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,27,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,MATLAB/Octave,Neural Nets,Matlab,GitHub,"College/University,Other",,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,36,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Cluster Analysis,Python,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,,< 1 year,,,,,,,,,,,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Master's degree,No,Bachelor's degree,Computer Science,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Recommendation Engines,Logistic Regression,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,20,10,50,0,20,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,Amazon Machine 
Learning,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Company internal community,Conferences,Kaggle",,,,Very useful,Very useful,,Somewhat useful,,,,,,,,,,,,,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,20,10,0,0,"Natural Language Processing,Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important +Male,Other,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Psychology,1 to 2 years,Researcher,Self-taught,80,0,0,0,0,20,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",High school,Academic,I don't know,Stayed the same,Don't know,Some other way,Very 
important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Workstation + Cloud service",Text data,Don't know,,,"IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,Sometimes,Most of the time,Rarely,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,Sometimes,,,,,,,,,,Sometimes,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Natural Language Processing,Text Analytics",,,Rarely,,,,Most of the time,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,Most of the time,,,,,20,20,30,10,20,0,Enough to tune the parameters properly,"Lack of funds to buy useful datasets from external sources,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,Most of the time,,,,,,,Most of the time,Most of the time,,26-50% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,55,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,64,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Business Analyst,Work,40,20,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer",University courses,0,25,25,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,Python",,,,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,40,0,40,20,0,0,"Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,,,A general-purpose job board,Very important,"Build and/or run the data infrastructure that your 
business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Always,10GB,,"Amazon Web services,C/C++,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,"Data Scientist,Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Researcher",Self-taught,50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",No education,Pharmaceutical,10 to 19 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,Impala,Java,Microsoft Azure Machine Learning,NoSQL,Python,R,SAS Enterprise Miner,Spark / MLlib,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",100,0,0,0,0,0,Natural Language Processing,Bayesian Techniques,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,"Computer Scientist,Predictive Modeler,Programmer,Researcher,Software 
Developer/Software Engineer",University courses,50,2,10,38,0,0,"Unsupervised Learning,Other (please specify; separate by semi-colon)","Ensemble Methods,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",Primary/elementary school,Academic,10 to 19 employees,Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed part-time,,,No,Yes,Programmer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,0,20,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Software Developer/Software Engineer",Work,50,0,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),6 to 10 years,"Data Analyst,Data 
Scientist,Statistician",Work,20,0,80,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Decreased slightly,More than 10 years,A general-purpose job board,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Other","Text data,Relational data",Always,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,,Often,,Often,Often,,Sometimes,Often,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,,,Often,Often,Often,Often,,,Sometimes,,,Sometimes,Most of the time,,,,,,,Often,,,Most of the time,Often,Often,Sometimes,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Data Analyst,Data Miner,Programmer,Software Developer/Software 
Engineer",Self-taught,50,40,10,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Engineer",Work,50,50,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Ensemble Methods,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,I don't write code to analyze data,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,51,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Non-Kaggle online communities,Personal Projects,Stack Overflow Q&A,Textbook",Very useful,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,,"FlowingData Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,6 to 10 years,"Operations Research Practitioner,Predictive Modeler,Researcher",University courses,0,50,0,50,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning",Logistic Regression,A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,Never,100MB,Regression/Logistic Regression,"Amazon Web services,Google Cloud Compute,Julia,Jupyter notebooks,MATLAB/Octave,R",,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,75,0,0,15,10,0,Enough to refine and innovate on the algorithm,"Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,Less than 10% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,,,Has increased 20% or more,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,No,Yes,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,40,0,0,60,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,46,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,40,30,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A master's degree,Retail,"5,000 to 9,999 employees",Decreased slightly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Most of the time,100TB,"CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,NoSQL,Python,QlikView,R,Spark / MLlib,TensorFlow",,Most of the time,,Sometimes,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,Often,,,,Most of the time,Sometimes,Sometimes,,,,,,,,Often,,,,,Sometimes,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs",Most of the time,,,,Often,Sometimes,Often,Often,,,,,,Often,,Sometimes,,,Most of the time,Often,,,Often,Most of the 
time,,Sometimes,Often,Rarely,,,,,,70,15,10,4,1,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Often,Most of the time,,Sometimes,Often,,,,Sometimes,,,,,Often,,,,Sometimes,Sometimes,,,,10-25% of projects,More internal than external,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,,Rarely,600000,BRL,Has increased between 6% and 19%,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Computer Scientist,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Programmer",University courses,10,20,0,70,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,30,30,0,40,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",25,25,20,20,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Internet-based,20 to 99 employees,Increased significantly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,10,10,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Telecommunications,"10,000 or more employees",Decreased significantly,3-5 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Rarely,10MB,,"Jupyter notebooks,Microsoft Excel Data Mining,Python,Tableau,Unix shell / 
awk",,,,,,,,,,,,,,,,,Often,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,23,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,0,10,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,Fewer than 10 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,,,Other,"MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,Prescriptive Modeling,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,50,0,50,0,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization",,Sometimes,,,,Often,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,Supervised Machine Learning (Tabular Data),,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,21,I 
prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,I don't write code to analyze data,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,,Self-taught,35,40,10,10,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Retail,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Always,,,"Amazon Web services,Java,Unix shell / awk",,Often,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,60,"Independent contractor, freelancer, or 
self-employed",,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Recommendation Engines,Unsupervised Learning","Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,,,,,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R",,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Recommender Systems",,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,60,20,0,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,51-75% of projects,Entirely internal,IT Department,N/A,Cleaning and normalizing it.,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Git,Most of the time,"85,000",USD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Italy,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Operations Research Practitioner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Financial,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Relational data,Rarely,100MB,"Bayesian Techniques,Decision Trees,HMMs,Neural Networks","Amazon Machine Learning,C/C++,Hadoop/Hive/Pig,Microsoft Azure Machine Learning,NoSQL,Python,R,SQL,Tableau,TensorFlow",Rarely,,,Often,,,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Simulation",Often,Often,Sometimes,,Often,Often,Often,Sometimes,,,,,,Often,,Often,Sometimes,Sometimes,,Sometimes,Often,,,Often,,,Often,,,,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Software 
Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,50,40,0,10,0,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Other,"1,000 to 4,999 employees",Decreased slightly,Don't know,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,"GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,,Very useful,Very useful,Siraj Raval YouTube Channel,1-2 years,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,"Coursera,Udacity",GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,"Data Scientist,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Neural Nets,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,50,0,30,20,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,500 to 999 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,10GB,"CNNs,SVMs","Amazon Web services,Python,SQL,TensorFlow",,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,CNNs,Data Visualization,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics",Sometimes,,,Most of the time,,,Often,,,,,,,,,,,,Most of the time,Most of the time,,,,Often,,,,,Most of the time,,,,,30,30,30,0,10,0,Enough to explain 
the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,,,,,,,,,,,Often,,Often,,Often,Often,Often,,Less than 10% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,,ILS,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,24,"Independent contractor, freelancer, 
or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,40,15,15,10,0,Time Series,"Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Other,Text data,,,,"Jupyter notebooks,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,31,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,60,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Neural Networks - RNNs",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",Often,Sometimes,,,,,Most of the 
time,Often,,,,,,Often,,,,Sometimes,,Often,Often,,,,,,,,,Most of the time,,,,30,10,5,10,20,25,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,23,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Neural Nets,Python,I collect my own data (e.g. web-scraping),"Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Very useful,,Very useful,,Very useful,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,Very useful,Siraj Raval YouTube Channel,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",11 - 39 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Natural Language Processing,Speech Recognition,Other (please specify; separate by semi-colon)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Business Analyst,Self-taught,50,50,0,0,0,0,Time Series,,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,An external recruiter or headhunter,Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Sometimes,10MB,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,20,10,25,40,5,0,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Don't know,1GB,"Bayesian Techniques,Neural Networks,SVMs","C/C++,IBM Watson / Waton Analytics,Jupyter notebooks,Mathematica,MATLAB/Octave,R,TensorFlow",,,,Often,,,,,,,,,Sometimes,,,,Often,,,Sometimes,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Logistic Regression,Neural Networks,Segmentation",,,Sometimes,Most of the time,,,,,,,,,,,,Often,,,,Most of the time,,,,,,Often,,,,,,,,30,30,10,10,20,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,3 to 5 years,Other,Work,40,10,30,10,10,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Ensemble Methods,A bachelor's degree,Academic,100 to 499 employees,Stayed the same,3-5 years,I 
visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Amazon Machine Learning,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Blogs,College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,Not Useful,Very useful,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Very useful,,Very useful,Somewhat useful,,,Very useful,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Scientist,Engineer",University courses,20,50,10,20,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Financial,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Cloudera,Python,R,SQL,Tableau",,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Often,,,Sometimes,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Often,,Often,,Often,,,Often,Often,Most of the time,Sometimes,,Often,Often,Often,Sometimes,Sometimes,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Often,Sometimes,,,Often,,Sometimes,,Often,Sometimes,,,,Sometimes,,Often,,,100% of 
projects,More internal than external,Business Department,Merkle,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,112000,,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Israel,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,0,10,30,60,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Reinforcement learning,Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Researcher,Statistician",Self-taught,30,10,40,20,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",Primary/elementary school,Government,"1,000 to 4,999 employees",,,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Other","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Mathematica,MATLAB/Octave,Microsoft Excel Data Mining,Minitab,Oracle Data Mining/ Oracle R Enterprise,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,63,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",,Academic,20 to 99 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,CNNs,Random Forests","Java,Python,SQL,Other",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,"Bayesian Techniques,CNNs,Random Forests,Segmentation,Simulation",,,Often,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,Sometimes,,,,,,,0,35,0,10,55,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Lack of significant domain expert input,Limitations in the state of the art in machine learning,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,,,,,,,,,Most of the time,Often,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,28,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Time Series","Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Other,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data",Rarely,1GB,Regression/Logistic 
Regression,"Hadoop/Hive/Pig,R,SQL,Tableau,Unix shell / awk",,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Sometimes,,,Often,,,Sometimes,,,,"A/B Testing,Segmentation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",60,20,0,20,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,30,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Other,Self-taught,25,0,40,35,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; 
separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Management information systems,More than 10 years,Predictive Modeler,Work,50,0,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,33,Employed part-time,,,No,Yes,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,17,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,,I haven't started working 
yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,Researcher,University courses,30,10,0,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer,Other",Self-taught,50,20,30,0,0,0,"Adversarial Learning,Survival Analysis,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Evolutionary Approaches",High school,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data 
Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,0,50,30,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,20,40,0,20,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,University courses,0,0,50,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,39,Employed full-time,,,No,Yes,Other,,Employed by government,Python,Cluster Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,Kaggle,Official documentation,Online courses,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,Somewhat useful,Somewhat useful,,Somewhat useful,Very useful,Very useful,,,Very useful,KDnuggets Blog,< 1 year,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,"Coursera,edX,Other",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,,,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,United States,27,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,Time Series,Logistic Regression,A master's degree,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Portugal,41,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist",University courses,100,0,0,0,0,0,Supervised Machine Learning (Tabular 
Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Other",Self-taught,50,30,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,3 to 5 years,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,40,0,0,40,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Unsupervised Learning",,High school,Technology,100 to 499 employees,Stayed the same,More than 10 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service",Text data,Sometimes,100MB,"Bayesian Techniques,Neural Networks,SVMs","C/C++,Java,NoSQL,SQL",,,,Often,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Neural Networks,Simulation,SVMs",,Often,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,Sometimes,Sometimes,,,,,,30,20,30,20,0,0,Enough to tune the parameters properly,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",,,,,Most of the time,Sometimes,,,Often,,,,Sometimes,,,Most of the time,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,32,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,40,10,25,25,0,0,,,High school,Mix of fields,"1,000 to 
4,999 employees",Decreased slightly,More than 10 years,An external recruiter or headhunter,Very important,Other,Laptop or Workstation and private datacenters,Text data,,100MB,"Bayesian Techniques,Decision Trees","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,30,30,15,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Increased significantly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks","Google Cloud Compute,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Gradient Boosted Machines,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,Segmentation,Time Series Analysis",Often,,,Often,Often,Most of the time,,Often,,,,Most of the time,,,,,,,Sometimes,,Sometimes,,,Most of the time,,Most of the time,,,,Most of the 
time,,,,40,35,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Other",,,,,,,,,,,Often,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,25,25,25,25,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,Often,,,,,Often,,,,,,,,,,Rarely,,,,,,,,,Often,,,,,Often,Sometimes,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Other,"10,000 or more employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Image data,Video data,Text data,Relational data",Never,10GB,,"Jupyter notebooks,NoSQL,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Often,,Sometimes,,,,,,,,,,,,,Rarely,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,Neural Networks",,,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,50,20,0,20,10,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,56,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Time Series Analysis,Python,Government website,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer,Other",University courses,15,10,10,60,5,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,90,5,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",60,20,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,500 to 999 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,Less than a year,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,35,65,0,0,0,Outlier detection (e.g. 
Fraud detection),Decision Trees - Random Forests,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,23,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,3 to 5 years,,University courses,5,10,0,85,0,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,"Independent contractor, freelancer, or self-employed",,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,Social Network Analysis,Java,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website","Arxiv,Kaggle",Somewhat useful,,,,,,Very useful,,,,,,,,,,,,No Free Hunch Blog,< 1 year,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,I never declared a major,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,20,0,0,80,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Genetic & Evolutionary Algorithms,Python,GitHub,Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"Data Machina Newsletter,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Professional degree,,,Engineer,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,35,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,1 to 2 years,"Data Scientist,Researcher,Other","Online courses (coursera, udemy, edx, etc.)",5,5,0,0,0,90,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,,,,,Very important,Other,Basic laptop (Macbook),Relational data,,1MB,"Decision Trees,Ensemble 
Methods,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,Rarely,Rarely,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,Most of the time,Sometimes,Most of the time,,,,,Rarely,,Sometimes,,,Sometimes,,Sometimes,,Often,,,,,,Sometimes,,,,,40,40,0,10,10,0,Enough to explain the algorithm to someone non-technical,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Computer Scientist,Machine Learning Engineer,Operations Research Practitioner,Software Developer/Software Engineer",Self-taught,50,30,5,10,5,0,Computer Vision,"Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Academic,20 to 99 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Video data,Most of the time,100MB,Neural Networks,"Amazon Machine Learning,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning",Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,1 to 2 years,Programmer,Self-taught,70,30,0,0,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning",,Primary/elementary school,Financial,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Text data,Relational data",Most of the time,10GB,Decision Trees,"Hadoop/Hive/Pig,Java,NoSQL,Python,SQL",,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,20,5,50,5,10,10,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Government,I prefer not to answer,Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,Unix shell / awk",,,,,,,,,Sometimes,,Sometimes,,,,,,Most of the time,,,,,,,Most of the time,Often,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,Often,,,Sometimes,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,Sometimes,,Often,,Often,,,,,Often,,Often,,,,,,Most of the time,Sometimes,,,,40,30,10,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Need to coordinate with IT,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,,,,,Often,,,Often,,Often,Sometimes,,26-50% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Git,Subversion",,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,28,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",50,50,NA,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, but looking for work",,,,,,,,R,Neural Nets,R,University/Non-profit research group websites,Online courses,,,,,,,,,,,Very useful,,,,,,,,"FastML Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",< 1 year,Nice to have,Unnecessary,Nice to have,Unnecessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,Udacity,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Online Courses and Certifications,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,25,25,0,25,25,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Very Important,Not 
important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Very Important +Male,India,25,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Engineer,Software Developer/Software Engineer",University courses,10,40,10,40,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,1TB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,NoSQL,Python,R,Unix shell / awk",,,,Often,,,,,Most of the time,,,,,,Most of the time,,,,,,Often,,,,,,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Bayesian Techniques,CNNs,Neural Networks",,,Often,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Data Science results not used by business decision makers,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,49,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,0,20,0,30,0,Adversarial 
Learning,"Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Technology,"5,000 to 9,999 employees",Decreased significantly,Don't know,A tech-specific job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Rarely,1MB,Neural Networks,"C/C++,MATLAB/Octave,Microsoft Azure Machine Learning,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Neural Networks",,,,,,,Most of the time,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,0,20,0,80,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Engineer,Machine Learning 
Engineer,Programmer",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,46,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by professional services/consulting firm,R,Deep learning,Python,Google Search,"Arxiv,Blogs,Conferences,Personal Projects,Stack Overflow Q&A",Very useful,Very useful,,,Very useful,,,,,,,Very useful,,Very useful,,,,,"Data Elixir Newsletter,Data Stories Podcast,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,I don't write code to analyze data,"Engineer,Programmer,Other",Self-taught,80,20,0,0,0,0,,,A doctoral degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,26,Employed full-time,,,No,Yes,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Programmer",University courses,0,0,0,80,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,30,Employed full-time,,,Yes,,Other,Fine,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Engineer,Programmer,Software Developer/Software Engineer,Statistician",University courses,60,0,10,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,20 to 99 employees,Increased slightly,More than 10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,38,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,Engineer,Self-taught,100,0,0,0,0,0,,,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 
years,"Business Analyst,Other",University courses,20,0,30,40,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,20 to 99 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,100MB,"Neural Networks,SVMs","Amazon Web services,Jupyter notebooks,Python,R,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Rarely,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,43,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,34,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,50,30,0,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,34,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer",Self-taught,30,30,5,15,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural 
Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,20 to 99 employees,Decreased significantly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,Traditional Workstation,"Image data,Video data",Rarely,,"Decision Trees,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python",,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,10,50,0,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,"Data Scientist,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,36,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,3 to 5 years,Data Scientist,Self-taught,70,0,20,0,10,0,,,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Argentina,55,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,Less than a year,Researcher,University courses,20,60,0,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,Very useful,,,,,Very useful,,,Very useful,Very useful,Very useful,,Very useful,Very useful,,,Very useful,"FastML Blog,KDnuggets Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,50,20,5,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Jupyter notebooks,MATLAB/Octave,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,Most of the time,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,Rarely,,,Rarely,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Often,Most of the time,Most of the time,,,,,Sometimes,,Sometimes,,Sometimes,Sometimes,,Sometimes,,Most of the time,,,,,Often,,,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,I prefer not 
to say,,,,,,,Often,,,,,,,,,,,,,,,,10-25% of projects,Entirely external,,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,100000,INR,I was not employed 3 years ago,5,,,,,,,,,,,,,,,,,, +Female,Argentina,40,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,Programmer,Self-taught,90,0,10,0,0,0,,,High school,Government,500 to 999 employees,Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,10GB,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,0,50,40,10,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Often,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,37,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Other (please specify; separate by semi-colon),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,52,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer",Self-taught,40,40,0,0,0,20,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Ensemble Methods,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Video data,Text data",Don't know,,Neural Networks,TensorFlow,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,"Ensemble Methods,Neural Networks,Time Series Analysis",,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,20,20,20,20,20,0,Enough to explain the algorithm to someone non-technical,Company politics / Lack of management/financial support for a data science 
team,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,62,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,50,0,0,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",High school,Academic,500 to 999 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Rarely,1MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests","C/C++,Hadoop/Hive/Pig,Java,KNIME (free version),Python,RapidMiner (free version),SQL,Other",,,,Sometimes,,,,,Sometimes,,,,,,Often,,,,Sometimes,,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,Sometimes,,,,,,,Most of the time,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Neural Networks,Random Forests",,,Often,Sometimes,,Often,Often,Often,Often,,,,,,,Sometimes,,Often,,Sometimes,,,Often,,,,,,,,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,26-50% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",local repository,"Git,Subversion",,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,Greece,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,28,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Predictive Modeler",Self-taught,20,10,50,5,15,0,Time Series,"Bayesian Techniques,Logistic Regression",A professional degree,Financial,I prefer not to answer,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Sometimes,100MB,"Bayesian Techniques,Markov Logic Networks","Java,Jupyter notebooks,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,Often,,Often,,,,,Often,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,I never declared a major,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,37,Employed full-time,,,Yes,,Programmer,Fine,Employed by college or university,IBM Watson / Waton Analytics,Social Network 
Analysis,C/C++/C#,Google Search,"Blogs,College/University,Friends network,YouTube Videos",,Somewhat useful,Somewhat useful,,,Somewhat useful,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,30,0,40,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Academic,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,,10MB,Decision Trees,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Decision Trees,Logistic Regression,Text Analytics",,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,0,0,0,50,50,0,Enough to run the code / standard library,"I prefer not to say,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,,,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,,,Less than 10% of projects,Entirely internal,Standalone Team,no more,none,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Most of the time,25000,PHP,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Japan,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,44,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Data Scientist,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",80,10,8,0,2,0,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",Military/Security,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Video data,Sometimes,1GB,"CNNs,Neural Networks","C/C++,Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,Segmentation",,,,Most of the time,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,65,20,10,5,0,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,,,,,,,,,,,,,Often,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Data 
Scientist,Researcher",Self-taught,100,0,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Relational data,Other",Most of the time,,"Bayesian Techniques,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,DataRobot,Java,Mathematica,Microsoft SQL Server Data Mining,NoSQL,Perl,Python,QlikView,R,SQL,Unix shell / awk",,,,Rarely,,Rarely,,,,,,,,,Rarely,,,,,Often,,,,,Rarely,,Rarely,,,Sometimes,Most of the time,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,SVMs,Time Series Analysis",Rarely,,,,,Sometimes,Most of the time,Often,Often,,,,Often,Often,,Most of the time,,Often,Sometimes,Sometimes,Often,,Most of the time,,,Often,Most of the time,Sometimes,,Most of the time,,,,30,30,5,5,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the 
organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,0,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Data Analyst,Engineer",Self-taught,80,0,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education 
past high school,,Less than a year,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,Natural Language Processing,Hidden Markov Models HMMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,43,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Researcher",University courses,20,20,20,30,10,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,Time Series,"Bayesian Techniques,Logistic Regression",A doctoral degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,21,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,6 to 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,20,10,10,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Academic,,,,,Important,,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,1GB,"CNNs,Decision Trees,GANs,Gradient Boosted Machines,HMMs,Markov Logic 
Networks,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","C/C++,Microsoft Excel Data Mining,Oracle Data Mining/ Oracle R Enterprise,Python,R",,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Other,University courses,20,45,0,35,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),3 to 5 years,Programmer,University courses,30,5,20,45,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Financial,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression","SAS 
JMP,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,21,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,Less than a year,Researcher,Self-taught,80,10,0,0,10,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,"10,000 or more employees",Stayed the same,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",20,30,20,30,0,0,,,Primary/elementary school,Other,Fewer than 10 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,20,30,50,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic 
Regression",A bachelor's degree,Non-profit,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Statistician,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Predictive Modeler,Statistician",University courses,70,10,10,10,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A doctoral degree,Financial,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,HMMs,Random Forests,Regression/Logistic Regression","MATLAB/Octave,Python,R,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,HMMs,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Simulation,Text Analytics,Time Series Analysis",,,Often,,,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,,Often,,Often,,,Often,,,,,,Often,,Sometimes,Often,,,,5,45,5,20,25,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,Often,,,,,,Sometimes,,76-99% of projects,Entirely internal,Business Department,Bloomberg,Sparse,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Rarely,125000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Scientist,Researcher,Software Developer/Software Engineer",University courses,50,10,0,40,0,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,60,10,10,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",,Academic,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Text data,Relational data",Most of the time,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the 
time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,Self-taught,60,30,0,0,10,0,Recommendation Engines,Bayesian Techniques,A master's degree,Retail,"10,000 or more employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,0,0,100,0,0,"Outlier 
detection (e.g. Fraud detection),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Insurance,500 to 999 employees,Stayed the same,More than 10 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Always,100GB,"Bayesian Techniques,Regression/Logistic Regression","IBM Cognos,Java,MATLAB/Octave,NoSQL,Python,R,SAS Enterprise Miner,SQL",,,,,,,,,,Often,,,,,Often,,,,,,Sometimes,,,,,,Often,,,,Often,,Sometimes,,,,,,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,kNN and Other Clustering,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Text Analytics",,Sometimes,Often,,,,Most of the time,,,,,,,Most of the time,,,,,,,Often,Sometimes,Sometimes,,,Often,,,Sometimes,,,,,40,20,5,10,25,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,Often,,Most of the time,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,50,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Engineer,Predictive Modeler,Software Developer/Software Engineer,Other",Self-taught,95,0,0,0,5,0,"Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the 
company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,100MB,"Gradient Boosted Machines,Regression/Logistic Regression","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,QlikView,Tableau",,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,Often,,,,,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation",,,,,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,15,50,20,10,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,10,0,0,90,0,0,Supervised Machine Learning (Tabular Data),"Hidden Markov Models HMMs,Markov Logic Networks",A doctoral degree,Academic,I don't know,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,30,0,20,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"10,000 or more employees",Increased slightly,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Relational data",Most of the time,100MB,"Ensemble Methods,Random Forests,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,Python,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Collaborative Filtering,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Time Series Analysis",,,,,Sometimes,Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,,,,Most of the time,Sometimes,Sometimes,,Most of the time,Sometimes,Sometimes,,,,,Often,,,,20,40,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Sometimes,100000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Female,Germany,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Hadoop/Hive/Pig,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Somewhat useful,,Very useful,Very useful,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,Very useful,"O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,50,20,10,0,0,Computer Vision,Support Vector Machines (SVMs),High school,Academic,20 to 99 employees,Decreased slightly,3-5 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Image data,Video data",Rarely,1TB,SVMs,"C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"PCA and Dimensionality Reduction,SVMs",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Often,,,,,,60,25,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,Most of the time,,Most of the 
time,,Sometimes,,,Often,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,preparation of data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,50000,EUR,Has increased between 6% and 19%,4,,,,,,,,,,,,,,,,,, +Male,Germany,46,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,Spark / MLlib,Bayesian Methods,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Other","Blogs,Conferences,Friends network,Kaggle,Newsletters,Non-Kaggle online communities,Official documentation,Online courses,Stack Overflow Q&A,Textbook",,Very useful,,,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,,,Very useful,Very useful,,,,"O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Other",Work,15,5,60,20,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Mix of fields,100 to 499 employees,Increased significantly,6-10 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,SQL,Other",,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,,,Often,,Most of the time,,,,,Rarely,,,Sometimes,Most of the time,,,,,,,Sometimes,,,"A/B Testing,Association 
Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Simulation,Text Analytics,Time Series Analysis",Rarely,Rarely,,,,Often,Most of the time,Rarely,Often,,,Sometimes,,Sometimes,Rarely,Sometimes,,,,,Sometimes,Sometimes,Often,,,,Sometimes,,Sometimes,Sometimes,,,,65,10,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,,,Rarely,Often,,Rarely,,,Sometimes,Often,,Often,,,Sometimes,Often,,100% of projects,,Other,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Subversion",Sometimes,110000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,Business Analyst,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,3 to 5 years,Software Developer/Software Engineer,Self-taught,30,40,20,0,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Neural Networks - CNNs",,Technology,10 to 19 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Very important,Build and/or run a machine learning service 
that operationally improves your product or workflows,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Image data,Video data",Sometimes,1MB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks","C/C++,MATLAB/Octave,Python",,,,Most of the time,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Ensemble Methods,kNN and Other Clustering,Neural Networks",,,Most of the time,Often,,,,,Often,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,30,30,10,30,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Financial,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Other,Most of the time,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Operations 
Research Practitioner,Programmer","Online courses (coursera, udemy, edx, etc.)",0,25,30,40,5,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Financial,"5,000 to 9,999 employees",Stayed the same,Don't know,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Other,Less than a year,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,80,15,0,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Greece,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,38,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Self-taught,100,0,0,0,0,0,Time Series,Bayesian Techniques,High school,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search",Personal Projects,,,,,,,,,,,,Very useful,,,,,,,"Data Machina Newsletter,R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,,,,,GPU accelerated Workstation,2 - 10 hours,Github Portfolio,Yes,Bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,India,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, 
udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,"Engineer,Researcher",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),,,Mix of fields,,,,,Not very important,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,I haven't started working yet,Self-taught,40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,SQL,Cluster Analysis,R,"GitHub,Google Search","Blogs,Kaggle,Personal Projects,Tutoring/mentoring,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,Very useful,,,,,Very useful,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Business Analyst,University courses,0,10,20,70,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Other,I don't know,Stayed the same,1-2 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Always,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","KNIME (free version),Python,R,SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,Most of the time,,,,Most of the time,,,,20,20,10,20,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Scaling data science solution up to full database",Often,Often,,,,Sometimes,,,Often,,,,Most of the time,,Most of the time,,,Often,,,,,100% of projects,Entirely internal,Business Department,,"permissions integration of the data ","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,30000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Blogs,College/University,Kaggle,Personal Projects,Textbook",,Somewhat useful,Very useful,,,,Very useful,,,,,Very useful,,,Very useful,,,,,1-2 years,Nice to have,Necessary,Necessary,Unnecessary,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Basic laptop (Macbook),11 - 39 hours,Master's degree,No,Master's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,15,10,5,50,20,0,,"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,24,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,28,Employed full-time,,,No,Yes,Other,,Employed by government,Python,Deep 
learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","College/University,Kaggle,Non-Kaggle online communities,Personal Projects,Textbook",,,Very useful,,,,Very useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,,,,KDnuggets Blog,5-10 years,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Necessary,,,,,Basic laptop (Macbook),2 - 10 hours,Master's degree,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Researcher,Statistician",University courses,20,20,0,40,20,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Engineer,University courses,0,40,20,40,0,0,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,"5,000 to 9,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed 
part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,3 to 5 years,"Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,NA,50,30,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Decreased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Python,R,SAS JMP,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,39,Employed part-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,More than 10 years,Programmer,Self-taught,25,25,25,25,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Retail,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Orange,Python,QlikView,R,SQL,Tableau,TensorFlow",,,,Most of the time,,,,,,,,,,,,,Often,,,,Rarely,,Most of the time,Rarely,Sometimes,,,,Rarely,,Most of the time,Rarely,Rarely,,,,,,,,,Most of the time,,,Sometimes,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Time Series Analysis",Rarely,Rarely,Most of the time,,,Most of the time,Most of the time,Most of the time,Rarely,,,Rarely,,Rarely,,Most of the time,,Most of the time,,,Rarely,,,Most of the time,,,,Rarely,,Most of the time,,,,70,5,10,5,5,5,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Need to coordinate with IT",Most of the time,,,,Most of the time,Sometimes,,,,,,,,,Most of the time,,,,,,,,51-75% of projects,Entirely internal,Business 
Department,,Re-structuring and cleaning data including even changes in business processes,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Always,3600000,RUB,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Germany,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,Kaggle,Online courses,Personal Projects,Podcasts",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,FlowingData Blog",,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,30,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,"Text data,Relational data",Most of the time,1TB,"Decision Trees,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau,TensorFlow",,,,,,,,,Most of the time,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,Sometimes,,,,,,"Cross-Validation,Decision Trees,Natural Language Processing,Random 
Forests,SVMs,Text Analytics",,,,,,Most of the time,,Most of the time,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,Most of the time,Often,,,,,20,20,30,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,Often,,,,Sometimes,,Often,,,,,,Often,,,,,Sometimes,,,,Less than 10% of projects,More internal than external,IT Department,PubMed,get updates and cleaning,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,95000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Support Vector Machines (SVM),R,,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,,The Data Skeptic Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",0,60,40,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Bayesian Techniques,A master's degree,CRM/Marketing,100 to 499 employees,Decreased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1TB,Bayesian Techniques,"QlikView,R,Spark / MLlib",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Naive Bayes,Segmentation",Sometimes,,Most of the time,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,,,,,,20,20,10,20,30,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Need to coordinate with IT,Scaling data science solution up to full database",Sometimes,,,,Rarely,,,,Sometimes,,,,,,Most of the time,,,Often,,,,,26-50% of projects,Entirely internal,IT 
Department,,scalability of algorithms,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,,Rarely,200000,PLN,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,Switzerland,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Programmer,Self-taught,80,0,0,20,0,0,,,High school,Manufacturing,100 to 499 employees,,,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Statistician",Self-taught,40,10,40,10,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Government,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational 
data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Python,R,RapidMiner (free version),Tableau",,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,,Most of the time,,Sometimes,,,,,,,,,,Most of the time,,,,,,,"Association Rules,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Segmentation,Simulation,Text Analytics,Time Series Analysis",,Most of the time,,,,,Most of the time,Most of the time,,,,,,Sometimes,,Most of the time,Most of the time,Rarely,Rarely,Most of the time,,Most of the time,Most of the time,,,Most of the time,Most of the time,,Sometimes,Most of the time,,,,70,5,5,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,Sometimes,,,,,Most of the time,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,31,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Scientist,Machine Learning Engineer",Kaggle competitions,0,20,20,50,10,0,"Natural Language 
Processing,Recommendation Engines","Ensemble Methods,Neural Networks - RNNs",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,39,Employed full-time,,,Yes,,Other,,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",TensorFlow,Bayesian Methods,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",,30,20,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches","Some college/university study, no bachelor's degree",Pharmaceutical,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,,"Ensemble Methods,Evolutionary Approaches","Amazon Web services,C/C++,Java,Python,R",,Most of the time,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Ensemble Methods,Other",,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,20,50,5,15,10,0,Enough to refine and innovate on the algorithm,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,100% of projects,Entirely internal,IT Department,,,,Commercial Data Platform,,Bitbucket,Sometimes,,,,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Other",Self-taught,50,25,25,0,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,Python,QlikView,R,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,Often,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,"Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,Time Series Analysis",,,,,,,,,Most of the time,,,Most of the time,,Often,,Sometimes,,Sometimes,Often,,,,Sometimes,,,,,,,Most of the time,,,,25,5,50,10,10,0,Enough to explain the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,Sometimes,Often,Often,,,,Sometimes,,,,,,,,,,Most of the time,,Often,,100% of projects,More internal than external,Central Insights Team,Public economic data releases,timely access.,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Bitbucket,Most of the time,200000,CAD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Random Forests,R,Other,"Official documentation,Online courses,Stack Overflow Q&A",,,,,,,,,,Very useful,Very useful,,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,,Relational data,Most of the time,1MB,"Bayesian Techniques,Regression/Logistic Regression,Other","Microsoft R Server (Formerly Revolution Analytics),R,Stan",,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,"Cross-Validation,Logistic Regression,Simulation,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Often,,,,,,,,,,,Often,,,Sometimes,,,,30,20,10,20,20,0,Enough to explain the algorithm to someone non-technical,Lack of data science talent in the organization,,,,,,,,,Sometimes,,,,,,,,,,,,,,51-75% of projects,Entirely internal,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Bitbucket,Git",Rarely,550000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,28,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,Other,University courses,30,30,30,10,0,0,Supervised Machine Learning (Tabular Data),"Ensemble Methods,Logistic Regression",High school,Academic,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Text data,,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","R,SAS Base,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Prescriptive Modeling,Simulation,Time Series Analysis",,Sometimes,Sometimes,,,,Sometimes,Sometimes,Sometimes,,,,,Often,,Often,,Sometimes,,,Often,Often,,,,,Sometimes,,,Sometimes,,,,15,15,15,30,25,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Lack of significant domain expert input",,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,26-50% of projects,More internal than external,Business Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,Predictive Modeler,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Statistician",Work,30,0,70,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,10 to 19 employees,Decreased significantly,Less than one year,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Never,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,,,,,,,,"Lift Analysis,Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,Most of the time,,,,60,30,0,10,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,21,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,1 to 2 years,Programmer,Self-taught,60,10,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,100 to 499 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Researcher,Other",University courses,30,0,30,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,Self-taught,0,50,20,0,0,30,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Always,,CNNs,"Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Rarely,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,30,40,0,30,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Need to coordinate with IT",,,,Most of the time,Often,,,,,,,,,,Often,,,,,,,,None,More external than 
internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Subversion,,1800000,TWD,Has stayed about the same (has not increased or decreased more than 5%),5,,,,,,,,,,,,,,,,,, +Male,Ukraine,38,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,DBA/Database Engineer,Work,15,5,50,20,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Financial,100 to 499 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Don't know,100GB,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Oracle Data Mining/ Oracle R Enterprise,SQL",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,Often,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,25,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Researcher",University courses,50,10,40,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Evolutionary 
Approaches,Neural Networks - CNNs",A bachelor's degree,Academic,10 to 19 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Relational data",Most of the time,1GB,"CNNs,Ensemble Methods,Evolutionary Approaches,Neural Networks","Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization,Neural Networks",,,,Most of the time,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,10,50,20,5,15,0,Enough to refine and innovate on the algorithm,"Limitations of tools,Scaling data science solution up to full database,Unavailability of/difficult access to data",,,,,,,,,,,,,Often,,,,,Often,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,58,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,Computer Vision,Neural Networks - CNNs,A 
professional degree,Technology,500 to 999 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Rarely,1GB,"CNNs,Neural Networks","C/C++,Python,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks",,,,Often,,Sometimes,Often,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,40,5,5,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,,,,,,,,Often,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,74,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 years,Other,Self-taught,90,0,10,0,0,0,,Other (please specify; separate by semi-colon),A master's degree,Other,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,10MB,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Engineer,Predictive Modeler,Researcher",Self-taught,55,5,35,5,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Logistic Regression",,Technology,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop or Workstation and private datacenters,Other",Other,Always,100TB,"Ensemble Methods,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,Unix shell / awk,Other",,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Most of the time,Most of the time,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Time Series Analysis",,,Often,,,Often,Often,,Often,,,,,,,,,,,,,,,,,,,,,Often,,,,10,50,15,5,20,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Need to coordinate with IT",Often,,,,,Sometimes,,,,,Often,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,1 to 2 years,,Self-taught,80,0,0,20,0,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Financial,"10,000 or more employees",Stayed the same,1-2 years,A career fair or on-campus recruiting event,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,1GB,,"SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the 
time,,,,,,,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,30,0,0,0,70,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,39,Employed part-time,,,Yes,,Business Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Other",Work,0,0,100,0,0,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression","Some 
college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,No,Yes,Other,,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,Less than a year,"Other,I haven't started working yet",Self-taught,50,0,0,0,50,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A social science,6 to 10 years,,University courses,50,20,0,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Not very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",Other,Never,<1MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM SPSS Statistics,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Orange,Python,R,RapidMiner (free version),SQL",,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,Often,,Rarely,,Rarely,,Often,,,,,,,Rarely,,,,,,,,,,"Collaborative Filtering,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic 
Regression,Random Forests,Recommender Systems,SVMs,Text Analytics",,,,,Rarely,Rarely,,Rarely,,,,,,Rarely,,Rarely,,,,,,,Rarely,Rarely,,,,Rarely,Rarely,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,60,20,10,10,0,Computer Vision,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,55,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,Other,Self-taught,0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,"Data Machina Newsletter,KDnuggets Blog,Other (Separate different answers with semicolon)",,,,,,,,,,,,,,,,,,,,Master's degree,Other,6 to 10 years,Data Analyst,University courses,5,10,0,80,5,0,Natural Language Processing,Support Vector Machines (SVMs),No education,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Rarely,1GB,Regression/Logistic Regression,"IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,Tableau",,,,,,,,,,,Rarely,Rarely,,,,,Often,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction",,,,,,Often,Most of the time,,,,,,,,,Often,,,,,Sometimes,,,,,,,,,,,,,50,20,5,10,15,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,Often,,,,,,,,,Often,,,51-75% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,,50000,GBP,Has decreased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,India,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,50,20,20,10,NA,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,High school,Technology,,,,,Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),"Video data,Text data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Random Forests,SVMs","Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,CNNs,Data Visualization,Ensemble Methods,Neural Networks,Random Forests,Simulation,SVMs",Often,Often,,Often,,,Often,,Often,,,,,,,,,,,Often,,,Often,,,,Often,Often,,,,,,10,20,30,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Often,,,,26-50% of projects,,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,"Bitbucket,Git",,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,1 to 2 years,,University courses,20,20,10,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,Fewer than 10 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,33,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,70,10,20,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,51,Employed 
full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,,University courses,20,5,10,50,15,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Internet-based,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,Most 
of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,Most of the time,,Most of the time,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,30,15,10,5,20,20,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Organization is small and cannot afford a data science team,Privacy issues,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Most of the time,,Most of the time,Most of the time,Most of the time,,,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A humanities discipline,3 to 5 years,,University courses,3,10,15,70,2,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's 
degree",CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,1 to 2 years,"Engineer,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,40,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Insurance,"1,000 to 4,999 employees",Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Rarely,10GB,"Random Forests,Regression/Logistic Regression","Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,6 to 10 years,"Data Scientist,Software Developer/Software Engineer",University courses,30,0,10,60,0,0,"Recommendation 
Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Internet-based,100 to 499 employees,Decreased slightly,3-5 years,A career fair or on-campus recruiting event,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Flume,Google Cloud Compute,Hadoop/Hive/Pig,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,,,Most of the time,,,Most of the time,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Neural Networks,Random Forests,Recommender Systems,Time Series Analysis",Most of the time,,,,Most of the time,Sometimes,Sometimes,Sometimes,,,,,,,,Often,,,,Often,,,Sometimes,Most of the time,,,,,,Sometimes,,,,35,20,30,5,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization",Sometimes,,,,Most of the time,Often,,,Often,,,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,"Not employed, but looking for work",,,,,,,,Amazon Web services,Survival Analysis,SQL,Google Search,College/University,,,Very useful,,,,,,,,,,,,,,,,R Bloggers Blog Aggregator,< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,,,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,University courses,10,0,25,65,0,0,Unsupervised Learning,Support Vector Machines (SVMs),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",India,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Management information systems,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",0,70,30,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Random Forests,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",Very useful,Somewhat useful,Very useful,,Very useful,,Very useful,,,,Very useful,Very useful,,Very useful,,,Very useful,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,GPU accelerated Workstation,11 - 39 hours,Experience from work in a company related to ML,Sort of (Explain more),Bachelor's degree,Electrical Engineering,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Researcher,Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,Employed 
part-time,,,Yes,,Researcher,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,30,10,20,0,0,"Computer Vision,Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data)","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Always,,Other,"Amazon Web services,Jupyter notebooks,NoSQL,Python",,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Natural Language Processing,Recommender Systems",,,,,,,Often,,,,,,,,,,,,Often,,,,,Most of the time,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"DBA/Database 
Engineer,Programmer,Researcher",Self-taught,40,20,20,20,0,0,,,A professional degree,Technology,Fewer than 10 employees,Stayed the same,Don't know,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Always,100MB,Decision Trees,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),,,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Data Analyst,Machine Learning Engineer,Software Developer/Software Engineer",University courses,20,5,4,70,1,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"1,000 to 4,999 employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Text data,Relational data",Most of the time,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,TensorFlow",,Often,,,Rarely,,,Most of the time,Rarely,,,,,Rarely,,,Often,,,,,,,,,,Often,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Often,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Often,Sometimes,Most of the time,Often,Sometimes,Often,,,,,Often,,Sometimes,,Sometimes,,Often,Often,,Sometimes,Often,Often,,,Sometimes,Sometimes,Sometimes,,,,40,30,10,10,10,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Data Scientist,Machine Learning 
Engineer,Researcher",University courses,40,10,0,50,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,1GB,"Decision Trees,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,Rarely,Often,,,,Rarely,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Data Miner,Fine,"Employed by college or university,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,46,"Not employed, but looking 
for work",,,,,,,,Hadoop/Hive/Pig,Text Mining,Python,Google Search,"Blogs,Online courses,YouTube Videos",,Somewhat useful,,,,,,,,,Very useful,,,,,,,Somewhat useful,,< 1 year,,,,,,,,,,,,,,DataCamp,Basic laptop (Macbook),0 - 1 hour,Master's degree,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Miner,Engineer,Researcher",University courses,0,0,0,80,0,20,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - RNNs",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A humanities discipline,Less than a year,,Self-taught,30,20,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,3 to 5 years,"Researcher,Software Developer/Software Engineer",Work,20,20,60,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most 
of the time,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Python,Spark / MLlib,Tableau",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,Often,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Recommender Systems,SVMs",Most of the time,,,,Often,Most of the time,,,,,,Often,,Most of the time,,Most of the time,,Most of the time,Most of the time,,,,,Most of the time,,,,Often,,,,,,35,15,20,15,15,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Most of the time,,,Most of the time,Often,,,,,,,,,,,Often,,,51-75% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Commercial Data Platform,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,38,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Predictive Modeler,Statistician",Self-taught,85,0,0,15,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,25,Employed full-time,,,Yes,,Computer Scientist,Poorly,"Employed by company that makes advanced analytic software,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or 
system administration",1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,0,30,10,0,Computer Vision,"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,21,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Arxiv,Kaggle,Online courses,Stack Overflow Q&A",Somewhat useful,,,,,,Very useful,,,,Very useful,,,Very useful,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,FastML Blog",,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,70,0,30,0,0,0,Time Series,Neural Networks - CNNs,I prefer not to answer,Technology,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),Other,Rarely,10GB,"CNNs,Random Forests,SVMs","C/C++,Hadoop/Hive/Pig,NoSQL,Python",,,,Rarely,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,SVMs,Time Series Analysis",,,,,,Often,Often,,,,,,,,,,,,,Often,Sometimes,,,,,,,Often,,Often,,,,30,30,20,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Organization is small and cannot afford a data science team",Often,,,,Most of the time,,,,Most of the time,Most of the time,Often,,,,,Sometimes,,,,,,,10-25% of projects,Entirely internal,Standalone Team,Nothing,Insufficient number of sample,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Other,Rarely,120000,INR,Has increased 20% or more,4,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Other,University courses,5,10,0,85,0,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,,,"Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,NoSQL,R,SAP BusinessObjects Predictive Analytics,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,25,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,45,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without 
earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,55,0,30,0,0,Unsupervised Learning,"Decision Trees - Gradient Boosted Machines,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Government,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,<1MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R",,Rarely,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,Rarely,,,,Sometimes,Often,Sometimes,Sometimes,,,,,Sometimes,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,Often,,51-75% of projects,More internal than external,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,30,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Engineer,Researcher",University 
courses,10,20,30,30,5,5,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters","Image data,Video data,Text data",Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,SVMs","MATLAB/Octave,Minitab,Python",,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Simulation,SVMs",,,Sometimes,Often,,,,Often,,,,,,Most of the time,,Most of the time,,Often,,,,,,,,,Most of the time,Sometimes,,,,,,20,50,15,10,0,5,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Business Analyst,Data Analyst",Work,0,0,80,15,5,0,Survival Analysis,"Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,Financial,20 to 99 employees,Increased significantly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,,Proprietary Algorithms,Python,"GitHub,Google Search,Government website",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,6 to 10 
years,Researcher,University courses,60,10,0,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,"5,000 to 9,999 employees",Decreased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Relational data",Most of the time,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Segmentation,SVMs,Time Series Analysis",,,Most of the time,Often,,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,Sometimes,Often,,Most of the time,,,Often,Most of the time,Often,,Often,,Sometimes,Sometimes,,Often,,Often,,,,10,35,50,5,0,0,Enough to refine and innovate on the algorithm,"Data Science results not used by business decision makers,Privacy issues,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,Unavailability of/difficult access to data",,Often,,,,,,,,,,,,,,,Often,,Most of the time,,Often,,10-25% of projects,Approximately half internal and half external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Share Drive/SharePoint",,Git,Rarely,,,,7,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Researcher,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,1 to 2 years,Researcher,University courses,0,0,100,0,0,0,Reinforcement learning,Hidden Markov Models HMMs,,Technology,20 to 99 employees,Decreased slightly,6-10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,Never,1MB,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,39,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,52,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,SAS Base,Neural Nets,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Blogs,,Somewhat useful,,,,,,,,,,,,,,,,,"Becoming a Data Scientist Podcast,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast",< 1 year,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Kaggle Competitions,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Computer Scientist,DBA/Database Engineer,Operations Research Practitioner","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Computer Vision,Other (please specify; separate by semi-colon),A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,Russia,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,,University courses,40,0,10,50,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,100 to 499 employees,,,,Important,,"Basic laptop (Macbook),Laptop or Workstation and local IT 
supported servers",,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Time Series,,,Internet-based,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,,Microsoft Excel Data Mining,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,80,0,20,0,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Often,,,,,,,Often,,,,,,,,,,,,,,10-25% of projects,Entirely internal,Business Department,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Git,Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,42,Employed full-time,,,No,Yes,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,Unsupervised Learning,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Academic,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,CNNs,Neural Networks,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,R,SQL,TensorFlow,Unix shell / awk",,Often,,Most of the time,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",Rarely,,,Often,,Most of the time,Most of the time,Most of the time,Often,,,,,Most of the time,,Most of the 
time,,Most of the time,,Most of the time,Most of the time,,Sometimes,,Most of the time,,,Most of the time,Often,,,,,40,20,10,25,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,,,,,,,,,,,,,,,Often,,,76-99% of projects,,,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,"Bitbucket,Git,Mercurial,Subversion",Always,,,,7,,,,,,,,,,,,,,,,,, +Male,United Kingdom,51,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A professional degree,Technology,"10,000 or more employees",Stayed the same,Don't know,A tech-specific job board,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,SAS Base,Neural Nets,R,Google Search,"Blogs,Kaggle,Stack Overflow Q&A",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working 
yet,Self-taught,10,50,30,0,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,3 to 5 years,"Business Analyst,Data Analyst",University courses,30,50,0,20,0,0,"Natural Language Processing,Recommendation Engines","Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,,,,,Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Text data,Sometimes,1GB,"CNNs,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,Python",Sometimes,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Collaborative Filtering,Cross-Validation,Naive Bayes,Natural Language Processing,Text Analytics",,,,Often,Often,Sometimes,,,,,,,,,,,,Sometimes,Most of the 
time,,,,,,,,,,Often,,,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Unavailability of/difficult access to data",,,,,Sometimes,,,,Sometimes,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,15,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","Conferences,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,,,,,,Very useful,Very useful,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,31,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,University courses,30,30,10,0,30,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,Internet-based,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data 
infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Python,SQL,Unix shell / awk",,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Random Forests,Text Analytics",Rarely,,Rarely,,,Rarely,,Rarely,Rarely,,,Rarely,,,,Rarely,,Rarely,,,,,Rarely,,,,,,Rarely,,,,,65,10,15,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization",,Often,Sometimes,,Often,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,49,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",10,80,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,Fewer than 10 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported 
servers,Relational data,Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes",,,,,,Sometimes,Often,Sometimes,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,,50,25,10,10,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,47,"Not employed, but looking for work",,,,,,,,TensorFlow,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle",Somewhat useful,,Very useful,,,,Very useful,,,,,,,,,,,,,1-2 years,Nice to have,Nice to have,Necessary,,Necessary,Necessary,Necessary,,,Nice to have,,,,,Other,40+,Master's degree,Yes,Master's degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,10,0,0,90,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Operations Research Practitioner,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner",Self-taught,10,25,50,15,0,0,"Natural Language Processing,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Stayed the same,6-10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Decision Trees","Amazon Web services,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SQL,Tableau",,Sometimes,,,,,,,Rarely,,,,,,Sometimes,,Most of the time,,,,,,,,Rarely,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,Rarely,,,,,,,"A/B Testing,Association Rules,Decision Trees,Lift Analysis,Naive Bayes,Natural Language Processing,Simulation,Text Analytics",Sometimes,Often,,,,,,Often,,,,,,,Most of the time,,,Sometimes,Often,,,,,,,,Sometimes,,Often,,,,,25,5,40,15,15,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,Google Search,Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,65,0,5,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,,Logistic Regression,A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased slightly,3-5 years,A general-purpose job board,Not at all important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Sometimes,10MB,Regression/Logistic Regression,"Google Cloud Compute,NoSQL",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician,Other",Self-taught,70,20,0,10,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational data",Sometimes,100MB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks","Python,R,SAS Base,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Often,,,,,,,,,,"Collaborative Filtering,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,Time Series Analysis",,,,,Rarely,,,Sometimes,Often,,,Most of the time,,,,Sometimes,,Rarely,,Often,Often,,,Sometimes,,,,,,Sometimes,,,,70,15,0,NA,15,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,Often,,,Sometimes,,,,,,,,,,,,,,Often,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Git,Mercurial",Rarely,53000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,10,80,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Mix of fields,500 to 999 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft SQL Server Data Mining,Python,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,Rarely,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Text Analytics",,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Miner,Data 
Scientist,Statistician",University courses,10,0,0,90,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Other (please specify; separate by semi-colon)",,Academic,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Most of the time,,"Decision Trees,Ensemble Methods,Random Forests,Other","C/C++,IBM SPSS Statistics,Microsoft Excel Data Mining,R,SAS Base,SAS Enterprise Miner,Other",,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,Sometimes,Sometimes,,,,,,,,,,Often,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Most of the time,Most of the time,Sometimes,Often,,,,,,,Most of the time,,Sometimes,Rarely,,Sometimes,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Rarely,,,,,,,,,,,,,,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Software Developer/Software Engineer",Self-taught,35,10,10,30,10,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,22,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Psychology,6 to 10 years,"Business Analyst,Computer 
Scientist,Data Analyst,Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,20,0,40,40,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,Fewer than 10 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Image data,Always,10GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",40,30,0,0,30,0,Speech Recognition,Logistic Regression,"Some college/university study, no bachelor's degree",Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,46,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by company that makes advanced analytic software,Python,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","College/University,Official documentation,Online courses",,,Somewhat useful,,,,,,,Very useful,Very useful,,,,,,,,"Data Stories Podcast,FastML Blog,O'Reilly Data Newsletter",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,,,,"Coursera,DataCamp,edX,Udacity","Basic laptop (Macbook),Workstation + Cloud service",0 - 1 hour,Master's degree,No,Master's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,DBA/Database Engineer,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,57,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,45,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Programmer",Work,0,80,20,0,0,0,"Natural Language Processing,Speech Recognition","Decision Trees - Random Forests,Neural Networks - RNNs",I prefer not to answer,Government,"10,000 or more employees",Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private 
datacenters,Workstation + Cloud service","Image data,Video data",Always,100TB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Cloudera,Google Cloud Compute,Java,MATLAB/Octave,NoSQL,Perl,R,SQL",,Sometimes,,Rarely,Rarely,,,Often,,,,,,,Sometimes,,,,,,Rarely,,,,,,Often,,,Rarely,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Naive Bayes,Natural Language Processing,Neural Networks",,,,,,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,,,,,,,10,40,50,0,0,0,Enough to run the code / standard library,"Organization is small and cannot afford a data science team,Scaling data science solution up to full database",,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,10-25% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Other,owen,Subversion,Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Data Scientist,University courses,20,20,30,30,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Technology,20 to 99 employees,Increased slightly,Less than one year,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Google 
Cloud Compute,Python,R,SQL",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Segmentation,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,20,20,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,30,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,35,20,25,20,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",,Telecommunications,"10,000 or more employees",Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100TB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Tableau",,,,,Most of the time,,,,Most of the time,,,,,Often,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Most of the time,,,Sometimes,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,Sometimes,Sometimes,,,Most of the time,Sometimes,,,,,,,Sometimes,,,,Most of the time,,,Often,,,Often,,,,Often,Often,Often,,,,30,30,10,10,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,"Information technology, networking, or system administration",More than 10 years,"Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,15,55,20,0,0,Recommendation Engines,"Bayesian Techniques,Gradient Boosting,Neural Networks - GANs",A master's degree,Internet-based,500 to 999 employees,,More than 10 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,14,"Not employed, and not looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by a company that doesn't perform advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Operations Research Practitioner",Self-taught,80,10,10,0,0,0,"Adversarial Learning,Machine Translation,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Retail,100 to 499 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",Rarely,Often,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,,,Often,,Sometimes,,,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed 
full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,An external recruiter or headhunter,Important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,Relational data,Sometimes,10MB,"CNNs,Decision Trees,Neural Networks,SVMs","Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,Rarely,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business 
Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,45,25,5,15,10,0,"Computer Vision,Natural Language Processing,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,Government,100 to 499 employees,Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees","Amazon Web services,C/C++,Java,NoSQL,Python,R,SQL",,Rarely,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Often,,,,20,30,20,15,15,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Need to coordinate with IT",Most of the time,Often,,,,,,,,,Often,,,,Most of the time,,,,,,,,51-75% of projects,Entirely external,Central Insights Team,,,Key-value store (e.g. 
Redis/Riak),"Company Developed Platform,Email,I don't typically share data",,,,,,Has decreased 20% or more,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Engineer,Programmer",Work,20,30,0,50,0,0,,,Primary/elementary school,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Other,Self-taught,40,10,50,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",,Retail,"5,000 to 9,999 employees",Increased significantly,6-10 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,25,0,25,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,40,0,0,0,20,Reinforcement learning,,,CRM/Marketing,"1,000 to 4,999 employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Text data,Sometimes,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United 
States,44,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,More than 10 years,"Data Scientist,Other",Work,20,20,60,0,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning",Other (please specify; separate by semi-colon),A master's degree,Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,,Other,"Amazon Web services,Google Cloud Compute,Jupyter notebooks,Microsoft Excel Data Mining,NoSQL,Python,R,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Often,,,,,,,,,Most of the time,,,,,,Sometimes,,,,Often,,,,Most of the time,,Rarely,,,,,,,,,,,,Sometimes,Sometimes,,Often,,,,"Data Visualization,Natural Language Processing",,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed full-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,Less than a year,Statistician,Kaggle competitions,0,40,30,0,30,0,Survival Analysis,"Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,75,0,0,25,0,0,Supervised Machine Learning (Tabular Data),"Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Don't know,100MB,"CNNs,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Support Vector Machines (SVM),Python,Google Search,"College/University,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,O'Reilly Data Newsletter,< 1 year,Unnecessary,Nice to have,Necessary,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Github Portfolio,Yes,Master's degree,Computer Science,,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,1-2,Very 
Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important +Male,People 's Republic of China,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,1 to 2 years,"Data Analyst,Other",University courses,20,5,40,35,0,0,,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Text data,Most of the time,100GB,"Bayesian Techniques,Regression/Logistic Regression","Python,QlikView,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,Most of the time,,,,,Rarely,Rarely,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,SVMs,Text Analytics",,Often,Often,,,Sometimes,Often,,,,,,,,,Often,,Often,Often,,Sometimes,,,,,,,Sometimes,Most of the time,,,,,45,20,10,10,15,0,Enough to explain 
the algorithm to someone non-technical,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Limitations of tools,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Sometimes,Sometimes,Often,,,,,,,,Sometimes,,,,,Often,Often,Sometimes,Most of the time,,51-75% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",,Other,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Relational data,Other",Sometimes,100GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Java,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow,Unix shell / awk",,Most of the time,,,,,,Most of the time,,,,,,,Sometimes,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality 
Reduction,Prescriptive Modeling,Random Forests,Simulation,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Often,,,,Often,,,,,,,,Often,Most of the time,Often,Often,,,,Often,,,Often,,,,30,10,10,30,20,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Scientist,Engineer,Operations Research Practitioner,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",50,25,10,15,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Pharmaceutical,"10,000 or more employees",Stayed the same,1-2 years,Some other way,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,"Bayesian Techniques,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Microsoft Azure Machine Learning,Python,Other,Other,Other",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Most of the time,Most of the time,Sometimes,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Random Forests,Simulation,Text Analytics",,,Sometimes,,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,50,5,5,25,15,0,Enough to explain the algorithm to someone 
non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,,,,Technology,"10,000 or more employees",,,,Important,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,29,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,Python,Cluster Analysis,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Textbook",,,,,,,,,,,Very useful,,,,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Other,Other,40,10,30,0,0,20,Outlier detection (e.g. 
Fraud detection),Logistic Regression,A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,19,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Computer Scientist,University courses,40,0,0,50,10,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,I don't know,Stayed the same,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal 
education past high school,,3 to 5 years,Software Developer/Software Engineer,Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Other,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",60,20,10,10,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Never,10MB,Regression/Logistic Regression,"Amazon Web services,Jupyter notebooks,R,Spark / MLlib",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,Rarely,,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,1,0,0,1,5,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Other",University courses,0,50,0,50,0,0,"Adversarial Learning,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised 
Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Mix of fields,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,"Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,Outlier detection (e.g. 
Fraud detection),Neural Networks - RNNs,A master's degree,Government,100 to 499 employees,Stayed the same,1-2 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Image data,Most of the time,>1EB,"CNNs,Neural Networks,RNNs","Amazon Machine Learning,Amazon Web services,C/C++,Hadoop/Hive/Pig,Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Perl,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",Often,Often,,Often,,,,,Often,,,,,,Often,,,,,,,,,Rarely,,,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,Most of the time,,,,"A/B Testing,CNNs,Data Visualization,Decision Trees,GANs,Logistic Regression,Neural Networks,RNNs",Most of the time,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,Most of the time,,,,,,,,,50,0,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,I prefer not to say,Lack of significant domain expert input,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,Most of the time,Most of the time,,Most of the time,,Most of the time,,,,Most of the time,,Most of the time,,Most of the time,,Most of the time,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,47,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,More than 10 
years,Other,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,24,Employed part-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,0,0,20,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Computer Scientist,Data Analyst,Operations Research Practitioner,Predictive Modeler,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,40,0,50,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Always,100GB,Bayesian Techniques,"C/C++,NoSQL",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,32,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,84,5,0,1,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,"Online courses 
(coursera, udemy, edx, etc.)",30,60,10,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,Very useful,,,,,,,Very useful,,,,Very useful,"FastML Blog,No Free Hunch Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,University courses,10,0,5,80,5,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Don't know,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","QlikView,R,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,Most of the time,,,,,,,,,Most of the time,,,,,,,Most of the time,,,"Data Visualization,kNN and Other Clustering,Neural Networks,Random Forests,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,Rarely,,,,,,Sometimes,,,Sometimes,,,Often,,,,Often,,,,70,15,0,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Often,,,Most of the time,,,,Most of the time,,,,,,,,,,,Most of the time,,,76-99% of projects,More internal than external,Business Department,,Lack of well described datasets.,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,73500,USD,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),3 to 5 years,,Self-taught,50,0,0,50,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A master's degree,Other,"1,000 to 4,999 employees",Decreased significantly,Less than one year,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,1GB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL,TIBCO Spotfire",,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,Sometimes,,Often,,,,,,,,,Most of the time,,,,,Most of the time,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Random 
Forests",,,,,,Sometimes,Most of the time,Rarely,Rarely,,,,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,Jupyter notebooks,Text Mining,SQL,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,More than 10 years,"Researcher,Software Developer/Software Engineer,Other",Self-taught,60,30,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Random Forests,Regression/Logistic Regression","Amazon Web services,MATLAB/Octave,Python,R,SQL,Unix shell / awk",,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression",,,,,,Sometimes,Often,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,70,10,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Most of the time,Often,,,,,,,,,,,,,,Often,Often,,76-99% of projects,Approximately half internal and half external,Business Department,COSMIC; CCLE,Different data sources use different identifiers for the same samples.,"Flat files not in a database or 
cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,85000,USD,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Female,India,22,"Not employed, but looking for work",,,,,,,,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,50,0,50,0,0,0,Natural Language Processing,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,6 to 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,80,0,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression",,Government,"5,000 to 9,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Other","Python,R,SAS Base,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,Often,,Sometimes,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees",,,,,,Most of the time,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,,,,,10,25,25,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Dirty data,Limitations in the state of the art in machine 
learning",,Often,,,Sometimes,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,No,Yes,Other,,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,Other,Other,80,NA,0,0,20,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer",Self-taught,15,70,15,0,0,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Stayed the same,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,1 to 2 years,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",15,20,10,10,10,35,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A professional degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,1 to 2 years,Other,Self-taught,20,10,0,70,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"1,000 to 4,999 employees",Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",,1GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Statistics,Python,R,RapidMiner (commercial version),RapidMiner (free version),SQL,Unix shell / awk",,,,,,,,,Rarely,,,Rarely,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,Rarely,Rarely,,,,,,,Most of the time,,,,,,Often,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Sometimes,,,Rarely,Most of the time,Often,Often,,,,,Most of the time,Sometimes,Most of the time,,Often,Rarely,Rarely,Sometimes,,Sometimes,Sometimes,,,,Rarely,Rarely,Most of the time,,,,53,17,7,16,7,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from 
external sources,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,Often,Rarely,,Sometimes,Often,Sometimes,Sometimes,,,,Sometimes,,,,,Sometimes,Sometimes,,26-50% of projects,More internal than external,Business Department,,Dirty data and lack of expert domain to interpret raw data,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,"Emergent/Future Newsletter (Algorithmia),R Bloggers Blog Aggregator,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",40+,Experience from work in a company related to ML,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,15,15,25,30,0,15,"Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Data 
Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",Work,10,10,44,10,3,23,"Adversarial Learning,Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Always,10TB,"Bayesian Techniques,CNNs,Decision Trees,GANs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Amazon Web services,Cloudera,Flume,Google Cloud Compute,Hadoop/Hive/Pig,Impala,Java,KNIME (commercial version),KNIME (free version),Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,Tableau,TensorFlow",Rarely,Often,,,Sometimes,,Sometimes,Rarely,Sometimes,,,,,Rarely,Most of the time,,,Sometimes,Sometimes,,,Sometimes,,,,,Often,,,,Often,,,,,,,,,,Often,,,,Often,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Often,Often,Often,Often,Often,Sometimes,,,Sometimes,,,Often,,Often,,Often,Often,Often,,,,Often,Often,Often,,Sometimes,Sometimes,Often,,,,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",Lack of data science talent in the 
organization,,,,,,,,,Often,,,,,,,,,,,,,,26-50% of projects,More external than internal,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",20,10,70,0,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,I don't know,Increased significantly,Don't know,A tech-specific job board,Somewhat important,Research that advances the state of the art of machine learning,Other,Image data,,100GB,"CNNs,Ensemble Methods,Evolutionary Approaches,GANs,Neural Networks,RNNs","C/C++,Jupyter notebooks,Python,TensorFlow,Other",,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,Often,,,"CNNs,Data Visualization,Ensemble Methods,GANs,Neural Networks,RNNs,SVMs,Time Series Analysis",,,,Most of the time,,,Often,,Sometimes,,Sometimes,,,,,,,,,Often,,,,,Often,,,Sometimes,,Sometimes,,,,0,20,0,0,0,80,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question 
to be answering or a clear direction to go in with the available data",Most of the time,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Miner,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,40,30,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,"5,000 to 9,999 employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Markov Logic Networks,Neural Networks,RNNs","Amazon Web services,Cloudera,Flume,Hadoop/Hive/Pig,Java,NoSQL,R,Spark / MLlib,TensorFlow",,Often,,,Often,,Often,,Often,,,,,,Most of the time,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,HMMs,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,RNNs,SVMs",,Often,Most of the time,Most of the time,Often,,,,,,,,Most of the time,Most of the time,,,Most of the time,Most of the time,Often,Often,,,,,Often,,,Often,,,,,,10,30,10,25,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Often,,,,,,,,,Often,,,,Sometimes,Often,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,30,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical 
Engineering,More than 10 years,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Programmer,Researcher,Statistician",University courses,50,15,0,25,10,0,Unsupervised Learning,"Bayesian Techniques,Ensemble Methods,Logistic Regression,Neural Networks - GANs",,Academic,100 to 499 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Sometimes,1GB,"Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Regression/Logistic Regression,RNNs","C/C++,MATLAB/Octave,Perl,Python,R",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,23,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",20,0,60,0,20,0,"Computer Vision,Natural Language Processing","Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,Fewer than 10 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,30,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,Other,32,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,"Not employed, but looking for work",,,,,,,,Python,,C/C++/C#,I collect my own data (e.g. web-scraping),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer",Work,10,20,30,30,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,34,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,Other,University courses,3,5,10,80,2,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Increased significantly,Less than one year,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Never,100MB,,"IBM Cognos,IBM SPSS Modeler",,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Time Series 
Analysis",,,,,,,Often,Rarely,,,,,,,,Rarely,,,,Rarely,Rarely,,,,Rarely,,,,,Sometimes,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others",,,,,Often,Sometimes,,,,,,,,,,,,,,,,,10-25% of projects,More external than internal,Other,,,,,,,,,,,I prefer not to share,,,,,,,,,,,,,,,,,, +Male,Germany,56,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Researcher",Self-taught,25,10,50,0,15,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,89,6,0,0,5,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Technology,"1,000 to 4,999 
employees",Stayed the same,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,100GB,"Bayesian Techniques,GANs,Gradient Boosted Machines,Markov Logic Networks,RNNs,SVMs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,SVMs,Text Analytics",,,,,,Often,,Often,Most of the time,,,,,,,Most of the time,,,,,Most of the time,,,,,,,Often,Often,,,,,25,50,10,10,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,,,,Often,Often,,Most of the time,Most of the time,Often,,,,,,Sometimes,Sometimes,,51-75% of projects,Entirely internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. 
GraphBase/Neo4j)","Commercial Data Platform,Share Drive/SharePoint",,"Mercurial,Subversion",,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,45,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,60,20,0,20,0,0,"Machine Translation,Natural Language Processing,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R,SQL",,,,Often,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,Most of the time,Most of the time,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",Often,Most of the time,Most of the time,Sometimes,,Most of the time,Most of the time,Most of the time,Sometimes,,,Sometimes,,Often,,Most of the time,Often,Most of the time,Most of the time,Most of the time,,Most of the time,Sometimes,Most of the time,,,,Often,Most of the time,Most of the time,,,,35,10,10,35,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,3 to 5 years,"Data Scientist,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",10,60,30,0,0,0,Supervised Machine Learning (Tabular 
Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - GANs",A doctoral degree,Pharmaceutical,"10,000 or more employees",Increased significantly,6-10 years,A tech-specific job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,100GB,"Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks","Cloudera,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,TensorFlow,TIBCO Spotfire",,,,,Often,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,,Most of the time,,,,,"Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics",,,,,,Most of the time,,Most of the time,Most of the time,Most of the time,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,Often,,,,,50,20,10,10,10,0,Enough to tune the parameters properly,"Difficulties in deployment/scoring,Dirty data,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Most of the time,Most of the time,,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,61,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Machine Learning Engineer,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A doctoral degree,CRM/Marketing,,,,,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,Rarely,Rarely,,,,,Most of the time,,,,Sometimes,,Sometimes,,,,,,,,Most of the time,,Often,,,,,,,,,Sometimes,,,Often,Often,,,,,,"Association Rules,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Random Forests,Segmentation,Text Analytics",,Often,,,,,Often,Often,Often,,,Often,,,,Often,,,,,,,Often,,,Often,,,Sometimes,,,,,30,10,10,20,10,20,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,31,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Analyst,Data Miner","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,Reinforcement learning,Bayesian Techniques,A bachelor's degree,Manufacturing,"10,000 or more employees",Decreased significantly,Less than one year,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)","Text data,Relational data",Rarely,10MB,HMMs,"Microsoft Excel Data Mining,Python,R,SQL,TIBCO 
Spotfire",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Rarely,,Sometimes,,,,,,,,,Most of the time,,,,,Often,,,,,Logistic Regression,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,50,0,0,50,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",Most of the time,Sometimes,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,51-75% of projects,Do not know,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,44,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,19,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't 
started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,3 to 5 years,"Engineer,Researcher,Other",Self-taught,90,0,0,10,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Not at all important,Other,Laptop or Workstation and private datacenters,Image data,Never,100GB,Decision Trees,"IBM SPSS Statistics,MATLAB/Octave,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,Rarely,,Most of the time,,,,"Cross-Validation,Data Visualization,Decision Trees,Random Forests,Segmentation,SVMs",,,,,,Sometimes,Often,Often,,,,,,,,,,,,,,,Often,,,Often,,Rarely,,,,,,75,10,0,15,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Jupyter notebooks,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Friends network,Kaggle,Stack Overflow Q&A,YouTube Videos",,Somewhat useful,,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Internet-based,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Never,1GB,"Bayesian Techniques,Decision Trees,Random Forests","Java,NoSQL,R,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,Often,,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Decision Trees,Neural Networks",Often,,,,,,,Sometimes,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,30,50,20,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Organization is small and cannot afford a data science team,Privacy issues",Sometimes,,,,Often,Often,,,,,,,,,,Sometimes,Sometimes,,,,,,Less than 10% of projects,More internal than external,IT Department,None,Dirty Data,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,35000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, but looking for work",,,,,,,,TensorFlow,Other,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Researcher,Other",University courses,50,10,0,35,5,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search","Conferences,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,Somewhat useful,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,3 to 5 years,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",20,30,10,0,40,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Always,1GB,"Gradient Boosted Machines,Random Forests","Amazon Web services,Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,SQL,Other",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,Rarely,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,"Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Most of the time,,,,,Often,,Sometimes,,,,,,,,Often,Often,,,Sometimes,,,,Often,,,,50,10,10,20,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,,,,,,,,,,,,Often,Sometimes,,,76-99% of projects,Entirely internal,IT Department,"Google Maps, location data, crawler information","missing data, dirty data","Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,55000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,France,36,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,DataRobot,Deep learning,Python,"Government website,University/Non-profit research group websites","College/University,Conferences,Online courses",,,Very useful,,Somewhat useful,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,KDnuggets Blog,O'Reilly Data Newsletter",3-5 years,Necessary,,Necessary,,Necessary,,Necessary,Nice to have,Unnecessary,,,,,Coursera,"Laptop + Cloud service (AWS, Azure, GCE ...)",0 - 1 hour,Master's degree,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Other",University courses,0,0,0,100,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,0,Very Important,Very Important,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,R,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,,University courses,50,10,0,30,0,10,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,40,20,0,10,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Decreased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Flume,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,Sometimes,,Often,,,,,,,,Most of the time,,,,Rarely,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Most of the time,,,Often,Most of the time,,Often,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,,,Often,Most of the time,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,Sometimes,,Often,,,Most of the time,Most of the time,Often,,Most of the time,Most of 
the time,Often,Sometimes,,Sometimes,Most of the time,Often,,,,40,10,20,5,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Poorly,Self-employed,TensorFlow,Deep learning,R,University/Non-profit research group websites,Kaggle,,,,,,,Very useful,,,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Other,University courses,0,0,0,100,0,0,Survival Analysis,Logistic Regression,A master's degree,Mix of fields,,,,,Very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,,,"C/C++,Python,R,SQL",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Rarely,,,,,,,,,,"Logistic Regression,Prescriptive Modeling,Segmentation,Simulation",,,,,,,,,,,,,,,,Often,,,,,,Rarely,,,,Sometimes,Sometimes,,,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,Less than 10% of projects,Entirely internal,Other,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,,Most of the time,,,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Hong Kong,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Data Scientist,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,40,20,0,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",University courses,77,0,5,15,3,0,"Speech Recognition,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,6 to 10 years,Researcher,Self-taught,20,40,20,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,39,Employed full-time,,,Yes,,Other,Fine,Employed by government,R,Neural Nets,R,"Google Search,I collect my own data (e.g. 
web-scraping)",Online courses,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Government,"1,000 to 4,999 employees",Decreased slightly,Less than one year,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,10GB,Decision Trees,"Java,Python,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,,,"A/B Testing,Cross-Validation,Simulation",Sometimes,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,5,80,5,5,5,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,,,,,,,,,,,,,,,,Often,,,Less than 10% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","I don't typically share data,Other",Zabbix,"Git,Subversion",Rarely,"140000,00",BRL,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Brazil,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,University courses,20,25,15,20,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,32,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Deep learning,Python,Google Search,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,,Programmer,University courses,NA,NA,NA,NA,NA,NA,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,39,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,30,10,20,10,30,0,"Computer Vision,Natural Language Processing,Speech 
Recognition,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,45,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Google Cloud Compute,Text Mining,R,Google Search,"Blogs,College/University,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,56,Employed full-time,,,Yes,,Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A humanities discipline,I don't write code to analyze data,"Data Analyst,Programmer,Statistician",Self-taught,90,2,8,0,0,0,,,"Some college/university study, no bachelor's degree",Internet-based,Fewer than 10 employees,Decreased significantly,3-5 years,Some other 
way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Computer Scientist,Programmer",Work,50,0,50,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data",Most of the time,10MB,"Neural Networks,SVMs","C/C++,Python,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Often,,,,"Neural Networks,SVMs",,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,10,30,10,20,30,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician,Other",University courses,3,7,0,90,0,0,Supervised Machine Learning 
(Tabular Data),"Decision Trees - Random Forests,Logistic Regression",High school,Technology,500 to 999 employees,Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,55,0,5,30,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Relational data,Most of the time,10TB,"Neural Networks,Random Forests","Amazon Web services,Cloudera,Flume,Jupyter notebooks,Python,R,Spark / MLlib,SQL,TensorFlow,Unix shell / awk",,Often,,,Rarely,,Rarely,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Often,Most of the time,,,,Often,,Most of the time,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,55,20,0,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,36,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Data Analyst,Predictive Modeler,Software Developer/Software Engineer",Self-taught,50,10,40,0,0,0,Outlier detection (e.g. Fraud detection),Decision Trees - Random Forests,High school,Telecommunications,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",,1GB,"Decision Trees,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,15,Employed full-time,,,No,Yes,Other,Perfectly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,Other,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by college or 
university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,35,5,0,40,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Rarely,100MB,"Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Perl,Python,R,SQL,TensorFlow",,Often,,Most of the time,,,,Rarely,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,,Most of the time,,,Sometimes,Most of the time,,Most of the time,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Segmentation,Simulation,SVMs",,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,Most of the time,Most of the time,Most of the time,,,,,,25,25,25,5,20,0,Enough to refine and innovate on the algorithm,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,Most of the time,Rarely,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional 
services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,University courses,0,10,0,90,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,41,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Miner,Predictive Modeler,Statistician",University courses,60,0,30,9,1,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Scientist,Researcher,Statistician",University courses,30,10,20,40,0,0,"Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,"1,000 to 4,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,44,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,10,0,60,0,30,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's 
degree",Government,"10,000 or more employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Researcher",University courses,0,30,10,50,10,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,,,,,Important,Build and/or run a machine learning service that operationally improves your product or workflows,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,,Self-taught,70,NA,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I don't know/not sure,Academic,I don't know,,Don't know,A tech-specific job board,Very important,Other,,,,,,"C/C++,Mathematica,MATLAB/Octave,R,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Most of the 
time,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,0,50,20,30,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,39,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,I prefer not to say,,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Time Series Analysis,R,"University/Non-profit research group websites,Other",Textbook,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,,"Researcher,Statistician",University courses,NA,NA,NA,NA,NA,NA,Reinforcement learning,,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,Less than a year,"Programmer,Researcher,Other",Self-taught,80,20,0,0,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Academic,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,Sometimes,100MB,,"Julia,Jupyter notebooks,MATLAB/Octave,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,Rarely,Often,,,,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United 
States,33,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,70,"Independent contractor, freelancer, or self-employed",,,Yes,,Statistician,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,Data Analyst,Self-taught,99,0,0,0,1,0,Survival Analysis,Logistic Regression,High school,Pharmaceutical,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Other",Always,<1MB,"Regression/Logistic Regression,Other","SAS Base,SAS Enterprise Miner,SAS JMP",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,Rarely,Rarely,,,,,,,,,,,,"Data Visualization,Logistic Regression,Other",,,,,,,Often,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,20,20,0,0,10,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - CNNs",High school,CRM/Marketing,Fewer than 10 employees,Increased slightly,3-5 years,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA capable GPU),"Text data,Relational 
data",Sometimes,1GB,"Bayesian Techniques,CNNs,Neural Networks,RNNs","Amazon Machine Learning,Amazon Web services,Google Cloud Compute,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",Rarely,Often,,,,,,Sometimes,,,,,Rarely,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,,,,,Often,,Most of the time,,,,"A/B Testing,Data Visualization,Natural Language Processing,Neural Networks,Recommender Systems,Text Analytics",Sometimes,,,,,,Most of the time,,,,,,,,,,,,Sometimes,Sometimes,,,,Often,,,,,Often,,,,,40,30,5,20,5,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,0,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,Less than a year,Researcher,University courses,0,0,0,90,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Academic,"1,000 to 4,999 employees",Decreased slightly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,20,Employed part-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",Self-taught,65,0,0,35,0,0,"Adversarial Learning,Natural Language Processing,Reinforcement learning","Logistic Regression,Markov Logic Networks,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's 
degree,Academic,I don't know,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Don't know,1GB,"Markov Logic Networks,Neural Networks,RNNs,SVMs","Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,Prescriptive Modeling,RNNs,SVMs",,,,,,,,,,,,,,,,Often,Sometimes,,Most of the time,Most of the time,,Sometimes,,,Often,,,Sometimes,,,,,,35,25,35,5,0,0,Enough to tune the parameters properly,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Friends network,Kaggle,Online courses",,,,,,Very useful,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,Primary/elementary school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed part-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Unsupervised Learning",Other (please specify; separate by semi-colon),A professional degree,Technology,10 to 19 employees,Decreased significantly,Less than one year,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,100TB,Random Forests,"Amazon Machine Learning,Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Random Forests",Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,15,35,0,25,25,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and 
decision-making,Difficulties in deployment/scoring,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Programmer,Software Developer/Software Engineer,Other",University courses,5,0,5,80,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Military/Security,20 to 99 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Don't know,,Bayesian Techniques,"Amazon Web services,Java,SQL",,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Natural Language 
Processing",,,Sometimes,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,Researcher",University courses,40,5,30,20,5,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A bachelor's degree,Non-profit,100 to 499 employees,Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Relational data",Most of the time,1GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL,Tableau,TensorFlow,Unix shell / awk",,,,,,,,Sometimes,Rarely,,,,,,Rarely,,Most of the time,,,,,Rarely,Most of the time,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,Most of the time,Sometimes,,Sometimes,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,SVMs,Time Series 
Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,20,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,23,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,Less than a year,Other,University courses,0,50,50,0,0,0,,,A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,"Data Scientist,Software Developer/Software Engineer",Self-taught,20,40,40,0,0,0,"Adversarial Learning,Natural Language Processing,Recommendation Engines","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Decreased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,University courses,20,0,0,60,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,14,I prefer not to say,No,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer",Self-taught,100,0,0,0,0,0,Recommendation Engines,Decision Trees - Random Forests,A doctoral degree,Mix of fields,I don't know,,,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,6 to 10 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Technology,,,,,Not very important,Other,Traditional Workstation,"Text data,Relational data",,<1MB,"Decision Trees,Regression/Logistic Regression,Other","Microsoft Azure Machine Learning,Microsoft Excel Data Mining,SQL,Other",,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,"Data Visualization,Decision 
Trees",,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Julia,Regression,R,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Kaggle competitions,60,20,0,0,20,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A bachelor's degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1TB,"Decision Trees,Neural Networks,Regression/Logistic Regression,RNNs","IBM SPSS Statistics,Python,R,SAS Base,SQL",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,Most of the time,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Recommender Systems,Text Analytics",,,,,,Sometimes,Most of the time,,,,,,,Sometimes,,Often,,,,,Sometimes,,,Most of the time,,,,,Sometimes,,,,,80,10,0,0,10,0,Enough to explain the algorithm to someone non-technical,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,51-75% of projects,Approximately half internal and half external,Standalone Team,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Company Developed Platform,,Git,Sometimes,,,Has decreased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Finland,23,Employed full-time,,,Yes,,Operations Research Practitioner,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Operations Research Practitioner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,0,20,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Data Miner,Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",45,15,30,10,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Mix of fields,20 to 99 employees,Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Researcher,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A health science,1 to 2 years,"Operations Research Practitioner,Researcher",Self-taught,70,25,0,5,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,25,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,1 to 2 years,Engineer,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,39,I prefer not to say,No,"Yes, I'm 
focused on learning mostly data science skills",,,,,,TensorFlow,Other,Python,Other,"Blogs,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,Very useful,Very useful,,,Very useful,Very useful,Very useful,,Somewhat useful,Very useful,,,Very useful,"No Free Hunch Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",15+ years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Unnecessary,Nice to have,Necessary,Necessary,Necessary,,"Coursera,Udacity","Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation",2 - 10 hours,Experience from work in a company related to ML,Yes,Master's degree,Electrical Engineering,,"Engineer,Researcher,Software Developer/Software Engineer,Other",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,6 to 10 years,Other,Self-taught,50,0,50,0,0,0,"Supervised Machine Learning 
(Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Financial,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Mathematica,R",,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,Often,,Often,,Most of the time,,,,,Most of the time,,Often,,,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,41,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",Google Cloud Compute,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Conferences,Friends network,Kaggle,Stack Overflow Q&A",Very useful,,,,Very useful,Very useful,Somewhat useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Software Developer/Software Engineer",Kaggle competitions,30,10,50,0,10,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,10 to 19 employees,Increased slightly,3-5 years,Some other way,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests","Amazon Machine Learning,Python,R,SQL",Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Neural Networks,PCA and Dimensionality Reduction,Random Forests",,,Rarely,Often,,Rarely,Most of the time,Rarely,,,,,,,,,,,,Often,Rarely,,Rarely,,,,,,,,,,,30,10,10,30,20,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,Rarely,,,Often,,,,Often,,Often,Sometimes,,,,Sometimes,,,,,,,51-75% of projects,More internal than external,Other,Landsat and other open satellite data,,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Other",Commercial Data Platform,,Git,Rarely,,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,Other,0,45,50,5,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Insurance,"1,000 to 4,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,30,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,"Data Analyst,Data Miner,Researcher","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Hospitality/Entertainment/Sports,100 to 499 employees,Increased slightly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Cloudera,Hadoop/Hive/Pig,Impala,Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL,Tableau",,Most of the time,,,Sometimes,,,,Sometimes,,,,,Sometimes,,,Most of the time,,,,,,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Random Forests,Segmentation,SVMs",Sometimes,,,,,Often,Most of the time,Sometimes,Sometimes,,,Sometimes,,Sometimes,,Sometimes,,,,,,,Sometimes,,,Often,,Sometimes,,,,,,40,25,10,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input",Sometimes,,,,Most of the time,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,More than 10 years,"Business Analyst,Engineer",Self-taught,80,5,10,5,0,0,"Natural Language Processing,Time Series","Bayesian 
Techniques,Decision Trees - Random Forests",A master's degree,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,100MB,Decision Trees,"IBM SPSS Statistics,Microsoft Excel Data Mining,R,SQL,Other",,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,"Decision Trees,Simulation,Time Series Analysis",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,20,40,0,40,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools",Sometimes,,,,Often,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Predictive Modeler",Work,40,0,40,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Engineer,Programmer","Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Software Developer/Software Engineer,Other,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,University courses,10,40,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,I haven't started working yet,University courses,10,20,0,50,20,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, 
edx, etc.)",60,40,0,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,Fewer than 10 employees,Decreased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,100MB,"Decision Trees,Random Forests,SVMs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,Other,Self-taught,40,40,0,0,20,0,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Random Forests,I prefer not to answer,Other,"10,000 or more employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Other,,100GB,"Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,Less than a year,"Engineer,Programmer,Researcher",Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,"Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,1GB,Regression/Logistic Regression,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,20,10,10,50,10,0,Enough to tune the parameters properly,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,University courses,20,0,10,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Colombia,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Data Analyst,Other",Self-taught,60,20,10,10,0,0,Time Series,"Bayesian Techniques,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,6 to 10 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,0,30,30,40,0,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,6 to 10 years,"Data Analyst,Data Scientist,Researcher,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Other",Work,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,21,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,1 to 2 years,Other,Self-taught,100,0,0,0,0,0,Survival Analysis,Bayesian Techniques,I prefer not to 
answer,Financial,,,,,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,100MB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Bayesian Techniques,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst",Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",30,20,20,20,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Academic,I don't know,,,Some other way,Important,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,"Information technology, networking, or system administration",Less than a year,"Machine Learning Engineer,I haven't started working yet",University courses,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I prefer not to answer,Academic,100 to 499 
employees,Stayed the same,Don't know,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Never,1GB,"Bayesian Techniques,Decision Trees,Random Forests,SVMs",C/C++,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Random Forests,SVMs",,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,Sometimes,,,,,,80,10,0,10,0,0,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations of tools,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Sometimes,,,26-50% of projects,Do not know,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,,Never,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,31,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,40,0,0,60,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Decreased significantly,3-5 years,A tech-specific job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,1MB,"Bayesian Techniques,SVMs,Other","C/C++,Other",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,"Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language 
Processing,SVMs,Time Series Analysis",,,,,,,,,Often,,,,,Sometimes,,,,Most of the time,Often,,,,,,,,,Often,,Sometimes,,,,10,30,35,0,25,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Data Miner,Predictive Modeler,Statistician,Other",Self-taught,15,5,30,50,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - GANs","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,10GB,"Bayesian Techniques,Decision Trees,GANs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Microsoft Excel Data Mining,SAP BusinessObjects Predictive Analytics,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,34,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by professional services/consulting firm,Hadoop/Hive/Pig,Decision Trees,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub","Conferences,Kaggle,Podcasts",,,,,Very useful,,Very useful,,,,,,Somewhat useful,,,,,,"Becoming a Data Scientist Podcast,DataTau News Aggregator",< 1 year,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Professional degree,,Less than a year,"DBA/Database Engineer,Software Developer/Software Engineer,Other",University courses,20,70,10,0,0,0,Recommendation Engines,Logistic Regression,"Some college/university study, no bachelor's degree",Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed part-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 years,Statistician,Self-taught,20,30,0,0,50,0,Survival Analysis,Decision Trees - Gradient Boosted Machines,Primary/elementary school,Financial,"10,000 or more employees",Stayed the same,3-5 years,Some other way,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational data,Sometimes,100GB,"Decision Trees,Regression/Logistic Regression","R,SAS Base,SAS Enterprise Miner,SQL,TIBCO Spotfire",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,Most of the time,Rarely,,,Most of the time,,,,,Sometimes,,,,,"Decision Trees,Segmentation",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 
years,"Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,A professional degree,Academic,I don't know,Increased slightly,1-2 years,Some other way,Not very important,Other,"Basic laptop (Macbook),Workstation + Cloud service",Text data,,,,"Jupyter notebooks,Mathematica,MATLAB/Octave,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,Sometimes,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,37,Employed full-time,,,No,Yes,Other,Fine,Employed by professional services/consulting firm,MATLAB/Octave,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub","Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Trade book,YouTube Videos",,,,,,,Very useful,,,,Very useful,Somewhat useful,,Somewhat useful,,Very useful,,Very useful,No Free Hunch Blog,1-2 years,Necessary,Unnecessary,Necessary,Unnecessary,Necessary,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,,,,"Coursera,DataCamp,edX,Udacity",Traditional Workstation,2 - 10 hours,Kaggle Competitions,No,Professional degree,,3 to 5 years,"DBA/Database Engineer,Software Developer/Software Engineer,Statistician",Self-taught,50,5,0,0,45,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that 
performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Software Developer/Software Engineer,Other",Work,30,10,40,0,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Relational data,Other",Sometimes,1PB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,Other","Google Cloud Compute,Jupyter notebooks,Python,SQL,Tableau,TensorFlow,Unix shell / awk,Other",,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Rarely,Sometimes,,Often,Most of the time,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",Often,,,,,Most of the time,Most of the time,Often,Often,,,,,,,Most of the time,,,Sometimes,Sometimes,Sometimes,Often,Often,,,,,,,Most of the time,,,,40,10,0,10,10,30,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the 
organization,Unavailability of/difficult access to data",Often,Sometimes,Often,,Most of the time,,,Often,Often,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,53,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,More than 10 years,Software Developer/Software Engineer,Self-taught,80,0,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,20,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",7,60,10,3,20,NA,"Outlier detection (e.g. Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs,Markov Logic Networks",A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,Data Analyst,University courses,15,15,0,50,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - 
Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Manufacturing,500 to 999 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS JMP,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,Most of the time,,Sometimes,,,,,,,,,,"Association Rules,Data Visualization,kNN and Other Clustering,Logistic Regression,Simulation,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,,,Sometimes,,Often,,,,,,,,,,,Sometimes,,,Sometimes,,,,60,15,5,10,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Explaining data science to others,Need to coordinate with IT",Sometimes,,Sometimes,,,Rarely,,,,,,,,,Often,,,,,,,,76-99% of projects,More internal than external,Other,,very splintered,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Email,Share Drive/SharePoint",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,76000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Canada,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Spark / MLlib,Deep learning,R,Google Search,"Blogs,College/University,Conferences,Kaggle,Personal Projects,Stack Overflow Q&A",,Very useful,Very useful,,Somewhat useful,,Very useful,,,,,Very useful,,Very useful,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,,,University courses,NA,NA,NA,NA,NA,NA,Time Series,Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,42,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Financial,"5,000 to 9,999 employees",Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,1TB,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,No,Yes,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Time Series,Logistic Regression,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,28,Employed part-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,49,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,6 to 10 years,"Data Analyst,Data Miner,Predictive Modeler",University courses,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A 
master's degree,Financial,"10,000 or more employees",Increased slightly,More than 10 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,Regression/Logistic Regression,"Hadoop/Hive/Pig,Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,34,Employed full-time,,,No,Yes,Researcher,Fine,Employed by company that makes advanced analytic software,Hadoop/Hive/Pig,Association Rules,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,Less than a year,"Data Scientist,Engineer,Predictive Modeler,Researcher,Statistician",University courses,20,0,70,10,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Python,Bayesian Methods,Python,Google Search,"College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Not Useful,,Somewhat useful,Very useful,Very useful,,,Somewhat useful,Very useful,Very useful,,Very useful,,,,Somewhat useful,"KDnuggets Blog,O'Reilly Data Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a 
year,"Engineer,Researcher",Self-taught,40,30,20,10,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Technology,"5,000 to 9,999 employees",Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,,10GB,"Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,Sometimes,Sometimes,,,,,,Rarely,Often,,,,,,Most of the time,,,Sometimes,,,,30,30,0,20,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Need to coordinate with IT",,,,,,,,,Often,,,,,,Sometimes,,,,,,,,100% of projects,Entirely internal,Other,,Poorly structured; data gathering was not design for the objetives,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG),Other",Company Developed Platform,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,28000,EUR,Other,7,,,,,,,,,,,,,,,,,, +Male,Taiwan,46,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,29,Employed part-time,,,No,Yes,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,50,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by 
government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,21,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,I don't plan on learning a new ML/DS method,Other,"I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Official documentation,Stack Overflow Q&A,Textbook",,,Very useful,,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,,,,,< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Career fair or on-campus recruiting event,0,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Not important,Somewhat important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important +Male,India,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,53,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Engineer",Self-taught,50,0,0,50,0,0,Outlier detection (e.g. Fraud detection),Logistic Regression,"Some college/university study, no bachelor's degree",Other,100 to 499 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,Regression/Logistic Regression,"C/C++,Microsoft Excel Data Mining,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Association Rules,Data Visualization,Logistic Regression",,Often,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,30,30,10,20,10,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,Sometimes,,,Often,,,,,,,,,,,Most of the time,,,,Often,,,26-50% of projects,More internal than external,IT Department,,Primary sources do not provide well normalized data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,85000,,Has stayed about the same (has not increased or decreased more than 5%),,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,Spark / MLlib,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Engineer,University courses,20,0,10,50,20,0,,,I prefer not to answer,Retail,100 to 499 employees,Increased slightly,6-10 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Most of the time,1GB,Other,"Amazon Web services,Java,Python,R,SQL",,Most of the time,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Naive Bayes,Natural Language Processing",Often,,,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,,,40,30,30,0,0,0,Enough to tune the parameters properly,"Dirty data,Limitations in the state of the art in machine learning",,,,,Often,,,,,,,Often,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,,Other,S3,Git,Sometimes,,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Computer Vision,Decision Trees - Random Forests,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,"Not employed, but looking for work",,,,,,,,SQL,Text Mining,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A humanities discipline,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,23,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",University courses,30,5,20,45,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Hospitality/Entertainment/Sports,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,Regression/Logistic Regression,"QlikView,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Natural Language Processing,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,70,10,0,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Text data,Most of the time,100GB,"Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic 
Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,IBM Watson / Waton Analytics,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Rarely,Most of the time,,,,,,,Often,,,,Rarely,,Often,,Often,,,,Sometimes,,,,,,Often,,,,Often,,Sometimes,,,,,,,,Sometimes,Most of the time,,,Sometimes,Sometimes,,Often,,,,"CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",,,,Rarely,Often,Most of the time,Most of the time,Sometimes,Sometimes,,,,,Often,,Often,,,Often,Often,Sometimes,,Sometimes,Often,Rarely,,,Sometimes,Often,Often,,,,30,20,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,40,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Researcher,Software Developer/Software Engineer",Work,50,0,50,0,0,0,Natural Language Processing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,42,"Independent contractor, freelancer, or 
self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Government,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Video data,Text data,Other",Rarely,1GB,"Random Forests,Regression/Logistic Regression,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Physics,3 to 5 years,"Data Scientist,Researcher","Online courses (coursera, udemy, edx, etc.)",20,30,20,10,20,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Always,100GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression","Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Gradient Boosted Machines,Neural Networks",,,,Most of the time,,Most of the time,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,60,20,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly",Dirty data,,,,,Most of the time,,,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git,Mercurial",Sometimes,"400,000",MXN,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Data Analyst,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,R,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,,University courses,0,0,0,90,10,0,,,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Programmer,Researcher,Software Developer/Software Engineer",University courses,10,0,10,80,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,"1,000 to 4,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Relational data,,100GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,HMMs,Markov Logic Networks,Neural Networks,Random Forests,RNNs,SVMs","Amazon Web services,DataRobot,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,48,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,60,"Not employed, but looking for work",,,,,,,,TensorFlow,Social Network Analysis,F#,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Scientist,Engineer,Machine Learning Engineer,Researcher,Statistician",Kaggle competitions,40,20,30,0,10,0,"Computer Vision,Recommendation Engines,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,50,30,10,0,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Markov Logic 
Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,Spark / MLlib,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that performs advanced analytics,Python,Deep learning,SQL,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Textbook",,Very useful,,,,,Very useful,,,,,,,,Very useful,,,,"FastML Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),The Analytics Dispatch Newsletter",3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Master's degree,Sort of (Explain more),Master's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,DBA/Database Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,20,0,0,60,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,3 to 5 years,"Software Developer/Software Engineer,Statistician",University courses,30,10,30,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Insurance,"10,000 or more employees",Increased significantly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,57,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,6 to 10 years,"Engineer,Programmer",Other,0,0,0,10,0,90,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Don't know,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic 
Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Regression,,University/Non-profit research group websites,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A master's degree,Pharmaceutical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,22,Employed part-time,,,No,Yes,Researcher,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,I don't write code to analyze data,"Other,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,51,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software 
Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,10,20,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,24,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,Hadoop/Hive/Pig,Cluster Analysis,R,Google Search,"Personal Projects,Other",,,,,,,,,,,,Very useful,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,Other,University courses,80,0,20,0,0,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and local IT supported servers",Relational data,Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods","Amazon Web services,C/C++,Java,NoSQL,Python,R,SQL,Unix shell / awk",,Sometimes,,Often,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Most of the time,,,,,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Ensemble Methods,Naive Bayes,Segmentation",Most of the time,,Sometimes,,,Sometimes,,,Often,,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,30,40,20,2,8,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Organization is small and cannot afford a data science team,Other",Often,,,,,,,,,,,,,,,Often,,,,,,Most of the time,Less than 10% of projects,Approximately half internal and half external,IT Department,,slow processing,"Column-oriented relational (e.g. KDB/MariaDB),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,,USD,I do not want to share information about my salary/compensation,5,,,,,,,,,,,,,,,,,, +Male,Argentina,29,Employed full-time,,,No,Yes,Other,Poorly,Employed by government,Python,Support Vector Machines (SVM),Python,University/Non-profit research group websites,"College/University,Textbook,Trade book,YouTube Videos",,,Not Useful,,,,,,,,,,,,Somewhat useful,Very useful,,Very useful,KDnuggets Blog,< 1 year,,,,,,,,,,,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,I did not complete any formal education past high school,,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",25,25,0,25,25,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,35,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,10,10,5,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Programmer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Programmer,Researcher",University courses,80,10,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,51,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,45,5,0,0,0,,,A doctoral degree,Technology,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Important,,Basic laptop (Macbook),Relational data,Never,100GB,,"Amazon Web services,Java,NoSQL,Python",,Most of the time,,,,,,,,,,,,,Rarely,,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,A/B Testing,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,10,0,10,80,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,35,Employed 
part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,Software Developer/Software Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",30,40,0,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A professional degree,Technology,100 to 499 employees,Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Rarely,100TB,CNNs,"Jupyter notebooks,Python,TensorFlow",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Data Visualization",,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,20,20,0,20,0,"Recommendation 
Engines,Time Series","Ensemble Methods,Logistic Regression,Neural Networks - GANs",A professional degree,Other,Fewer than 10 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Don't know,10GB,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,35,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Programmer,Researcher",University courses,0,30,0,70,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",25,25,0,0,0,50,Time Series,Logistic Regression,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,3 to 5 
years,Researcher,Work,0,0,80,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A doctoral degree,Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,21,Employed part-time,,,No,Yes,Engineer,,Employed by college or university,Python,Regression,Python,GitHub,"Blogs,College/University,Online courses,Personal Projects,Textbook,YouTube Videos",,Somewhat useful,Somewhat useful,,,,,,,,Very useful,Very useful,,,Very useful,,,Very useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,"DataCamp,Udacity",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,Self-taught,30,50,0,20,0,0,Recommendation Engines,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important +A different identity,United States,25,Employed part-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's 
degree,Electrical Engineering,6 to 10 years,,Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Video data,Text data,Relational data",Most of the time,1TB,"CNNs,Neural Networks,SVMs","C/C++,Julia,Jupyter notebooks,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Rarely,,,,,,,,,,,,Sometimes,Often,,,Often,Often,Sometimes,,,,,Often,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,Most of the time,,Most of the time,,,,,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,,,,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,Most of the time,,,,50,50,0,0,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Other,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Other,NAS,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,United States,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,3 to 5 years,"Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,70,Retired,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Miner,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,,,Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Data Miner,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,"Independent contractor, freelancer, or 
self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Orange,Neural Nets,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",25,75,0,0,0,0,Natural Language Processing,Logistic Regression,A bachelor's degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",University courses,20,20,15,30,15,0,,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased significantly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,36,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 
years,,University courses,50,20,30,0,0,0,"Adversarial Learning,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,500 to 999 employees,Increased significantly,1-2 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Sometimes,100MB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,C/C++,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,TensorFlow",,Rarely,,Sometimes,,,,,,,,,,,Often,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,RNNs,SVMs",,,,Most of the time,,Most of the time,Most of the time,,Rarely,,,,,,,Often,,,Most of the time,Most of the time,,,,,Rarely,,,Sometimes,,,,,,10,80,2,8,0,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Russia,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,,,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,20,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"Data Analyst,Machine Learning Engineer",Self-taught,50,10,30,5,5,0,"Computer Vision,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",High school,Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Video data,Relational data",Most of the time,10TB,"CNNs,Neural Networks,RNNs",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"CNNs,RNNs,Time Series Analysis",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's 
degree,Mathematics or statistics,6 to 10 years,Researcher,Work,10,0,85,5,0,0,"Survival Analysis,Time Series",Logistic Regression,A master's degree,Mix of fields,100 to 499 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,1 to 2 years,Data Analyst,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,10,20,10,20,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,Employed part-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,Spark / MLlib,Link Analysis,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,I collect my own data (e.g. 
web-scraping)","College/University,Kaggle,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,Very useful,,,,Somewhat useful,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,35,15,25,15,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,500 to 999 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data,Relational data",Most of the time,1GB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Cloudera,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,,Rarely,,,Often,Sometimes,,,,,,,,Most of the time,,Sometimes,,Most of the time,,,,,,Most of the time,,,,Most of the time,,Sometimes,,,,,,,,Often,,,,,Most of the time,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,SVMs,Text Analytics",,Sometimes,,Often,,Most of the time,Often,,Often,,,,,Often,,Often,,Most of the time,Often,Most of the time,Most of the time,,,,Often,,,Most of the time,Often,,,,,30,20,15,10,25,0,Enough to refine and innovate on the 
algorithm,"Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,26-50% of projects,Approximately half internal and half external,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Bitbucket,Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 years,"Business Analyst,Researcher",Work,20,10,70,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,"Some college/university study, no bachelor's degree",Retail,100 to 499 employees,Increased significantly,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Always,1GB,Random Forests,"NoSQL,Python,R,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Rarely,,Most of the time,,,,,,,,,Most of the time,,,,,,Often,,,,"A/B Testing,Data Visualization,Random Forests",Often,,,,,,Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Researcher,Software Developer/Software Engineer",Work,45,0,25,30,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Support Vector Machines (SVMs)",High school,Telecommunications,"10,000 or more 
employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,23,Employed part-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Computer Scientist,University courses,0,10,30,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,"Business Analyst,Computer Scientist,Statistician",Self-taught,50,25,25,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Computer Scientist,Engineer,Machine Learning Engineer,Software Developer/Software 
Engineer",University courses,5,20,50,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A professional degree,Technology,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Rarely,1MB,"Bayesian Techniques,CNNs,Neural Networks","Amazon Web services,C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,"Bayesian Techniques,CNNs,Cross-Validation,GANs,HMMs,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,Simulation",,,Rarely,Most of the time,,Rarely,,,,,Sometimes,,Rarely,Rarely,,,,,,Most of the time,Sometimes,,,,Sometimes,Most of the time,Often,,,,,,,40,40,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,,,,Often,,,,,,,,,Often,,Less than 10% of projects,Do not know,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,49,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,More than 10 years,Programmer,Self-taught,10,0,60,10,20,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Survival Analysis","Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - RNNs",,Manufacturing,100 to 499 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,No,Yes,Machine Learning Engineer,Perfectly,Employed by professional services/consulting firm,Amazon Machine Learning,,Python,GitHub,Blogs,,Not Useful,,,,,,,,,,,,,,,,,Becoming a Data Scientist Podcast,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,I never declared a major,Less than a year,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",25,25,25,25,0,0,Time Series,Neural Networks - RNNs,I prefer not to answer,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,3 to 5 years,"Data Analyst,Researcher",Work,0,10,50,40,0,0,"Recommendation Engines,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,I prefer not to answer,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,,,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Base,SAS Enterprise Miner,SQL,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Data Visualization,Lift Analysis,Logistic Regression,Prescriptive Modeling,Segmentation,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,1 to 2 years,"Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Not very important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,SVMs","C/C++,MATLAB/Octave,Microsoft Excel Data Mining,Python,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,,,,"Data Visualization,Neural Networks,Simulation",,,,,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,30,30,20,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,,Most of the time,Sometimes,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,NoSQL,Monte Carlo Methods,Python,"Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Friends network,Stack Overflow Q&A",,,Very useful,,,Somewhat useful,,,,,,,,Very useful,,,,,,< 1 year,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Github Portfolio,Sort of (Explain more),Bachelor's degree,A social science,Less than a year,"Data Analyst,Researcher",University courses,10,0,40,50,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,43,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",19,80,0,0,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,Employed full-time,,,No,Yes,Other,Perfectly,Employed by college or 
university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher,Statistician",Self-taught,80,0,10,0,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,100 to 499 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data",Always,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,"Software Developer/Software Engineer,Other",Self-taught,80,0,20,0,0,0,,,A bachelor's degree,Technology,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational 
data,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,0,0,40,0,"Computer Vision,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Insurance,"1,000 to 4,999 employees",,Less than one year,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Statistician","Online courses (coursera, udemy, edx, etc.)",25,25,0,48,2,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Technology,10 to 19 employees,Increased slightly,3-5 years,A general-purpose job board,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Text data,Relational data",Most of the time,10GB,"CNNs,Neural Networks,Regression/Logistic 
Regression","Cloudera,IBM SPSS Statistics,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SQL,Unix shell / awk",,,,,Rarely,,,,,,,Rarely,,,,,Most of the time,,,,Rarely,,,,,,Rarely,,,,Most of the time,,Rarely,,,,,,,,,Rarely,,,,,,Sometimes,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",Rarely,,,Rarely,Most of the time,Sometimes,Sometimes,,,,,,,,,,,,Often,Most of the time,Often,,,Most of the time,,,,,,,,,,60,10,15,10,5,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,20,10,10,60,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Technology,I don't know,Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Sometimes,100GB,,Java,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,38,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by college or university,IBM Watson / Waton Analytics,Bayesian Methods,Matlab,Google Search,Personal Projects,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,More than 10 years,Operations Research Practitioner,Self-taught,80,0,20,0,0,0,"Outlier detection (e.g. Fraud detection),Other (please specify; separate by semi-colon)",Other (please specify; separate by semi-colon),Primary/elementary school,Academic,100 to 499 employees,Increased slightly,Don't know,Some other way,Important,Other,Laptop or Workstation and private datacenters,Other,Sometimes,,Regression/Logistic Regression,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,0,100,0,0,0,0,Enough to refine and innovate on the algorithm,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Other,,the music score of traditional and folk music have to be converted to midi format to be processed by computer. 
I'm working on that front too.,Other,Other,,Other,Sometimes,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Italy,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Software Developer/Software Engineer,Statistician",Work,0,0,90,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Time Series 
Analysis,R,Other,"Blogs,Kaggle,Textbook,YouTube Videos,Other",,Very useful,,,,,Somewhat useful,,,,,,,,Somewhat useful,,,Very useful,"Data Machina Newsletter,R Bloggers Blog Aggregator,The Data Skeptic Podcast",< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,Necessary,,,Necessary,,,,,Other,2 - 10 hours,Experience from work in a company related to ML,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Other,5,5,40,20,20,10,"Adversarial Learning,Machine Translation,Natural Language Processing,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,11-15,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,49,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",Other,6,Retired,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,United States,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,Self-taught,20,20,50,10,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Important,Analyze and understand data to influence product or business decisions,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Other","Image data,Text data,Relational data",Most of the time,10TB,"CNNs,Decision Trees,Ensemble Methods,Random Forests","Amazon Web services,Google Cloud Compute,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",20,30,10,0,40,0,"Computer Vision,Natural Language Processing,Supervised Machine 
Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Other,500 to 999 employees,Stayed the same,1-2 years,Some other way,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Text data,Relational data",Don't know,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Microsoft Azure Machine Learning,Python,R,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,,Often,,,Often,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Often,,,Often,Often,Often,,Most of the time,,,,,,Often,Often,,,,35,15,10,25,15,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,45,Employed full-time,,,Yes,,DBA/Database Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,30,0,50,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Technology,100 to 499 employees,Stayed the same,1-2 years,"A 
friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,38,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,Statistician,"Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,Time Series,Evolutionary Approaches,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,,"Computer Scientist,Data Analyst,Data Miner,Data Scientist,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep learning,Python,Google Search,"Kaggle,Stack Overflow Q&A",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's 
degree,Electrical Engineering,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",40,40,20,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",,Retail,500 to 999 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,,10MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",5,60,0,0,35,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,30,Employed 
full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,58,"Independent contractor, freelancer, or self-employed",,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"DBA/Database Engineer,Other",University courses,50,0,0,50,0,0,,"Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100TB,,"C/C++,Cloudera,Flume,Hadoop/Hive/Pig,Impala,NoSQL,Perl,Python,SQL,Unix shell / awk",,,,Sometimes,Most of the time,,Sometimes,,Often,,,,,Often,,,,,,,,,,,,,Rarely,,,Often,Rarely,,,,,,,,,,,Often,,,,,,Most of the time,,,,"Data Visualization,Decision Trees,Simulation,Time Series Analysis",,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Often,,,Sometimes,,,,30,35,20,10,5,0,Enough to tune the parameters properly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 
10 years,"Data Analyst,Researcher,Other",Self-taught,70,0,0,20,10,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Academic,Fewer than 10 employees,Increased significantly,3-5 years,Some other way,Somewhat important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Text data,Relational data",Never,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Jupyter notebooks,Python,SQL,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Text Analytics,Time Series Analysis",,,Sometimes,Most of the time,,Most of the time,Most of the time,Most of the time,Often,,,,Rarely,Sometimes,,Often,,Rarely,Often,Most of the time,Sometimes,,Most of the time,,Sometimes,,Sometimes,Rarely,Sometimes,Sometimes,,,,50,30,0,15,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Limitations in the state of the art in machine learning,Limitations of tools,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult 
access to data",Often,,,,Most of the time,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,,,,,"Arxiv,Blogs,Personal Projects",Somewhat useful,Very useful,,,,,,,,,,Very useful,,,,,,,"DataTau News Aggregator,KDnuggets Blog,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),,,Sort of (Explain more),Doctoral degree,Physics,I don't write code to analyze data,"Predictive Modeler,Researcher","Online courses (coursera, udemy, edx, etc.)",40,50,0,0,10,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines 
(SVMs)",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,6 to 10 years,"Programmer,Researcher",University courses,50,5,10,35,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Always,,,"Amazon Web services,C/C++,Python,TensorFlow",,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,20,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Predictive 
Modeler,Researcher,Statistician",University courses,25,0,25,50,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression",A doctoral degree,Academic,,Stayed the same,,,Very important,,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers,Traditional Workstation","Text data,Other",Sometimes,100TB,"Decision Trees,Ensemble Methods,HMMs,Random Forests,Regression/Logistic Regression","R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,Often,,,,"Cross-Validation,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Simulation,Text Analytics",,,,,,Often,,Often,Often,,,,,Often,,Often,,,Sometimes,,Most of the time,,Most of the time,Rarely,,Sometimes,Sometimes,,Often,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,34,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A humanities discipline,3 to 5 years,"Data Analyst,Other",Work,35,5,60,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told 
me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,NA,Employed part-time,,,Yes,,Researcher,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,"Data Analyst,Engineer,Programmer,Researcher",Self-taught,60,10,10,10,10,0,"Computer Vision,Reinforcement learning,Time Series",,,Academic,,,,,,,,,,,,"C/C++,MATLAB/Octave",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Segmentation,Simulation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Researcher,Software Developer/Software Engineer",University courses,30,20,0,50,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Hidden Markov Models HMMs,Neural Networks - CNNs,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Technology,100 to 499 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Always,100MB,,"C/C++,Java,Jupyter notebooks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Analyst,Statistician",University courses,0,30,10,60,0,0,,"Decision Trees - Random Forests,Gradient Boosting,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Stayed the same,Don't know,A general-purpose job board,Very important,Other,Laptop or Workstation and private datacenters,Relational data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,38,Employed full-time,,,Yes,,Other,Poorly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,20,60,0,20,0,0,,,,,,,,,,,,,,,,"Python,QlikView,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,45,Employed part-time,,,Yes,,Data Miner,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's 
degree,Computer Science,Less than a year,Computer Scientist,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,42,Employed full-time,,,No,Yes,Business Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Researcher,Self-taught,30,70,0,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Other,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,33,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,50,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Internet-based,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted 
directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,Jupyter notebooks,,Python,,"Arxiv,Blogs,College/University,Textbook",Very useful,Very useful,Very useful,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management information systems,6 to 10 years,"Computer Scientist,Data Scientist",University courses,35,0,25,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - RNNs",A master's degree,Academic,"1,000 to 4,999 employees",Increased slightly,3-5 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,RNNs","Java,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Neural Networks,RNNs,Time Series Analysis",,,,,,,,Often,Often,,,Often,,Sometimes,,,,,,Often,,,,,Sometimes,,,,,Most of the time,,,,70,15,5,5,5,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of significant domain expert input",,,,,,,,,Often,,Often,,,,,,,,,,,,Less than 10% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Sometimes,60000,EUR,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,University courses,50,0,40,0,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",High school,Internet-based,500 to 999 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,C/C++,Python,Unix shell / awk",,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Natural Language Processing,Recommender Systems",Sometimes,Sometimes,Often,,Often,,,,,,,,,,,,,Sometimes,Often,,,,,Often,,,,,,,,,,60,20,20,0,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,3 to 5 years,"Business 
Analyst,Programmer",University courses,20,0,10,70,0,0,"Natural Language Processing,Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,Fewer than 10 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Rarely,10MB,"Neural Networks,SVMs","Java,R",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems",,,,,,,,,,,,,,Sometimes,,,,,Often,Sometimes,Sometimes,,,Often,,,,,,,,,,40,10,40,0,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer",University courses,49,1,25,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Technology,20 to 99 employees,Increased significantly,3-5 years,"A friend, 
family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters,Workstation + Cloud service",Image data,Sometimes,100GB,"Bayesian Techniques,CNNs,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,28,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,1 to 2 years,,Self-taught,65,0,35,0,0,0,"Computer Vision,Reinforcement learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Mix of fields,500 to 999 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",Image data,Sometimes,10MB,CNNs,"C/C++,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,"CNNs,Cross-Validation,Data Visualization,Segmentation",,,,Often,,Most of the time,Often,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,5,85,0,10,0,0,Enough to refine and innovate on the algorithm,"The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by professional 
services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",20,60,NA,10,10,NA,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs",I prefer not to answer,Technology,10 to 19 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,,"Amazon Web services,Python,R",,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,Naive Bayes,Recommender Systems",,,,,,,Often,Often,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Data Analyst,University courses,20,0,0,70,10,0,Time Series,Decision Trees - Random Forests,A bachelor's degree,Technology,Fewer than 10 employees,Increased slightly,More than 10 years,Some other way,Very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Don't know,,,"MATLAB/Octave,Python,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,1 to 2 
years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Kenya,29,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,38,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,20,Employed full-time,,,No,Yes,Data Analyst,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Data Analyst",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,3 to 5 years,"Software Developer/Software Engineer,Other",Work,5,0,20,75,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods",A bachelor's degree,Other,100 to 499 employees,Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat 
important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Sometimes,100GB,Regression/Logistic Regression,"Microsoft SQL Server Data Mining,Python,SQL,Other",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,"kNN and Other Clustering,Segmentation,Text Analytics",,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,5,1,1,0,3,90,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues",,Sometimes,,,Sometimes,,,,Sometimes,,,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",40,20,10,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,"10,000 or more employees",Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Excel Data Mining,Minitab,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SAS JMP,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,Most of the time,,,Sometimes,Often,,,,Most of the time,,Most of the time,,,,Rarely,,,Often,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,33,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,51,Employed full-time,,,Yes,,Statistician,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Statistician,"Online courses (coursera, udemy, edx, 
etc.)",50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Academic,500 to 999 employees,Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,57,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",0,50,40,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",,Telecommunications,,,,,Important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,"Image data,Video data,Relational data",Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python",,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Neural Networks,Simulation,Time Series Analysis",,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,Rarely,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Software Developer/Software Engineer,Other",Other,50,0,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,26,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,20,15,10,50,5,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Financial,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,Data Scientist,University courses,0,20,30,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",Primary/elementary school,Technology,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Text data,Relational data",Rarely,10GB,Decision Trees,"IBM Cognos,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,,Sometimes,,,Rarely,,,,Often,,,,,,,,,,Rarely,,,,Often,,,,,,,,,,Rarely,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing",,,,,,,Often,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,70,10,10,5,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed part-time,,,No,Yes,Engineer,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",0,85,0,10,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,32,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,R,Neural Nets,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping)","Blogs,Kaggle,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,,Somewhat useful,The Data Skeptic Podcast,3-5 years,Nice to have,Necessary,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,,,,"Coursera,Other",Traditional Workstation,2 - 10 hours,Experience from work in a company related to ML,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,30,25,0,15,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Markov Logic Networks",A professional degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,1-2,Very Important,Not important,Very Important,Somewhat important,Very Important,Not important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Not important,Somewhat important +Male,United States,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Business Analyst,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),I don't write code to analyze data,"Business Analyst,Programmer",University courses,0,0,0,100,0,0,Other (please specify; separate by 
semi-colon),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Perfectly,Self-employed,Amazon Machine Learning,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos",Somewhat useful,,,,,,Somewhat useful,,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,,Very useful,,,,Somewhat useful,"FastML Blog,KDnuggets Blog",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Biology,More than 10 years,"Data Scientist,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Adversarial Learning,Computer Vision","Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs","Some college/university study, no bachelor's degree",Mix of fields,,,,,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Image data,Sometimes,100GB,"CNNs,GANs","Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Java,Jupyter notebooks,KNIME (free version),Python,R,Spark / MLlib,TensorFlow,Unix shell / awk",,Most of the time,,Sometimes,,,,Sometimes,Sometimes,,,,,,Sometimes,,Often,,Sometimes,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,,,,,Often,,Often,,,,"CNNs,Data Visualization,GANs,kNN and Other Clustering,Neural Networks,PCA and 
Dimensionality Reduction,Random Forests,Segmentation,Simulation",,,,Most of the time,,,Often,,,,Often,,,Sometimes,,,,,,Often,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,,,,,35,30,20,10,5,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Often,Sometimes,,Often,Most of the time,Often,,,,Sometimes,Often,,,,,,Often,,76-99% of projects,Approximately half internal and half external,Standalone Team,Satellite imagery; aerial imagery; drone imagery,Making sure it is of high enough quality to train a CNN,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. 
Redis/Riak)","Commercial Data Platform,Company Developed Platform",,Git,Most of the time,"180,000",,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,Brazil,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,"Computer Scientist,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,Supervised Machine Learning (Tabular Data),,A professional degree,Technology,"10,000 or more employees",Increased significantly,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,100MB,,"Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,30,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Engineer,University courses,10,20,20,30,10,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,23,Employed full-time,,,No,Yes,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,DBA/Database Engineer,Self-taught,90,10,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer",University courses,50,0,0,50,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Academic,I don't know,Increased significantly,Don't know,"A friend, family member, or former colleague told me",Very important,,Laptop or Workstation and private datacenters,Image data,,,,"Java,MATLAB/Octave,Python",,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction",,,,Often,,Often,,,,,,,,Often,,,,,,Often,Often,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,27,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,"Computer Scientist,Data Analyst,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,20,20,30,30,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,50,"Not employed, but looking for work",,,,,,,,Python,Monte Carlo Methods,R,Government website,"Company internal community,Non-Kaggle online communities,Online courses,YouTube Videos",,,,Very useful,,,,,Somewhat useful,,Very useful,,,,,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,,,,,"DataCamp,Udacity,Other",Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",15,75,10,0,0,0,Time Series,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,3-5,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Somewhat important,Not important,Not important,Not important +Male,Australia,32,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's 
degree,Computer Science,More than 10 years,"Business Analyst,Data Miner",Self-taught,40,10,40,10,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines",,,Financial,I don't know,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),,,1GB,Decision Trees,"KNIME (free version),Tableau",,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Decision Trees,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Often,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),,,,,,,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,Colombia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Programmer,Software Developer/Software Engineer",Work,80,0,10,10,0,0,Reinforcement learning,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Mix of fields,20 to 99 employees,Stayed the same,1-2 years,A tech-specific job board,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Decision Trees,Ensemble Methods,Neural Networks,Random 
Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,23,Employed part-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",0,60,0,0,0,40,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs",,Financial,Fewer than 10 employees,Decreased slightly,1-2 years,An external recruiter or headhunter,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Traditional Workstation",Image data,Rarely,100PB,"CNNs,Ensemble Methods,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,Angoss,C/C++,Java,Jupyter notebooks,KNIME (free version),MATLAB/Octave,NoSQL,Orange,Perl,SAS JMP,Spark / MLlib,Statistica (Quest/Dell-formerly Statsoft),TensorFlow",,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Ensemble Methods,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Simulation,SVMs,Text Analytics",,,Sometimes,,,,,,,,,Often,,,,,,,,Often,,,,,,,Most of the time,Most of the time,,,,,,30,70,0,0,0,0,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed 
full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,Less than a year,"Business Analyst,Other",Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,48,Employed full-time,,,No,Yes,Scientist/Researcher,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",85,10,5,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,"Not employed, but looking for work",,,,,,,,Hadoop/Hive/Pig,Deep learning,Python,"GitHub,University/Non-profit research group websites","Blogs,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,Very useful,,,,,Very useful,,,,,Very useful,,Very useful,Somewhat useful,,,Very useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,KDnuggets Blog",1-2 years,Necessary,Nice to have,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),11 - 39 hours,Master's 
degree,Yes,Master's degree,Other,1 to 2 years,Other,University courses,5,5,0,90,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",3-5,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Female,Portugal,25,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Work,10,0,80,10,0,0,Reinforcement learning,,Primary/elementary school,CRM/Marketing,100 to 499 employees,Stayed the same,6-10 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,1TB,,SAS Base,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,"A/B Testing,Simulation,Other",Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,Often,,,40,20,10,0,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,"Other,I 
haven't started working yet",Other,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,3 to 5 years,Data Miner,Work,10,40,50,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Financial,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,40,50,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Financial,500 to 999 employees,Increased significantly,3-5 years,Some other way,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Rarely,100GB,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,TensorFlow",,Sometimes,,,,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision 
Trees,Ensemble Methods,kNN and Other Clustering,Neural Networks,Random Forests,Segmentation,Time Series Analysis",,,,Sometimes,,Sometimes,Sometimes,Sometimes,Sometimes,,,,,Sometimes,,,,,,Sometimes,,,Sometimes,,,Sometimes,,,,Sometimes,,,,70,20,0,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,40,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,15,40,5,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",I don't know/not sure,Military/Security,20 to 99 employees,Increased significantly,3-5 years,A general-purpose job board,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,GPU accelerated Workstation,Image data,Most of the time,100GB,"CNNs,Ensemble Methods,Neural Networks","Jupyter notebooks,Python,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Most of the 
time,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,SVMs",,,,Most of the time,,Most of the time,Often,Sometimes,Often,,,Sometimes,,Sometimes,,Often,,,,Often,Sometimes,,Sometimes,,,Sometimes,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A humanities discipline,More than 10 years,Data Scientist,University courses,10,0,10,80,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Hadoop/Hive/Pig,Java,Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,Sometimes,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random 
Forests,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,"Engineer,Programmer",University courses,0,80,0,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning",,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,United States,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,More than 10 years,Software Developer/Software Engineer,Self-taught,50,0,30,20,0,0,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,27,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,Less than a year,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",2,25,2,50,1,20,"Adversarial 
Learning,Computer Vision,Machine Translation,Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Other,"5,000 to 9,999 employees",Decreased slightly,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,30,0,0,50,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,"Image data,Text data",Don't know,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Naive Bayes,Neural Networks",,,Often,,,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,40,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,"Information technology, networking, or system 
administration",6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer,Other",Work,20,30,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression",I don't know/not sure,Technology,20 to 99 employees,Increased significantly,6-10 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Bayesian Techniques,GANs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Data Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,Google Cloud Compute,Social Network Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Arxiv,Blogs,College/University,Company internal community,Conferences,Friends network,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,YouTube Videos",Not Useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,,,,,Very useful,Very useful,Not Useful,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Computer Scientist,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",University courses,50,30,10,10,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,Regression/Logistic Regression","Amazon Web services,C/C++,Jupyter notebooks,Mathematica,MATLAB/Octave,Python,R,TensorFlow",,Rarely,,Sometimes,,,,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,,,,,,,Often,,Most of the time,,,,,,,,,,,,,Rarely,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,Recommender Systems,Simulation,Time Series Analysis",Sometimes,,Sometimes,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,Sometimes,,,Sometimes,,,Most of the time,,,,60,20,5,5,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc 
development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,Sometimes,,,,,,Often,Sometimes,,Sometimes,,,Most of the time,,,,,Rarely,Most of the time,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,Oecd; census data,Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,40000,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Japan,34,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,6 to 10 years,Software Developer/Software Engineer,Self-taught,60,0,10,0,30,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",Primary/elementary school,Hospitality/Entertainment/Sports,,,,,"N/A, I did not receive any formal education",Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,Image data,Most of the time,100GB,"CNNs,Neural Networks,Regression/Logistic Regression","C/C++,Python,SQL,Unix shell / awk",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,,,,Often,,,,"CNNs,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Segmentation,Text Analytics",,,,Most of the time,,Most of the time,Sometimes,,,,,,,Sometimes,,Sometimes,,,,Most of the time,,,,,,Often,,,Sometimes,,,,,50,10,35,5,0,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others",,,,,Often,Sometimes,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Most of the time,9600000,JPY,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Female,United States,44,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,Business Analyst,Work,0,50,50,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,,Pharmaceutical,"10,000 or more employees",Increased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Work,20,20,50,0,10,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Relational data,Most of the time,,,"Amazon Machine Learning,Cloudera,Flume,Hadoop/Hive/Pig,Impala,Java,NoSQL,Python,SAS Base,Spark / MLlib,SQL,Tableau",Sometimes,,,,Most of the time,,Most of the time,,Most of the time,,,,,Most of the time,Most of the time,,,,,,,,,,,,Often,,,,Often,,,,,,,Often,,,Most of the time,Most of the time,,,Most of the 
time,,,,,,,"Data Visualization,Decision Trees",,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,50,25,10,15,0,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Physics,Less than a year,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Engineer,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,10,0,40,0,0,"Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer",University courses,50,0,20,10,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Analyst,Data Scientist,Programmer",Self-taught,50,10,0,40,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Financial,"5,000 to 9,999 employees",Decreased slightly,3-5 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Data Scientist,Software Developer/Software Engineer",University courses,25,25,40,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 
5 years,Software Developer/Software Engineer,University courses,50,0,0,50,0,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Computer Scientist,Machine Learning Engineer",University courses,50,0,25,25,0,0,Computer Vision,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,6 to 10 years,"DBA/Database Engineer,Software Developer/Software Engineer",University courses,40,0,20,25,15,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Evolutionary 
Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,,Self-taught,40,10,20,30,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Academic,100 to 499 employees,Increased slightly,1-2 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,100GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SQL,Tableau",,Most of the time,,,,,,,,,,,,,,,,,,,,,Sometimes,,Rarely,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,Most of the time,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,Often,Rarely,,,,Most of the time,Sometimes,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,No,Yes,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,I never declared a major,,"Business Analyst,Programmer,Software Developer/Software Engineer",,20,20,20,40,0,0,,,High 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,33,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,24,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Engineer,Programmer",University courses,20,0,20,40,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,Fewer than 10 employees,Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Business Analyst,Self-taught,60,10,30,0,0,0,Time Series,Bayesian Techniques,A bachelor's degree,Hospitality/Entertainment/Sports,"1,000 to 4,999 employees",Increased slightly,1-2 
years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst",Work,30,0,50,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,New Zealand,48,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,80,0,0,0,0,Time Series,"Decision Trees - Random Forests,Logistic Regression",High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Other,35,I prefer not to say,,"Yes, I'm focused on learning mostly data science skills",,,,,,Other,Proprietary Algorithms,Other,,Non-Kaggle online communities,,,,,,,,,Very useful,,,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Management information systems,,Other,Other,NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed 
full-time,,,Yes,,Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,3 to 5 years,"Data Analyst,Other",Self-taught,25,50,0,25,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Logistic Regression",A doctoral degree,Government,"1,000 to 4,999 employees",Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters,Traditional Workstation","Text data,Relational data",Most of the time,100MB,"Bayesian Techniques,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Researcher",University courses,5,20,20,50,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Telecommunications,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Rarely,1GB,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Software Developer/Software Engineer",University courses,10,10,30,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,44,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher,Other",University courses,7,20,30,40,3,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",High school,Other,10 to 19 employees,Stayed the same,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Other,Other,Other,Never,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,75,10,0,5,0,"Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Government,"5,000 to 9,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Image data,,,,"Hadoop/Hive/Pig,Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Rarely,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,47,Employed full-time,,,Yes,,Statistician,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,More than 10 years,"Data Analyst,Programmer,Researcher,Statistician",Self-taught,40,20,40,0,0,0,Unsupervised Learning,"Bayesian Techniques,Markov Logic Networks,Other (please specify; separate by 
semi-colon)","Some college/university study, no bachelor's degree",Pharmaceutical,100 to 499 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,,,,"Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Simulation,Text Analytics,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,67,"Not employed, but looking for work",,,,,,,,Amazon Web services,Deep learning,SQL,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Engineer,Predictive Modeler,Researcher,Statistician",Work,50,0,40,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning,Other (please specify; separate by semi-colon)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,"Independent contractor, freelancer, or self-employed",,,No,Yes,Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,48,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,40,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,More than 10 years,"Computer Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,50,10,30,10,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines","Markov Logic Networks,Neural Networks - CNNs",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,10TB,"CNNs,HMMs","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"CNNs,Collaborative Filtering",,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,29,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,31,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,30,0,0,70,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Hidden Markov Models HMMs",A master's 
degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,58,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,I don't write code to analyze data,"Data Scientist,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Survival Analysis,Neural Networks - CNNs,High school,Government,100 to 499 employees,Decreased slightly,Less than one year,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,Text data,Never,100MB,Regression/Logistic Regression,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,20,20,20,20,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations in the state of the art in machine learning",,Often,,,,,,,Most of the time,,,Often,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,Just our data,Sandarization.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Other,,,Never,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,40,20,30,0,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - CNNs,,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,35,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Machine Learning Engineer,University courses,20,0,30,50,0,0,"Computer Vision,Time Series","Bayesian Techniques,Neural Networks - CNNs",A doctoral degree,Telecommunications,"1,000 to 4,999 employees",Stayed the same,6-10 years,A general-purpose job board,Very important,Research that advances the state of the 
art of machine learning,"Basic laptop (Macbook),GPU accelerated Workstation",Image data,Sometimes,100GB,"Bayesian Techniques,CNNs","C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Naive Bayes,PCA and Dimensionality Reduction",,,,Often,,,Often,,,,,,,,,,,Often,,,Often,,,,,,,,,,,,,0,30,0,0,50,20,Enough to run the code / standard library,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,Other,Other,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Online courses,Stack Overflow Q&A",,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,,,,,,1-2 years,Necessary,Unnecessary,Nice to have,Nice to have,Necessary,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,,,,"Coursera,Udacity,Other",Basic laptop (Macbook),2 - 10 hours,Github Portfolio,Sort of (Explain more),Master's degree,Other,1 to 2 years,"Researcher,Software Developer/Software Engineer,Other",Self-taught,20,50,5,25,0,0,,"Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Not important,Very Important,Very Important,Very Important,Somewhat important +Male,Japan,41,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Management 
information systems,More than 10 years,"Business Analyst,Other",Self-taught,30,20,40,10,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Internet-based,"10,000 or more employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,,"Ensemble Methods,Regression/Logistic Regression","Jupyter notebooks,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,,,,,,,,Often,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Other",Self-taught,65,35,0,0,0,0,"Adversarial Learning,Time Series","Bayesian Techniques,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,10MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,44,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Other",Self-taught,55,0,30,15,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Other (please specify; separate by semi-colon)",A master's degree,Other,500 to 999 employees,Increased significantly,1-2 years,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,,,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Python,R,Spark / MLlib,SQL",,Rarely,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Rarely,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,,,,,,,,Often,,Often,,Sometimes,Often,,Often,,,,,,,Sometimes,,,,25,15,5,25,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,76-99% of projects,,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,"Git,Other",Rarely,,,I do not want to share information about my salary/compensation,,,,,,,,,,,,,,,,,,, +Male,United States,18,"Not employed, but looking for work",,,,,,,,Java,,,,"Online courses,Textbook,YouTube Videos",,,,,,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,,1-2 years,Necessary,,Necessary,,,,,,,,,,,,Basic laptop (Macbook),0 - 1 hour,,No,I did not complete any formal education past high school,,I don't write code to analyze data,Other,Self-taught,80,5,0,0,0,15,,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,3 to 5 years,Other,Self-taught,70,0,10,10,10,0,"Natural Language Processing,Reinforcement learning","Decision Trees - Random Forests,Evolutionary Approaches",A master's degree,Mix of fields,"10,000 or more employees",,,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Predictive Modeler,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,1 to 2 
years,Researcher,Work,30,20,15,25,10,NA,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,Pharmaceutical,"10,000 or more employees",Increased slightly,3-5 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Relational data,Most of the time,1GB,"Bayesian Techniques,Ensemble Methods,Gradient Boosted Machines,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,Other,Other",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,Often,Often,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Often,,,,,,,Often,,,,,Often,Often,Often,,,,,,,Often,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,55,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,80,0,0,20,0,0,Supervised Machine Learning (Tabular Data),Bayesian Techniques,A bachelor's degree,Technology,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United 
States,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,50,0,50,0,0,"Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,"5,000 to 9,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Relational data",Most of the time,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Time Series Analysis",Often,,,,,Often,Often,Often,,,,,,Often,,Often,,,,,Most of the time,,,,,,,,,Most of the time,,,,30,10,10,10,40,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,"Not employed, but looking for work",,,,,,,,Python,Link Analysis,Python,"Google Search,University/Non-profit research group websites","College/University,Conferences,Stack Overflow 
Q&A,Textbook,YouTube Videos",,,Very useful,,Very useful,,,,,,,,,Very useful,Very useful,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,1 to 2 years,"Engineer,Programmer,Researcher",University courses,60,0,0,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,29,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,30,20,10,10,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Other (please specify; separate by semi-colon)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Biology,More than 10 years,Predictive Modeler,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some 
college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Other",Self-taught,60,0,0,20,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Researcher,Software Developer/Software Engineer,Other",University courses,50,0,20,30,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A professional degree,Academic,"5,000 to 9,999 employees",Increased slightly,Don't know,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Image data,Sometimes,100GB,"Bayesian Techniques,CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,MATLAB/Octave,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs",,,Often,Most of the time,,Most of the time,,,,,,,,Often,,Often,,Sometimes,Often,Most of the time,Often,,,,Often,Sometimes,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,,More than 10 years,"Data Miner,Data Scientist,Operations Research Practitioner,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,45,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,6 to 10 years,"Business Analyst,Data Scientist,Programmer,Researcher",University courses,50,0,50,0,0,0,"Natural Language 
Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,,"1,000 to 4,999 employees",Increased slightly,3-5 years,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Most of the time,100GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Java,Python,RapidMiner (commercial version),RapidMiner (free version),SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,,Most of the time,Most of the time,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Naive Bayes,Natural Language Processing,Neural Networks,Prescriptive Modeling,Random Forests,Recommender Systems,Segmentation,SVMs,Text Analytics",,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,Most of the time,,,,Most of the time,,Most of the time,,,,Most of the time,Most of the time,Most of the time,,Most of the time,Most of the time,Most of the time,,Most of the time,,Most of the time,Most of the time,,,,,50,20,20,10,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,21,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,Statistician,Self-taught,60,20,10,0,10,0,"Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Government,"1,000 
to 4,999 employees",Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,Sometimes,,Regression/Logistic Regression,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Logistic Regression,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,20,20,20,20,20,0,Enough to run the code / standard library,Lack of data science talent in the organization,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,52,Employed full-time,,,Yes,,Business Analyst,,"Employed by professional services/consulting firm,Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,20 to 99 employees,Increased significantly,3-5 years,Some other way,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service","Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,KNIME (commercial version),KNIME (free version),Python,R,SAS Base,SAS Enterprise Miner,Tableau",,,,,,,,,,,Most of the time,Most of the time,Most of the time,,,,,Often,Often,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Sometimes,Sometimes,,,,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Simulation,SVMs,Text Analytics,Time Series Analysis",,Most of the time,Often,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,,,,Often,Most of the time,Most of the time,,,Most of the time,Often,Often,Most of the time,Often,,,,40,30,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of data science talent in the organization,Lack of significant domain expert input",,,,,Often,,,,Sometimes,,Often,,,,,,,,,,,,10-25% of projects,More internal than external,Business Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,,,"2,200,000",TWD,Has stayed about the same (has not increased or decreased more than 5%),9,,,,,,,,,,,,,,,,,, +Female,South Korea,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Data Scientist,Programmer,Software Developer/Software Engineer",Work,10,10,80,0,0,0,"Recommendation Engines,Other (please specify; separate by semi-colon)",Logistic Regression,A bachelor's degree,Internet-based,"1,000 to 4,999 employees",Increased significantly,More than 10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,RapidMiner (free version),Survival Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Company internal community,Conferences,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,Somewhat useful,Very useful,,,,,,,Very useful,,Very useful,Very useful,,,,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"DBA/Database Engineer,Other",Self-taught,30,0,25,45,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Other,500 to 999 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,Regression/Logistic Regression,"Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL",,Often,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Text Analytics,Time Series Analysis",Often,,,,,Often,Most of the time,Sometimes,,,,,,,Often,Sometimes,,Sometimes,,,,,,,,,,,Often,Often,,,,60,10,5,15,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT",,Often,,,Most of the time,,,,,,,,,,Often,,,,,,,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Email,Share Drive/SharePoint",,Other,Rarely,108000,USD,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,5,10,20,60,5,0,"Computer Vision,Natural Language Processing,Time Series,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,I don't know,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,27,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Biology,6 to 10 years,"Data Scientist,Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",40,20,30,10,0,0,"Computer Vision,Time Series,Unsupervised Learning","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Relational data",Always,10GB,"CNNs,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,SAS Enterprise Miner,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Textbook,YouTube Videos",,,,,,,,,,,,,,,Very useful,,,Somewhat useful,"Linear Digressions Podcast,Partially Derivative Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Analyst,Other",Self-taught,50,0,10,20,20,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",High school,Insurance,"10,000 or more employees",Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Text data,Relational data",Always,100GB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,KNIME (free version),Microsoft R Server (Formerly Revolution Analytics),NoSQL,Python,R,RapidMiner (free version),SAS Base,Tableau,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Often,,Often,,,,,Most of the time,,,Sometimes,,,,Most of the time,,Most of the time,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Random Forests,Time Series Analysis",,Often,Most of the time,,,Most of the time,Often,Most of the time,,,,Often,,Sometimes,,Often,,Often,,,Often,,Often,,,,,,,Most of the time,,,,60,20,5,10,5,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need 
to coordinate with IT,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Most of the time,Sometimes,,,Often,,,,,,Most of the time,,,Sometimes,Sometimes,,,,10-25% of projects,More internal than external,Standalone Team,NPPES,Poor understanding of business rules represented in the data.,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,130000,,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,3 to 5 years,Researcher,Self-taught,95,0,5,0,0,0,Time Series,,A doctoral degree,Academic,10 to 19 employees,,3-5 years,Some other way,Important,Other,Laptop or Workstation and local IT supported servers,Other,,,Other,"Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Machine Learning Engineer,Predictive Modeler,Researcher,Statistician",Self-taught,100,0,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs",,Financial,100 to 499 employees,Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Other,Laptop or Workstation and local IT supported servers,Other,Most of the time,10TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","NoSQL,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,29,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Business Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Adversarial Learning,Decision Trees - Gradient Boosted Machines,A professional degree,Manufacturing,"1,000 to 4,999 employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational 
data,Always,,,SQL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,A/B Testing,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to refine and innovate on the algorithm,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,16000,,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,Other,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Engineer,Other","Online courses (coursera, udemy, edx, etc.)",25,35,0,20,20,0,Time Series,Logistic Regression,A master's degree,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Relational data,,,Regression/Logistic Regression,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,RapidMiner (free version),SQL",,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,Sometimes,,Rarely,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Time Series Analysis",Sometimes,,,,,,Most of the time,Rarely,,,,,,,,Sometimes,,,,,,,,,,Often,,,,Rarely,,,,20,25,15,20,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations in the state of the art in machine 
learning,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Scaling data science solution up to full database",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Analyst,Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",University courses,5,10,25,25,20,15,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,0,10,10,70,10,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - RNNs",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Statistician,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Statistician,I haven't started 
working yet",,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,25,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,I haven't started working yet,University courses,15,10,20,50,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,3 to 5 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",30,20,20,10,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Pakistan,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Google Cloud Compute,Deep learning,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Kaggle,Personal Projects,YouTube Videos",,,,,,,Very useful,,,,,Very useful,,,,,,Very useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Other,2 - 10 hours,Kaggle Competitions,No,Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,10,5,25,10,0,Computer Vision,Neural Networks - CNNs,High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important +Female,Taiwan,45,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Python,Neural Nets,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Not Useful,"Data Machina Newsletter,O'Reilly Data Newsletter,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Necessary,Nice to have,Unnecessary,Necessary,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,No,Bachelor's degree,Electrical Engineering,,"Engineer,I haven't started working yet",Self-taught,NA,NA,NA,NA,NA,NA,"Computer Vision,Time Series,Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Somewhat important,Somewhat important,Somewhat important,Not important,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Engineer,University courses,30,0,0,70,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,100 to 499 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Other,"Traditional Workstation,Workstation + Cloud service",Relational data,Never,,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Random 
Forests,Simulation",,,,,,Rarely,Most of the time,Sometimes,Sometimes,,,,,Rarely,,Sometimes,,,,,,,Sometimes,,,,Most of the time,,,,,,,15,2,0,41,42,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",Sometimes,Sometimes,,,Often,Often,,Sometimes,Often,,,,,,,,,,,,,,76-99% of projects,More internal than external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Female,United States,36,Employed full-time,,,Yes,,Statistician,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,30,10,30,30,NA,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Mix of fields,"10,000 or more employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,23,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,Python,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,Work,0,0,100,0,0,0,,Neural Networks - CNNs,High school,Manufacturing,100 to 499 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Sometimes,10MB,CNNs,"C/C++,Java,Mathematica,Python,TensorFlow,Unix shell / awk",,,,Sometimes,,,,,,,,,,,Often,,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,Often,,,,CNNs,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,100,0,0,Enough to run the code / standard library,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Often,,,,,,,,,,,,,10-25% of projects,More internal than external,IT Department,,,Document-oriented (e.g. 
MongoDB/Elasticsearch),Email,,Git,Rarely,,,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,20,10,20,50,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition","Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Government,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Traditional Workstation,Workstation + Cloud service",Text data,Most of the time,1TB,"Decision Trees,Markov Logic Networks,Random Forests,SVMs","IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Perl,Python,R,Unix shell / awk",,,,,,,,,,,Often,Often,Most of the time,,,,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,Often,,Often,Sometimes,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,"HMMs,Logistic Regression,Markov Logic Networks,Natural Language Processing,Random Forests,SVMs,Text Analytics",,,,,,,,,,,,,Sometimes,,,Most of the time,Sometimes,,Most of the time,,,,Most of the time,,,,,Most of the time,Most of the time,,,,,25,25,10,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Difficulties in deployment/scoring,Explaining data science to others,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as 
Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,Sometimes,,Often,,,,,,,,,,,,Sometimes,Sometimes,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,50,0,0,0,50,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,I prefer not to answer,Technology,"10,000 or more employees",Decreased slightly,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,43,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",Work,50,0,50,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",I prefer not to answer,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software 
Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,55,0,5,20,0,Supervised Machine Learning (Tabular Data),Decision Trees - Random Forests,A master's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and private datacenters,Traditional Workstation","Relational data,Other",Don't know,1GB,,"Python,R,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,Often,,,,,,,Logistic Regression,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,No,Yes,Operations Research Practitioner,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, 
etc.)",20,50,20,10,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,26,Employed full-time,,,Yes,,Data Miner,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,Less than a year,"Data Miner,Software Developer/Software Engineer",Self-taught,40,20,20,0,20,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression",Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,20,30,45,0,5,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Other (please specify; separate by semi-colon)",A bachelor's degree,Other,"10,000 or more employees",Decreased slightly,3-5 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,Never,10GB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Python,R,Spark / MLlib",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,Often,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Time Series Analysis",,,,,,Often,Often,Often,,,,,,,Sometimes,Sometimes,,Sometimes,,Sometimes,Sometimes,,Often,,,Often,,,,Most of the time,,,,50,10,0,10,30,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Sometimes,,,,Most of the time,,,Sometimes,,,Often,,,,,,Sometimes,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,25,Employed full-time,,,No,Yes,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,Self-taught,60,10,10,10,10,0,"Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Gradient Boosting",A master's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,30,Employed full-time,,,No,Yes,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,Less than a year,Engineer,Self-taught,30,30,0,0,0,40,,,A master's degree,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,70,0,0,30,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,43,Employed 
part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer",Other,100,0,0,0,0,0,,,A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,57,Employed full-time,,,Yes,,Operations Research Practitioner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,More than 10 years,Engineer,University courses,40,0,30,30,0,0,Time Series,"Ensemble Methods,Logistic Regression",A bachelor's degree,Other,"5,000 to 9,999 employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters,Traditional Workstation,Workstation + Cloud service",Text data,Most of the time,10MB,"Ensemble Methods,Regression/Logistic Regression","Amazon Web services,C/C++,Java,Python,R,SAS Base,SQL,Tableau,Other,Other",,Often,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,Often,,,Rarely,,,,Often,Often,,"Data Visualization,Ensemble Methods,Logistic Regression,Time Series Analysis",,,,,,,Often,,Often,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,10,30,30,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,More than 10 years,Engineer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,18,Employed full-time,,,Yes,,Computer Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,,,,70,0,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,31,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,A social science,,Other,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Engineer,Researcher,Other",Self-taught,40,10,50,0,0,0,,,A bachelor's degree,Technology,,,,,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Text data,Sometimes,10MB,,"C/C++,Other,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Rarely,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,70,20,5,5,0,0,Enough to tune the parameters 
properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,Often,Often,,,,,,Often,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,India,31,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Data Analyst,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,,Government,"10,000 or more employees",Stayed the same,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,Employed full-time,,,Yes,,Machine Learning Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,1 to 2 years,Researcher,Work,0,0,40,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A bachelor's degree,Financial,,,,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,Relational data,Most of the time,,"Decision Trees,Regression/Logistic Regression","Jupyter notebooks,Microsoft SQL Server Data Mining,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,Often,,,,,,Sometimes,,Often,,,,,Most of the time,,,,Sometimes,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Segmentation,Simulation,Time Series Analysis",,,,,,Often,Often,Sometimes,,,,,,,,Often,,,,,,,,,,Often,Often,,,Often,,,,20,40,20,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United 
States,48,Employed full-time,,,Yes,,Machine Learning Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Engineer,Machine Learning Engineer",,50,30,10,10,0,0,Time Series,Logistic Regression,,Technology,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Traditional Workstation",Other,Sometimes,10MB,"Neural Networks,Regression/Logistic Regression,RNNs","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,10,0,60,10,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Financial,10 to 19 employees,Increased slightly,1-2 years,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Image data,Text data",Rarely,10MB,"Bayesian Techniques,Other","Amazon Web services,Java,Jupyter notebooks,Python,Spark / MLlib",,Most of the time,,,,,,,,,,,,,Sometimes,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,Often,,,,,,,,,,,"Data Visualization,Natural Language Processing,Simulation,Text 
Analytics,Other",,,,,,,Often,,,,,,,,,,,,Most of the time,,,,,,,,Rarely,,Rarely,,Most of the time,,,20,10,40,10,20,0,Enough to explain the algorithm to someone non-technical,"Limitations in the state of the art in machine learning,Scaling data science solution up to full database",,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Data Miner,Engineer,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Physics,1 to 2 years,Computer Scientist,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Recommendation Engines,,"Some college/university study, no bachelor's degree",Technology,,,,,Not at all important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Rarely,100MB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,Java,Python,SQL",,,,Most of the 
time,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Data Visualization,Decision Trees,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,Segmentation,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Programmer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,100,0,0,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Government,10 to 19 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1MB,"Bayesian Techniques,Decision Trees,Random Forests","Java,Python",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Decision Trees,Naive Bayes",,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,15,15,15,15,40,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,24,"Not employed, but looking for work",,,,,,,,Cloudera,Text Mining,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Tutoring/mentoring",,,Very useful,,,,Very useful,,,,,,,,,,Very useful,,Linear Digressions Podcast,< 1 year,Necessary,Nice to have,Nice to have,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,,,,,Basic laptop (Macbook),0 - 1 hour,Github Portfolio,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Programmer,University courses,30,40,0,30,0,0,Unsupervised Learning,,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1-2,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important +Female,India,23,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,24,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Statistician,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Survival Analysis,Python,Google Search,"Blogs,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,3 to 5 years,"Researcher,Statistician",University courses,20,20,30,30,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Other (please specify; separate by semi-colon)",High school,Other,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Never,100MB,Decision Trees,"IBM SPSS Modeler,R,SAS Base,SQL,Tableau",,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression",,,,,,,Most of the time,Most of the time,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,50,0,0,50,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of data science talent in the organization,Unavailability of/difficult access to data",,Sometimes,,,Often,,,,Sometimes,,,,,,,,,,,,Sometimes,,51-75% of projects,Entirely internal,Business Department,NO,To figure out why clients are leaving from our Company.,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),"Commercial Data Platform,Share Drive/SharePoint",,Other,Never,54000,IDR,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,Egypt,57,"Independent contractor, freelancer, or self-employed",,,No,Yes,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,Programmer,Self-taught,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,27,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",50,40,5,0,5,0,,,A bachelor's degree,Telecommunications,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Traditional Workstation,"Text data,Relational data",Rarely,,,"Microsoft Excel Data Mining,Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,Often,,,Rarely,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,0,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort 
of (Explain more),Bachelor's degree,Electrical Engineering,6 to 10 years,"Engineer,Researcher,Other",University courses,1,0,0,99,0,0,,,A doctoral degree,Technology,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Text data,,,,"Perl,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Most of the time,Often,,,"Data Visualization,Text Analytics",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,42,"Not employed, but looking for work",,,,,,,,SAP BusinessObjects Predictive Analytics,Neural Nets,SAS,I collect my own data (e.g. web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,,,I don't write code to analyze data,Engineer,Self-taught,0,100,0,0,0,0,Time Series,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,10,5,0,15,0,Natural Language Processing,"Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A bachelor's degree,Pharmaceutical,10 to 19 employees,Stayed the same,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Most of 
the time,10GB,"CNNs,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,C/C++,Jupyter notebooks,NoSQL,Python,TensorFlow",,Most of the time,,Rarely,,,,,,,,,,,,,Often,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,RNNs,Time Series Analysis",,,,Sometimes,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,Sometimes,,Most of the time,Most of the time,,,,,Often,,,,,Most of the time,,,,65,20,0,5,10,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,30,Employed part-time,,,Yes,,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Other,University courses,0,0,20,70,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Evolutionary Approaches",A doctoral degree,Academic,10 to 19 employees,Increased slightly,6-10 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,63,Employed part-time,,,Yes,,Other,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,6 to 10 years,Other,University courses,0,25,25,50,0,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I prefer not to answer,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,Neural Networks,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,KNIME (free version),MATLAB/Octave,Microsoft Excel Data Mining,Orange,Python,R,RapidMiner (free version),TensorFlow",,,,,,,,,,,,,,,,,Often,,Sometimes,,Sometimes,,Sometimes,,,,,,Sometimes,,Most of the time,,Often,,Often,,,,,,,,,,,Often,,,,,,"Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,Simulation,SVMs,Text Analytics",,Often,Often,Often,Often,Often,,Often,Often,Sometimes,Sometimes,Often,Often,Often,Sometimes,Often,,Most of the time,Most of the time,Most of the time,Often,,,Sometimes,Often,Often,Sometimes,Often,Often,,,,,40,30,0,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Limitations of 
tools,Unavailability of/difficult access to data,Other",,,,,,,,,Often,Often,Often,Often,Often,,,,,,,,Most of the time,,26-50% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,50,0,50,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,23,Employed part-time,,,No,Yes,Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Electrical Engineering,I don't write code to analyze data,I haven't started working yet,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,21,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search","College/University,Kaggle,Official documentation,Online courses,Trade book,YouTube Videos",,,Somewhat useful,,,,Somewhat useful,,,Somewhat useful,Very useful,,,,,Somewhat useful,,Very useful,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Scientist","Online courses (coursera, udemy, edx, etc.)",20,50,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Manufacturing,"1,000 to 4,999 employees",Decreased slightly,Less than one year,Some other way,Very important,Other,"Laptop or Workstation and private datacenters,Workstation + Cloud service","Image data,Text data,Relational data",Rarely,10GB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Python,R,SQL,Tableau,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,Often,,,Sometimes,Often,,,,,,"A/B Testing,Association Rules,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random 
Forests,SVMs,Text Analytics,Time Series Analysis",Rarely,Sometimes,,,,,Most of the time,Sometimes,,,,Often,,Rarely,,Often,,Sometimes,,Often,Sometimes,,Often,,,,,Often,Sometimes,Often,,,,40,5,5,50,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,,,,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,Most of the time,Most of the time,,100% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,650000,INR,Has decreased between 6% and 19%,,,,,,,,,,,,,,,,,,, +Male,India,48,Employed full-time,,,Yes,,Predictive Modeler,,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Data Scientist,Statistician",University courses,18,2,50,30,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Mix of fields,500 to 999 employees,Decreased significantly,More than 10 years,An external recruiter or headhunter,Very important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Decision Trees,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,IBM SPSS Modeler,IBM SPSS Statistics,Microsoft Excel Data Mining,Python,QlikView,R,SAS Base,SQL,Tableau",,,,,,,,,Rarely,,Sometimes,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Ensemble Methods,kNN and Other Clustering,Logistic Regression,Markov Logic Networks,Segmentation,SVMs,Text Analytics,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Rarely,,,,,,,,Sometimes,Often,Most of the time,,,,,,,Rarely,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,Kaggle 
competitions,50,0,0,0,50,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,27,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Computer Scientist,Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,"Natural Language Processing,Speech Recognition","Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Government,,,,,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Text data,Relational data",,1GB,Neural Networks,"C/C++,Java,MATLAB/Octave,Microsoft SQL Server Data Mining,SQL,Unix shell / awk",,,,Rarely,,,,,,,,,,,Most of the time,,,,,,Often,,,,Rarely,,,,,,,,,,,,,,,,,Most of the time,,,,,,Rarely,,,,"Decision Trees,Evolutionary Approaches,Natural Language Processing,Neural Networks",,,,,,,,Often,,Most of the time,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,60,40,0,0,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",,,Generic cloud file sharing software (Dropbox/Box/etc.),Don't know,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,32,Employed full-time,,,Yes,,Engineer,Poorly,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Arxiv,College/University,Conferences,Official documentation,Online courses,Textbook,Tutoring/mentoring",Somewhat useful,,Very useful,,Somewhat useful,,,,,Somewhat useful,,,,,Very useful,,Very useful,,"KDnuggets Blog,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,6 to 10 years,"Engineer,Software Developer/Software Engineer,Other",Work,50,10,20,20,0,0,"Recommendation Engines,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Support Vector Machines (SVMs)",High school,Academic,500 to 999 employees,Increased significantly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Sometimes,,Often,Often,,,,Often,,,,,,Rarely,,Sometimes,,,,Often,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,Most of the time,,,,"Bayesian Techniques,Collaborative Filtering,Cross-Validation,Decision Trees,GANs,kNN and Other Clustering,Logistic Regression,Naive Bayes,PCA and Dimensionality Reduction,Recommender Systems,Simulation,SVMs",,,Most of the time,,Often,Often,,Sometimes,,,Sometimes,,,Often,,Often,,Often,,,Most of the time,,,Often,,,Often,Sometimes,,,,,,15,75,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Limitations in the state of the art in machine learning",,Most of the time,Often,,,,,,,,,Sometimes,,,,,,,,,,,10-25% of projects,Approximately half internal and half external,Central Insights Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,35000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,34,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,A humanities discipline,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",40,25,10,25,0,0,Natural Language Processing,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and private datacenters,Traditional Workstation",Text data,Rarely,1MB,"Neural Networks,Regression/Logistic Regression,SVMs","MATLAB/Octave,Perl,Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Natural Language Processing",,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,50,20,0,5,25,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,India,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",15,50,5,25,5,NA,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Other,20 to 99 employees,Stayed the same,More than 10 years,A general-purpose job board,Not very important,Other,Basic laptop (Macbook),Relational data,,10MB,"Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression",R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Neural Networks,Random Forests,Time Series Analysis",,,,,,Often,Often,Often,Sometimes,,,,,,,Rarely,,,,Sometimes,,,Sometimes,,,,,,,Rarely,,,,30,10,40,20,0,0,Enough to run the code / standard library,"Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Unavailability of/difficult access to data",,,,,,Sometimes,,,Sometimes,,,,,,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by company that makes 
advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,65,0,0,0,5,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Not at all important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Image data,Text data",Don't know,1GB,"Regression/Logistic Regression,SVMs","MATLAB/Octave,Python,TensorFlow",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Republic of China,41,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,35,10,15,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Academic,"5,000 to 9,999 employees",,Don't know,A career fair or on-campus recruiting event,Somewhat important,Other,Basic laptop (Macbook),Other,Sometimes,1MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Female,Singapore,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,YouTube Videos,Other",,Very useful,,,,,,,,Very useful,Very useful,Very useful,,Very useful,,,,Very useful,"Linear Digressions Podcast,R Bloggers Blog Aggregator,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,I never declared a major,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",20,50,10,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,"10,000 or more employees",Stayed the same,1-2 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",,100MB,"Decision Trees,Gradient Boosted Machines,Random Forests,SVMs","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Segmentation",,,,,,Sometimes,Most of the time,,,,,,,,,Sometimes,,,Often,,Sometimes,,Sometimes,,,Sometimes,,,,,,,,40,20,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear 
direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Most of the time,Often,,,Most of the time,,,,,Most of the time,,,,,,Most of the time,Most of the time,,100% of projects,Entirely internal,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Share Drive/SharePoint,Other",Dataiku ,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Other",Never,"60,000",,I was not employed 3 years ago,6,,,,,,,,,,,,,,,,,, +Male,Iran,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,6 to 10 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",25,40,10,20,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression",,Mix of fields,100 to 499 employees,Increased slightly,Less than one year,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,Most of the time,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Prescriptive Modeling,Time Series Analysis",,,Often,,,Often,Often,Sometimes,,,,Often,,,,Sometimes,,Often,,,,Sometimes,,,,,,,,Sometimes,,,,50,20,15,10,5,0,"Enough to code it again from scratch, albeit 
it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Often,,,,,,,,,Sometimes,,,,Often,Most of the time,,,10-25% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,Git,Sometimes,40000,USD,Has increased 20% or more,5,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Blogs,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A",,Somewhat useful,,,,,Somewhat useful,,Very useful,Somewhat useful,Very useful,Very useful,,Somewhat useful,,,,,"KDnuggets Blog,No Free Hunch Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,30,0,0,20,0,"Recommendation Engines,Reinforcement learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,University courses,40,0,40,15,5,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Mix of fields,"10,000 or more employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Other",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,SQL,TensorFlow,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Sometimes,,,,Rarely,,Most of the time,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Prescriptive Modeling,Random Forests,Time Series Analysis",Often,,Sometimes,,,,Often,Often,Most of the time,,,Most of the time,,,Sometimes,Sometimes,,,,,,Sometimes,Often,,,,,,,Most of the time,,,,0,80,0,5,15,0,"Enough to code it again from scratch, albeit it may run slowly",Data Science results not used by business decision makers,,Sometimes,,,,,,,,,,,,,,,,,,,,,Less than 10% of projects,More internal than external,IT 
Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Company Developed Platform,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,TensorFlow,Deep learning,C/C++/C#,Google Search,"Conferences,Kaggle,Stack Overflow Q&A,Textbook",,,,,Somewhat useful,,Somewhat useful,,,,,,,Very useful,Very useful,,,,Other (Separate different answers with semicolon),< 1 year,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,PhD,No,Bachelor's degree,Electrical Engineering,,"Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,Computer Vision,Neural Networks - CNNs,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important,Somewhat important +Male,Spain,38,Employed full-time,,,Yes,,Data Miner,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,40,40,20,0,0,0,Supervised Machine Learning (Tabular 
Data),"Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",High school,Financial,"10,000 or more employees",Stayed the same,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,10GB,,"Python,R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,Data Visualization,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,30,0,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,40,Employed full-time,,,Yes,,Predictive Modeler,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,More than 10 years,"Researcher,Other",University courses,10,0,30,60,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting",High school,Financial,"5,000 to 9,999 employees",Increased significantly,6-10 years,An external recruiter or headhunter,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Most of the time,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"Association Rules,Data Visualization,Gradient Boosted Machines,Lift Analysis,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Text Analytics,Time Series Analysis",,Sometimes,,,,,Most of the time,,,,,Most of 
the time,,,Most of the time,,,Sometimes,,Sometimes,Sometimes,,Most of the time,,,,,,Often,Often,,,,40,40,5,5,10,0,Enough to explain the algorithm to someone non-technical,Maintaining responsible expectations about the potential impact of data science projects,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,Software Developer/Software Engineer,Self-taught,20,70,0,0,10,0,Natural Language Processing,"Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Stayed the same,Don't know,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,39,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,Python,Regression,SQL,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Miner,Engineer,Software Developer/Software Engineer,Other",Self-taught,50,0,50,0,0,0,Time Series,,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by 
professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst,Programmer",Self-taught,60,20,5,5,5,5,"Supervised Machine Learning (Tabular Data),Time Series",Logistic Regression,A master's degree,Financial,I prefer not to answer,Increased significantly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop or Workstation and local IT supported servers,Traditional Workstation","Image data,Text data,Relational data",Rarely,10MB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression,SVMs","Microsoft Excel Data Mining,Python,R,SAS Base,SAS Enterprise Miner,Tableau,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,Often,,Often,,,,,Most of the time,Most of the time,,,,,,Sometimes,,,Most of the time,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Segmentation",,,Often,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,20,5,25,30,10,10,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database",,Most of the time,,,,,,,Often,,Often,,Often,,,,,Often,,,,,26-50% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,620000,,Has increased 20% or more,,,,,,,,,,,,,,,,,,, +Male,Pakistan,22,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Proprietary Algorithms,C/C++/C#,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,College/University,Company internal community,Friends network,Kaggle,Podcasts,Tutoring/mentoring,YouTube Videos",Very useful,,Very useful,Somewhat useful,,Somewhat useful,Very useful,,,,,,Somewhat useful,,,,Somewhat useful,Very useful,"Data Machina Newsletter,Jack's Import AI Newsletter,Partially Derivative Podcast",3-5 years,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Necessary,Nice to have,Nice to have,,,,,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Workstation + Cloud service",0 - 1 hour,Experience from work in a company related to ML,Yes,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,"Data Analyst,Other,I haven't started working yet",Other,40,10,20,10,0,20,"Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition","Bayesian Techniques,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,3-5,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important,Somewhat 
important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,Other,30,Employed part-time,,,Yes,,Data Miner,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",80,5,5,5,5,0,Speech Recognition,Neural Networks - RNNs,,Financial,"1,000 to 4,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,40,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,26,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,Researcher,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,26,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Researcher,University courses,20,0,60,20,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,Academic,"5,000 to 9,999 employees",Stayed the same,6-10 years,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop 
(Macbook),Laptop or Workstation and private datacenters",Text data,Don't know,1GB,"CNNs,Decision Trees,GANs",Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Data Visualization,kNN and Other Clustering,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,RNNs,SVMs",,Sometimes,,Sometimes,,,Rarely,,,,,,,Sometimes,,,,,Sometimes,,Sometimes,,,Sometimes,Sometimes,,,Sometimes,,,,,,30,20,0,20,30,0,Enough to tune the parameters properly,Lack of funds to buy useful datasets from external sources,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,44,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",6 to 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",Other,40,50,0,0,10,0,Time Series,Logistic Regression,High school,Financial,"1,000 to 4,999 employees",Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Not very important,Other,Basic laptop (Macbook),Relational data,Never,,,"Java,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,,,,,Often,,,,Segmentation,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,40,30,10,20,0,0,Enough to tune the parameters properly,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools,Privacy issues",Often,,,,Often,,,,Sometimes,,,,Often,,,,Often,,,,,,26-50% of projects,Do not know,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,3 to 5 years,"Data Analyst,Programmer,Researcher",Self-taught,80,10,10,0,0,0,"Computer Vision,Speech Recognition",,High school,Technology,500 to 999 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Image data,Never,,,"Python,R,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Sometimes,,,,,,,,,,,,,Sometimes,,,,,,Naive Bayes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,,,,,26-50% of projects,Approximately half internal and half external,Business Department,,,,,,,,,,I was not employed 3 years ago,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,6 to 10 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses 
(coursera, udemy, edx, etc.)",50,40,0,0,10,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",A master's degree,Telecommunications,"10,000 or more employees",Decreased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and local IT supported servers,Relational data,Never,,"Decision Trees,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,SQL,Unix shell / awk",,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,Most of the time,,,,,,Most of the time,,,,"Data Visualization,Logistic Regression,Segmentation,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Often,,,,60,0,0,30,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Other",Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,Employed full-time,,,No,Yes,Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,1 to 2 years,,University courses,5,0,0,95,0,0,,Logistic Regression,High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,34,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,6 to 10 years,"Computer Scientist,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,20,10,5,50,15,0,"Computer Vision,Time Series,Unsupervised Learning","Hidden Markov Models HMMs,Neural Networks - CNNs",High school,Military/Security,100 to 499 employees,Stayed the same,Don't know,Some other way,Important,Other,Laptop or Workstation and private datacenters,"Image data,Text data",Rarely,100GB,"CNNs,HMMs,Neural Networks,SVMs","C/C++,MATLAB/Octave,Python",,,,Often,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Neural Networks,Segmentation,SVMs,Time Series Analysis",,,,Often,,Often,Most of the time,,,,,,Sometimes,Most of the time,,,,,,Most of the 
time,,,,,,Often,,Sometimes,,Sometimes,,,,25,15,15,35,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Sometimes,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,,NA,Employed full-time,,,Yes,,Data Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,1 to 2 years,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Romania,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Employed by college or university,Spark / MLlib,,,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Official documentation,Online courses,Tutoring/mentoring,YouTube Videos",Very useful,Very useful,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,Very useful,,,,,,Very useful,Very useful,"KDnuggets Blog,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Programmer,Self-taught,50,40,0,10,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Evolutionary Approaches,Support Vector Machines (SVMs)",A bachelor's degree,Academic,,,,,Important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Never,10MB,"Decision Trees,Evolutionary Approaches,Random Forests,SVMs","Python,R,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,Often,,,,"Decision Trees,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics",,,,,,,,Sometimes,,,,,,,,,,Sometimes,Most of the time,Often,,,Sometimes,,,,,Sometimes,Most of the time,,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,Most of the time,,Most of the time,,,,Often,,,,,,Often,,10-25% of projects,Do not know,Other,data available publicly online. ,"It is dirty, small sets and imbalaced and the requirements are not clear.",Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,28800,RON,Has stayed about the same (has not increased or decreased more than 5%),I prefer not to share,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,90,0,0,5,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),I prefer not to answer,Mix of fields,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Amazon Machine Learning,Bayesian Methods,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,,Very useful,Very useful,,Somewhat useful,Very useful,,Very useful,Very useful,Other (Separate different answers with semicolon),1-2 years,Necessary,Nice to have,Necessary,,Nice to have,Necessary,Necessary,Nice to have,,Necessary,,,,Coursera,Laptop or Workstation and local IT supported servers,2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Other,,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,Very Important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Very Important,Very Important +Male,Other,29,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Other","Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,1 to 2 years,"Business Analyst,Programmer",Self-taught,50,0,10,40,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs",,Technology,20 to 99 employees,,Don't know,,Important,,,,,,,"Jupyter notebooks,Python,R,RapidMiner (free version),SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Sometimes,,Sometimes,,,,,,,Often,,,Sometimes,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Naive Bayes,Neural Networks,Random Forests,Time Series Analysis",,Sometimes,Sometimes,,,,Often,Sometimes,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,Sometimes,,,,,,,Sometimes,,,,40,20,10,20,10,0,,Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,6,,,,,,,,,,,,,,,,,, +Male,Indonesia,22,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,24,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,"Data Analyst,Data Miner,Data Scientist","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural 
Networks - CNNs","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Switzerland,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Poland,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,37,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Neural Nets,R,GitHub,"Stack Overflow Q&A,YouTube Videos",,,,,,,,,,,,,,Somewhat useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Business Analyst,Data Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer",University courses,20,0,0,80,0,0,Adversarial Learning,"Neural 
Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A professional degree,Technology,500 to 999 employees,Stayed the same,3-5 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,Decision Trees,"IBM Cognos,Java,MATLAB/Octave,R,SQL",,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Decision Trees,Segmentation",,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,10,60,10,10,10,0,Enough to tune the parameters properly,"Dirty data,Lack of significant domain expert input",,,,,Most of the time,,,,,,Often,,,,,,,,,,,,51-75% of projects,Do not know,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,"Git,Subversion",Rarely,33000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +A different identity,India,35,Employed full-time,,,No,Yes,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by government",DataRobot,Social Network Analysis,Matlab,Google Search,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Business Analyst,,0,0,0,0,0,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data 
Analyst,Data Scientist,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Work,50,30,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,I don't know,Increased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,"Basic laptop (Macbook),Traditional Workstation",Image data,Sometimes,1TB,"Bayesian Techniques,Decision Trees,Neural Networks,Random Forests","Java,MATLAB/Octave,R,Unix shell / awk",,,,,,,,,,,,,,,Rarely,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Often,,,,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,23,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,6 to 10 years,"Machine Learning Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",70,0,0,0,30,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic 
Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Financial,20 to 99 employees,Stayed the same,3-5 years,A tech-specific job board,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,10MB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Java,Jupyter notebooks,Python,SQL",,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,SVMs,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,22,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Programmer","Online courses (coursera, udemy, edx, etc.)",20,25,25,15,15,0,Time Series,Gradient Boosting,A master's degree,CRM/Marketing,100 to 499 employees,Increased significantly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ireland,26,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,28,Employed 
full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",University courses,15,10,35,35,5,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Relational data,Most of the time,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Logistic Regression",,,,,,Often,Often,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,20,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Dirty data,Lack of significant domain expert input,Privacy issues",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company 
that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,6 to 10 years,"Data Scientist,Statistician,Other",University courses,30,20,50,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",,Mix of fields,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Amazon Web services,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,Jupyter notebooks,NoSQL,Python,R,SQL,Tableau,TensorFlow",,Rarely,,,,,,,,,Often,Rarely,Sometimes,,,,Most of the time,,,,,,,,,,Rarely,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,Rarely,Rarely,,,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Time Series Analysis",,Sometimes,Sometimes,,Rarely,Most of the time,Most of the time,Most of the time,Most of the time,,,Sometimes,,,Most of the time,Most of the time,,Most of the time,Sometimes,,Often,,Often,Rarely,,,,,,Often,,,,50,25,5,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Lack of 
significant domain expert input,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,Often,,Most of the time,,,,,,,,,,Often,,100% of projects,Do not know,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",,,"Bitbucket,Generic cloud file sharing software (Dropbox/Box/etc.)",,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A health science,3 to 5 years,Data Analyst,Self-taught,40,20,0,0,40,0,"Recommendation Engines,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression",Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer,Other",Self-taught,25,0,50,25,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Increased slightly,More than 10 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or 
Workstation and private datacenters,Other","Text data,Relational data",Sometimes,100MB,"Decision Trees,Regression/Logistic Regression,SVMs","Amazon Web services,C/C++,Java,MATLAB/Octave,NoSQL,Perl,Python,R,SQL,TensorFlow,Unix shell / awk",,Sometimes,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,,Rarely,,,Most of the time,Most of the time,,Sometimes,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,"Association Rules,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics,Time Series Analysis",,Rarely,,,,Most of the time,Most of the time,Sometimes,Sometimes,,,,,,,Sometimes,,,Most of the time,,Sometimes,,Sometimes,,Sometimes,,,Most of the time,Most of the time,Often,,,,25,30,5,10,30,NA,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Data Analyst,Data Scientist,Programmer",Self-taught,60,20,20,0,0,0,Natural Language Processing,Neural Networks - CNNs,,Technology,500 to 999 employees,Increased significantly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Always,1GB,Neural Networks,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Naive Bayes,Neural Networks",,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,20,40,20,10,0,10,"Enough to code it again from scratch, albeit it may run slowly","Dirty 
data,Explaining data science to others,Lack of data science talent in the organization",,,,,Most of the time,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,52,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,24,"Not employed, but looking for work",,,,,,,,Microsoft Azure Machine Learning,Support Vector Machines (SVM),Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,Self-taught,NA,50,0,50,0,0,Survival Analysis,Markov Logic Networks,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,30,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,,University courses,50,0,10,40,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",High school,Financial,Fewer than 10 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Relational data",Sometimes,10MB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Julia,Python,R",,Most of the time,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation",Rarely,,Often,,,Most of the time,Most of the time,Often,Often,,,,,,,,,Often,Most of the time,Often,Often,,Often,,,Most of the time,,,,,,,,60,20,5,10,5,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,Researcher,Self-taught,55,40,0,0,5,0,Adversarial Learning,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Researcher",University courses,10,20,25,40,5,0,"Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Other",Most of the time,1TB,"Bayesian Techniques,CNNs,Gradient Boosted Machines,HMMs,Random Forests,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,3 to 5 years,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,70,0,0,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A professional degree,Retail,,,,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Relational 
data,Always,100MB,Neural Networks,"Java,SQL",,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,52,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,DBA/Database Engineer,Self-taught,60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,36,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Fine,Self-employed,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,University/Non-profit research group websites","Arxiv,Blogs,College/University,Conferences,Kaggle,Textbook,YouTube Videos",Very useful,Somewhat useful,Somewhat useful,,Somewhat useful,,Very useful,,,,,,,,Somewhat useful,,,Very useful,"FlowingData Blog,KDnuggets Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,30,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Mix of fields,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation","Text data,Relational data",Sometimes,1GB,"Gradient Boosted Machines,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,NoSQL,Python,SQL,TensorFlow",,Sometimes,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,,,,Often,,,,,,"Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs",,,,,,Most of the time,Most of the time,,Often,,,Often,,,,Often,,,Often,Often,Often,,Often,,Often,,,,,,,,,30,45,5,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be 
answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,Sometimes,Often,Often,,,,,Sometimes,,Sometimes,Often,,,,,,Often,Sometimes,,100% of projects,Approximately half internal and half external,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Email",,"Bitbucket,Git",Always,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,39,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Electrical Engineering,3 to 5 years,Data Analyst,Work,80,0,20,0,0,0,"Adversarial Learning,Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,Programmer,Fine,"Employed by company that makes advanced analytic software,Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,"Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Programmer,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,27,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,60,0,30,0,10,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis",,A doctoral degree,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,65,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,Other (please specify; separate by semi-colon),,A bachelor's degree,Government,"1,000 to 4,999 employees",Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Other,Laptop or Workstation and private datacenters,Other,Sometimes,100MB,Regression/Logistic Regression,"C/C++,Jupyter notebooks,Python",,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,50,50,0,0,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,,Often,,,,,,,,,Often,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,Java,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Stack Overflow Q&A,Textbook",,,Very useful,,,,Somewhat useful,,,,Very useful,,,Very useful,Somewhat useful,,,,FastML Blog,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",0,20,0,75,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Technology,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,10GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","IBM Watson / Waton Analytics,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Other",,,,,,,,,,,,,Rarely,,,,Most of the time,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,,Rarely,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Often,Sometimes,,Most of the time,Most of the time,Often,,,,,,Sometimes,,Often,,Most of the time,Most of the time,Most of the time,,,Most of the time,,,,,Most of the time,Often,Sometimes,,,,40,20,10,10,20,0,Enough to explain the algorithm to someone non-technical,"Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Sometimes,,Sometimes,,,,,Often,,,Most of the 
time,Most of the time,,26-50% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Commercial Data Platform,,"Bitbucket,Git",Sometimes,,,I was not employed 3 years ago,8,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Data Analyst,Data Scientist,Statistician",Work,20,0,80,0,0,0,,Logistic Regression,High school,Technology,"5,000 to 9,999 employees",Increased slightly,3-5 years,An external recruiter or headhunter,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,10GB,Regression/Logistic Regression,"Microsoft SQL Server Data Mining,QlikView,R,SAS JMP,SQL",,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,Sometimes,,,,,,,Often,,Most of the time,,,,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,60,10,10,10,10,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,Python,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,University/Non-profit research group websites","Arxiv,College/University,Kaggle,Official documentation,Online courses,Personal Projects,YouTube Videos",Very useful,,Very useful,,,,Very useful,,,Very useful,Very useful,Very useful,,,,,,Very useful,No Free Hunch Blog,1-2 years,,Necessary,Necessary,,,,,Necessary,Necessary,Necessary,,,,"Coursera,edX,Udacity","Basic laptop (Macbook),GPU accelerated Workstation",11 - 39 hours,Master's degree,Yes,Master's degree,Computer Science,1 to 2 years,Machine Learning Engineer,Self-taught,30,40,0,0,30,0,Natural Language Processing,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,28,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Machine Learning Engineer,Programmer,Researcher",University courses,30,20,0,30,20,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,Machine Learning Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by government,TensorFlow,Deep learning,Python,GitHub,"Blogs,Official documentation,Online courses,Stack Overflow Q&A,YouTube Videos",,Very useful,,,,,,,,Very useful,Somewhat useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Increased slightly,3-5 years,A general-purpose job board,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Other",Rarely,10MB,"Random Forests,SVMs","Jupyter notebooks,Mathematica,Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,SVMs",,,,Sometimes,,Often,Often,,,,,,,,,,,,,Sometimes,Sometimes,,Sometimes,Sometimes,,,,Often,,,,,,20,0,30,20,30,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of significant domain expert input,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to 
data",,,,,,,,,Sometimes,,Sometimes,,,,Often,,,,,Often,Most of the time,,51-75% of projects,More internal than external,Other,kaggle; opendata,getting them,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Share Drive/SharePoint,Other",,"Git,Subversion",Sometimes,110000,CHF,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,Australia,42,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,Very useful,Very useful,,,,Very useful,,Very useful,Very useful,Very useful,Very useful,,Very useful,Very useful,,,,"FastML Blog,KDnuggets Blog,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,Machine Learning Engineer,University courses,20,10,10,50,10,0,"Adversarial Learning,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Internet-based,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Image data,Video data,Text data,Relational data,Other",Most of the time,100TB,"Bayesian Techniques,CNNs,Decision Trees,GANs,HMMs,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Java,Julia,Mathematica,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,R,Spark / MLlib,Tableau,TensorFlow",Sometimes,Often,,,,,,,Most of the time,,,,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,Rarely,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,,,,Often,,,,Often,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,GANs,Gradient Boosted Machines,HMMs,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Simulation,SVMs,Time Series Analysis",Often,Often,Often,Often,,,,,,,Sometimes,Sometimes,Sometimes,,,,,,Often,,Often,Often,Sometimes,Often,,,Often,Often,,Often,,,,40,20,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,,,,,,,,,,,,,Often,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,"Column-oriented relational (e.g. 
KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Sometimes,60000,EUR,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Female,Romania,21,Employed full-time,,,No,Yes,Computer Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,I never declared a major,Less than a year,"Business Analyst,Engineer",Self-taught,50,50,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,45,20,0,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional 
degree,,More than 10 years,"Data Analyst,Researcher,Software Developer/Software Engineer,Statistician,Other",University courses,30,10,20,40,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Government,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","KNIME (free version),Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,RapidMiner (free version),SAS Enterprise Miner,Other",,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,,,Most of the time,,,,,,,,,,Often,,,"Bayesian Techniques,Data Visualization,Decision Trees,kNN and Other Clustering,Random Forests",,,Sometimes,,,,Often,Sometimes,,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,,30,15,15,30,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,Often,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Brazil,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,NA,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,44,Employed full-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,MATLAB/Octave,Neural Nets,Matlab,,"Online courses,YouTube Videos",,,,,,,,,,,Very useful,,,,,,,Somewhat useful,"Partially Derivative Podcast,The Data Skeptic Podcast",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,Coursera,Traditional Workstation,2 - 10 hours,Online Courses and Certifications,No,Master's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,I prefer not to answer,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,NA,Employed part-time,,,No,Yes,Programmer,Poorly,Employed by a company that doesn't perform advanced analytics,Mathematica,Cluster Analysis,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,"FastML Blog,O'Reilly Data Newsletter",< 1 year,Nice to have,,,,Necessary,,,,,,,,,,"Basic laptop (Macbook),Traditional Workstation",2 - 10 hours,Kaggle Competitions,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Programmer,Self-taught,0,80,20,0,0,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,Primary/elementary school,Internet-based,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,1-2,,,,,,,Somewhat important,,,,,,,,, +Female,Other,31,Employed part-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,6 to 10 years,"Programmer,Researcher",Work,0,0,80,20,0,0,Supervised Machine Learning (Tabular Data),Support Vector Machines (SVMs),A master's degree,Academic,100 to 499 employees,Decreased slightly,Don't know,Some other way,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,,1GB,"Neural Networks,SVMs,Other","C/C++,IBM Watson / Waton Analytics,Java,NoSQL,Perl,Python,SQL",,,,Often,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,Rarely,,,Often,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Data Visualization,Natural Language Processing,Neural Networks,SVMs,Text Analytics,Time Series Analysis,Other",,,,,,,Sometimes,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,Sometimes,Most of the time,Sometimes,Often,,,20,50,0,20,10,0,Enough to tune the parameters properly,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Lack of significant domain expert input,Organization is small and cannot afford a data science team,The lack of a clear question to be 
answering or a clear direction to go in with the available data",,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Vietnam,20,Employed part-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,Software Developer/Software Engineer,University courses,30,20,40,10,0,0,"Computer Vision,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Neural Networks - CNNs,Support Vector Machines (SVMs)",High school,Technology,500 to 999 employees,Stayed the same,Less than one year,A career fair or on-campus recruiting event,Very important,Other,Laptop or Workstation and private datacenters,Video data,Rarely,100GB,"CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","C/C++,Hadoop/Hive/Pig,Java,MATLAB/Octave,Perl,Python,R,Spark / MLlib,TensorFlow",,,,Often,,,,,Sometimes,,,,,,Sometimes,,,,,,Often,,,,,,,,,Often,Most of the time,,Most of the time,,,,,,,,Often,,,,,Most of the time,,,,,,"Bayesian Techniques,Collaborative Filtering,Naive Bayes,Neural Networks,Recommender Systems,Segmentation,SVMs",,,Often,,Most of the time,,,,,,,,,,,,,Most of the 
time,,Most of the time,,,,Most of the time,,Most of the time,,Often,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,44,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Other,1 to 2 years,Researcher,Other,50,0,0,0,0,50,"Natural Language Processing,Recommendation Engines,Survival Analysis,Unsupervised Learning","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Government,,,,,Not at all important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Workstation + Cloud service","Text data,Relational data",Sometimes,,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Python,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,,Most of the time,,,,,,"CNNs,Logistic Regression,Natural Language Processing,Neural Networks,Recommender Systems,RNNs,Text Analytics",,,,Sometimes,,,,,,,,,,,,Often,,,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,,Sometimes,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,NA,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,48,Employed part-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,Microsoft Azure Machine Learning,Time Series Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website,I collect my own data (e.g. 
web-scraping)","Blogs,Conferences,Kaggle,YouTube Videos",,Somewhat useful,,,Not Useful,,Very useful,,,,,,,,,,,Very useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Doctoral degree,Biology,More than 10 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",10,85,0,0,5,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Government,100 to 499 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,10MB,"Bayesian Techniques,Decision Trees","Microsoft Azure Machine Learning,Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Time Series Analysis",,,Rarely,,,Sometimes,Most of the time,,,,,,,Often,,Most of the time,,,,,,,,,,,,,,Rarely,,,,30,20,0,40,10,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,Often,,,Often,,,51-75% of projects,Entirely internal,IT Department,,data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,60000,EUR,Has increased between 6% and 19%,9,,,,,,,,,,,,,,,,,, +Male,India,25,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Scientist,Fine,Self-employed,Amazon Machine Learning,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",NA,100,NA,NA,NA,NA,"Computer Vision,Machine Translation,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,I don't write code to analyze data,Researcher,University courses,0,0,0,0,0,100,,,Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,31,Employed full-time,,,Yes,,Other,Fine,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A social science,Less than a year,Researcher,"Online courses (coursera, udemy, edx, etc.)",35,65,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A doctoral degree,Non-profit,20 to 99 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,1GB,Random Forests,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Random Forests,Time Series Analysis",,,,,,Sometimes,Most of the time,,,,,,,,,,,,,,,,Sometimes,,,,,,,Often,,,,70,0,0,20,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,33,Employed full-time,,,Yes,,Statistician,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Miner,Data Scientist,Operations Research Practitioner,Predictive Modeler,Programmer,Researcher,Statistician",Self-taught,90,1,3,6,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Logistic Regression,Support Vector Machines (SVMs)",High school,Academic,"1,000 to 4,999 employees",Increased significantly,More than 10 years,A career fair or on-campus recruiting event,Very important,Research that advances the 
state of the art of machine learning,Basic laptop (Macbook),Relational data,Rarely,10MB,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,48,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,0,0,50,0,0,Outlier detection (e.g. Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs",High school,Mix of fields,10 to 19 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed part-time,,,Yes,,Predictive Modeler,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",30,30,20,10,10,0,Recommendation Engines,Logistic Regression,A bachelor's degree,Academic,Fewer than 10 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Important,,Basic laptop (Macbook),Text data,Rarely,1GB,"Neural Networks,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Python,R,Spark / 
MLlib,SQL",,,,,,,,,Rarely,,,,,Rarely,,,,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,Sometimes,Rarely,,,,,,,,,,"Data Visualization,Decision Trees,Natural Language Processing,Neural Networks,Recommender Systems",,,,,,,Often,Sometimes,,,,,,,,,,,Sometimes,Sometimes,,,,Often,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,26,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Programmer,Work,20,30,10,10,30,0,"Natural Language Processing,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Logistic Regression",High school,Government,"10,000 or more employees",Increased significantly,1-2 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,23,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,NA,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),More than 10 years,"Data Scientist,Programmer,Researcher,Software Developer/Software Engineer",University courses,20,20,60,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,20,I prefer not to say,Yes,"No, I am not focused on learning data science 
skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,47,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by a company that doesn't perform advanced analytics,TensorFlow,Anomaly Detection,Python,I collect my own data (e.g. web-scraping),"Kaggle,Online courses,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,,,,,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",40,30,20,0,10,0,Time Series,Neural Networks - CNNs,High school,Hospitality/Entertainment/Sports,I don't know,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Workstation + Cloud service,Relational data,Never,100MB,"Neural Networks,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Microsoft Azure Machine Learning,Python,SQL,Tableau",,Often,,,,,,,,,,,,,,,Sometimes,,,,,Rarely,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Data Visualization,Time Series Analysis",,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,10,70,0,15,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Lack of significant domain expert input,Limitations of tools,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Sometimes,,,,,,,,,,Sometimes,,Sometimes,,,,,Often,,Most of the time,Sometimes,,26-50% of 
projects,More internal than external,Central Insights Team,,Lack of tools and hardware ,Column-oriented relational (e.g. KDB/MariaDB),Company Developed Platform,,Git,Rarely,125000,,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,No,Yes,Programmer,Fine,Employed by government,TensorFlow,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","College/University,Kaggle,Online courses,Personal Projects,Stack Overflow Q&A,Textbook",,,Very useful,,,,Very useful,,,,Very useful,Somewhat useful,,Very useful,Very useful,,,,"KDnuggets Blog,No Free Hunch Blog,Talking Machines Podcast",1-2 years,Nice to have,Necessary,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,,,"edX,Udacity","Laptop or Workstation and local IT supported servers,Traditional Workstation",11 - 39 hours,Master's degree,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Computer Scientist,"Online courses (coursera, udemy, edx, etc.)",25,25,0,50,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Support Vector Machines (SVMs)",High school,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,Philippines,31,Employed full-time,,,Yes,,Software 
Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,,Technology,500 to 999 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,56,"Independent contractor, freelancer, or self-employed",,,No,Yes,Researcher,,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,A health science,I don't write code to analyze data,Researcher,Self-taught,30,30,30,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by professional services/consulting firm,TensorFlow,Time Series Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Government website","Official documentation,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring",,,,,,,,,,Very useful,Very useful,,,Very useful,Very useful,,Very useful,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,20,30,10,0,"Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,1GB,"Bayesian Techniques,Regression/Logistic Regression","MATLAB/Octave,Python,R",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,,,"A/B Testing,Lift Analysis",Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,20,0,0,50,30,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,76-99% of projects,Approximately half internal and half external,Standalone Team,none,none,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Company Developed Platform,,Git,Sometimes,,INR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Italy,36,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,6 to 10 years,Data Analyst,Self-taught,40,10,45,0,5,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Support Vector Machines (SVMs)",A master's degree,Academic,500 to 999 employees,,More than 10 years,"A friend, family member, or former colleague told me",Very important,Other,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Text data,Never,10GB,"Random Forests,SVMs","Jupyter notebooks,Perl,Python,R,TensorFlow,Unix shell / awk",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Most of the time,,Most of the time,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,Sometimes,Most of the time,,Often,,,,,Often,,,,,,50,10,0,10,30,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,Often,,,Sometimes,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,"Business Analyst,Computer Scientist,Data Analyst,Machine Learning Engineer,Researcher",University courses,50,20,10,20,0,0,Unsupervised Learning,"Logistic Regression,Neural Networks - 
CNNs","Some college/university study, no bachelor's degree",Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Relational data,Rarely,100GB,"Bayesian Techniques,Evolutionary Approaches,Regression/Logistic Regression","Java,MATLAB/Octave,Microsoft Excel Data Mining,Python,SQL,Tableau,Unix shell / awk",,,,,,,,,,,,,,,Sometimes,,,,,,Rarely,,Rarely,,,,,,,,Most of the time,,,,,,,,,,,Sometimes,,,Rarely,,,Often,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Neural Networks,Simulation,Time Series Analysis",,,Sometimes,,,,Most of the time,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,Often,,,Often,,,,40,20,10,20,10,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of funds to buy useful datasets from external sources,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team",,Most of the time,,,,,,,,Often,Often,Often,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,21,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,68,"Not employed, but looking for work",,,,,,,,TensorFlow,Deep learning,Python,Government website,Personal Projects,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Computer Vision,Bayesian Techniques,Primary/elementary 
school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Machine Learning Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,Less than a year,"Data Scientist,Engineer,Programmer,Software Developer/Software Engineer",Self-taught,20,20,30,5,25,0,"Computer Vision,Natural Language Processing,Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Internet-based,,,,,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Image data,Most of the time,10GB,"CNNs,Ensemble Methods,Neural Networks,RNNs","Amazon Web services,Google Cloud Compute",,Often,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Natural Language Processing,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Programmer,Self-taught,50,5,36,4,4,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,31,Employed full-time,,,Yes,,Scientist/Researcher,,Employed by government,R,Regression,R,Government website,"Kaggle,Online courses",,,,,,,Very useful,,,,Very useful,,,,,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Yes,Master's degree,A health science,3 to 5 years,Data Scientist,University courses,20,30,0,30,20,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,Government,"10,000 or more employees",Stayed the same,Don't know,An external recruiter or headhunter,Somewhat important,Other,Traditional Workstation,Other,Sometimes,10GB,Regression/Logistic Regression,"Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,PCA and Dimensionality Reduction,Random Forests",,,,,,Often,Often,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,30,30,10,10,20,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,26-50% of 
projects,More external than internal,Other,,,Document-oriented (e.g. MongoDB/Elasticsearch),Email,,Other,Rarely,"60,000",BRL,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,United States,63,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,25,"Independent contractor, freelancer, or self-employed",,,Yes,,Operations Research Practitioner,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,Less than a year,Operations Research Practitioner,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Predictive Modeler","Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Adversarial Learning,Survival Analysis","Ensemble Methods,Logistic Regression",Primary/elementary school,Other,"5,000 to 9,999 employees",Decreased slightly,1-2 years,I visited the company's Web site and found a job listing 
there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,40,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Programmer,Software Developer/Software Engineer",University courses,20,10,30,40,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series",,High school,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,39,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Programmer,Software Developer/Software Engineer",Self-taught,80,10,0,10,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Technology,,,,,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,100GB,"Decision Trees,Neural Networks,Random Forests,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,38,Employed part-time,,,No,Yes,DBA/Database Engineer,Fine,Employed by a 
company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other",University courses,0,0,0,0,0,100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,37,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,40,Employed full-time,,,Yes,,Programmer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Financial,500 to 999 employees,Increased slightly,Don't know,"A friend, family member, or former colleague told me",Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Other,Relational data,Rarely,,Other,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,36,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs 
advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Biology,3 to 5 years,"Business Analyst,Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,0,70,0,0,0,"Survival Analysis,Time Series",,A master's degree,Internet-based,,,,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Portugal,25,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,Jupyter notebooks,,Julia,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites,Other","Arxiv,College/University,Conferences,Kaggle,Non-Kaggle online communities,Textbook,YouTube Videos",Very useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,,,,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Unnecessary,Nice to have,,,,,"Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and local IT supported servers",2 - 10 hours,PhD,No,Master's degree,I never declared a major,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Neural Networks - CNNs,Neural Networks - GANs",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,0,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important +Male,India,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,Other,"Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting 
firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,I never declared a major,Less than a year,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,20,20,40,0,0,Other (please specify; separate by semi-colon),Logistic Regression,High school,Technology,20 to 99 employees,Increased significantly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,22,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Mathematics or statistics,More than 10 years,"Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased significantly,More than 10 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,IBM Watson / Waton Analytics,Text Mining,SQL,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Non-Kaggle online communities,Online courses,Stack Overflow Q&A",,,,,,Somewhat useful,,,Somewhat useful,,Very useful,,,Somewhat useful,,,,,,< 1 year,Nice to have,Necessary,Nice to have,Unnecessary,Unnecessary,Nice to have,Nice to have,Unnecessary,Nice to have,Nice to have,,,,edX,Traditional Workstation,0 - 1 hour,Online Courses and Certifications,Yes,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Netherlands,36,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,I don't plan on learning a new tool/technology,Text Mining,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Online courses,Textbook",,Somewhat useful,Very useful,,,,Very useful,,Somewhat useful,,Very useful,,,,Very useful,,,,,3-5 years,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Necessary,,,,Coursera,Basic laptop (Macbook),0 - 1 hour,PhD,Sort of (Explain more),Doctoral degree,A health science,,Researcher,Work,NA,NA,NA,NA,NA,NA,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,People 's Republic of China,26,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Programmer,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,,28,Employed part-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by company that makes advanced analytic software,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,6 to 10 years,"Data Analyst,Data Scientist,Researcher",University courses,10,0,60,30,0,0,Natural Language Processing,"Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Other,10 to 19 employees,Increased slightly,,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,"Conferences,Kaggle,Textbook",,,,,Somewhat useful,,Somewhat useful,,,,,,,,Somewhat useful,,,,"FlowingData Blog,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",Self-taught,100,0,0,0,0,0,Natural Language Processing,Other (please specify; separate by semi-colon),High school,Technology,"10,000 or more employees",Stayed the same,3-5 years,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,,,,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,30,0,10,40,20,0,Enough to tune the parameters properly,Dirty data,,,,,Often,,,,,,,,,,,,,,,,,,26-50% of projects,More internal than external,IT Department,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Sometimes,,,Has stayed about the same (has not increased or decreased more than 5%),7,,,,,,,,,,,,,,,,,, +Male,United States,25,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by non-profit or NGO",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,40,20,0,40,0,0,"Computer Vision,Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Military/Security,"5,000 to 9,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs","Amazon Machine Learning,Amazon Web services,C/C++,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau,TensorFlow,Unix shell / awk",Often,Most of the time,,Rarely,,,,Often,Sometimes,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Often,,,Sometimes,Often,,Most of the time,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A health science,More than 10 years,"Engineer,Programmer",Self-taught,50,10,30,0,10,0,,,A master's degree,Technology,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build prototypes to explore applying machine 
learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,10GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,Yes,,Data Analyst,Fine,"Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),6 to 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Logistic Regression,Other (please specify; separate by semi-colon)",A doctoral degree,Retail,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers",Relational data,Sometimes,100MB,"Bayesian Techniques,Regression/Logistic Regression","Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,Most of the time,,,,,,Sometimes,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Prescriptive Modeling,Time Series Analysis,Other",,,Often,,,,Often,,,,,,,,,,,,,,,Often,,,,,,,,Often,Often,,,50,20,10,10,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,21,"Not employed, but looking for work",,,,,,,,Java,,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Personal Projects,Textbook,YouTube Videos",,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,Not Useful,Somewhat useful,Very useful,,,Very useful,,,Not Useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",1-2 years,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Necessary,Nice to have,Unnecessary,Nice to have,Necessary,,,,"Coursera,edX,Udacity",Basic laptop (Macbook),0 - 1 hour,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,Self-taught,40,20,0,30,10,0,"Outlier detection (e.g. Fraud detection),Survival Analysis","Ensemble Methods,Logistic Regression","Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Not important,Very Important,Not important,Very Important,Very Important,Somewhat important,Very Important +Male,United States,21,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Other,"Online 
courses (coursera, udemy, edx, etc.)",10,25,30,5,25,5,"Survival Analysis,Other (please specify; separate by semi-colon)",Logistic Regression,A bachelor's degree,Pharmaceutical,"10,000 or more employees",Increased significantly,6-10 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,100MB,Regression/Logistic Regression,"Microsoft Excel Data Mining,R",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression",,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,60,15,5,15,5,0,Enough to tune the parameters properly,Unavailability of/difficult access to data,,,,,,,,,,,,,,,,,,,,,Often,,26-50% of projects,More internal than external,Standalone Team,,,"Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,37,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Researcher",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,Employed full-time,,,Yes,,Programmer,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Computer Scientist,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, and not looking for work",,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Colombia,40,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer 
Scientist,Data Analyst,Engineer,Machine Learning Engineer,Researcher",University courses,20,10,10,50,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"1,000 to 4,999 employees",Stayed the same,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Africa,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Programmer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,42,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,Other,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","Blogs,College/University,Online courses,YouTube Videos",,Very useful,Somewhat useful,,,,,,,,Very useful,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Miner,Data Scientist,Predictive Modeler,Statistician",Work,10,10,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Other (please specify; separate by semi-colon)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Retail,,,,,Somewhat important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"Decision Trees,Other","Amazon Web services,Google Cloud Compute,Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,SAS Base,SAS Enterprise Miner,SQL,TensorFlow,Unix shell / awk",,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,,Rarely,,,Often,,Sometimes,,,,,Most of the time,Sometimes,,,Most of the time,,,,Rarely,,Most of the time,,,,"A/B Testing,Data Visualization,Decision Trees,Logistic Regression,Natural Language Processing,Segmentation,Simulation,Text Analytics,Time Series Analysis",Sometimes,,,,,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,Often,Often,,Rarely,Often,,,,60,10,5,5,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Need to coordinate with IT",Sometimes,Often,,,Often,,,,,,,,,,Often,,,,,,,,Less than 10% of projects,More external than internal,IT Department,BrazilIan market data,Data quality,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Most of the time,100000,,Has increased between 6% and 19%,8,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Programmer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Julia,Time Series Analysis,Matlab,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Arxiv,Very useful,,,,,,,,,,,,,,,,,,"FastML Blog,FlowingData Blog,KDnuggets Blog",1-2 years,Unnecessary,Unnecessary,Nice to have,Unnecessary,Necessary,Unnecessary,Unnecessary,Nice to have,Necessary,Nice to have,,,,,Basic laptop (Macbook),,PhD,Yes,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,"Bayesian Techniques,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Asking friends, family members, or former colleagues for leads",,Not important,Not important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Not important,Not important,Somewhat important,Somewhat important,Somewhat important +Male,Brazil,33,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,No,Yes,Programmer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Matlab,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,University/Non-profit research group websites","College/University,Online courses,Personal Projects",,,Very useful,,,,,,,,Somewhat useful,Very useful,,,,,,,,< 1 year,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,,,,,,Gaming Laptop (Laptop + CUDA capable GPU),11 - 39 hours,,No,Bachelor's degree,Engineering (non-computer focused),Less than a year,I haven't started working yet,Self-taught,70,0,0,30,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - GANs,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,25,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Deep 
learning,Java,GitHub,"College/University,Friends network,Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,Very useful,,,Very useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Taiwan,29,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,University courses,40,20,0,30,10,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,CRM/Marketing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,58,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Business Analyst,DBA/Database Engineer,Engineer,Programmer,Researcher,Software 
Developer/Software Engineer",Self-taught,80,0,10,10,0,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data)",Bayesian Techniques,A master's degree,Technology,"1,000 to 4,999 employees",Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Not at all important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,,,"Bayesian Techniques,Decision Trees,Regression/Logistic Regression","Amazon Web services,C/C++,Cloudera,Hadoop/Hive/Pig,Java,Jupyter notebooks,Mathematica,MATLAB/Octave,NoSQL,Python,R,Spark / MLlib,SQL",,Most of the time,,Rarely,Sometimes,,,,Sometimes,,,,,,Most of the time,,Rarely,,,Rarely,Rarely,,,,,,Often,,,,Sometimes,,Sometimes,,,,,,,,Sometimes,Most of the time,,,,,,,,,,"Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Text Analytics",,Sometimes,Sometimes,,,,Most of the time,Often,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,36,Employed full-time,,,Yes,,Business Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Biology,3 to 5 years,"Business Analyst,Data Analyst",Self-taught,40,30,30,0,0,0,Supervised Machine Learning (Tabular Data),,,Technology,"10,000 or more employees",Increased significantly,More than 10 years,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Most of the time,,,"Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,Sometimes,,,,,,,"Natural Language Processing,Neural Networks,Text Analytics,Time Series Analysis",,,,,,,,,,,,,,,,,,,Sometimes,Often,,,,,,,,,Most of the time,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed 
full-time,,,No,Yes,Other,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,I don't write code to analyze data,Other,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,23,"Not employed, but looking for work",,,,,,,,Julia,Deep learning,Other,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Conferences,Friends network,Kaggle,Official documentation,Online courses,Textbook,YouTube Videos",,,Very useful,,Very useful,Very useful,Very useful,,,Very useful,Very useful,,,,Very useful,,,Somewhat useful,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Necessary,Unnecessary,Nice to have,Nice to have,Unnecessary,,,,"Coursera,DataCamp,Udacity",Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Mathematics or statistics,1 to 2 years,Other,University courses,40,25,0,35,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Meeting with recruiters who've contacted you directly,,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important +Male,United States,33,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Sweden,50,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed part-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,6 to 10 years,Software Developer/Software Engineer,University courses,20,50,0,30,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,Argentina,27,Employed part-time,,,No,Yes,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Biology,More than 10 years,Researcher,Self-taught,30,50,20,0,0,0,Computer Vision,"Decision Trees - Random Forests,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Other,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Other,GPU accelerated Workstation,Image data,Sometimes,1GB,CNNs,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random 
Forests,Segmentation,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,Less than a year,Operations Research Practitioner,Kaggle competitions,10,10,20,20,35,5,Adversarial Learning,Logistic Regression,,,,,,,Important,,,Image data,,,SVMs,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,50,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,Jupyter notebooks,Deep learning,Python,I collect my own data (e.g. web-scraping),"Arxiv,Kaggle,Online courses,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,,Very useful,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,O'Reilly Data Newsletter",,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,1GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","Google Cloud Compute,Jupyter 
notebooks,Python,R,TensorFlow",,,,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,Often,,,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs",,,,,,,,Often,Often,,,Often,,,,,,,,Often,Sometimes,,Often,,,,,Often,,,,,,60,30,0,0,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Often,,,,Often,,,,,,,Often,Sometimes,,,,Often,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,"Git,Subversion",Sometimes,90000,EUR,Has stayed about the same (has not increased or decreased more than 5%),8,,,,,,,,,,,,,,,,,, +Male,Other,29,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,Computer Scientist,University courses,10,30,40,20,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,I don't plan on learning a new tool/technology,Text Mining,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,University/Non-profit research group websites","Arxiv,Kaggle,Official documentation,Textbook,YouTube Videos",Very useful,,,,,,Very useful,,,Very useful,,,,,Somewhat useful,,,Somewhat useful,"Becoming a Data Scientist Podcast,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",5,70,0,0,25,0,"Computer Vision,Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Academic,20 to 99 employees,Stayed the same,Less than one year,A tech-specific job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),GPU accelerated Workstation","Image data,Video data,Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Neural Networks","Jupyter notebooks,NoSQL,Python,SQL",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation",Most of the time,,,Often,,Most of the time,Most of the time,,Sometimes,,,,,Sometimes,,,,,,Often,Often,,Sometimes,Often,,Sometimes,,,,,,,,50,20,10,15,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company 
politics / Lack of management/financial support for a data science team,Dirty data,Organization is small and cannot afford a data science team,Privacy issues,Unavailability of/difficult access to data",Sometimes,,,,Most of the time,,,,,,,,,,,Sometimes,Often,,,,Sometimes,,51-75% of projects,Entirely internal,IT Department,,Dirty Dianaaaaaaaa~,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,,,Has increased 20% or more,7,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Scientist,DBA/Database Engineer",Self-taught,30,30,40,0,0,0,"Recommendation Engines,Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Microsoft Excel Data Mining,NoSQL,R,SQL,Tableau",,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,Sometimes,,,,,,,,,Most of the time,,,Rarely,,,,,,,"Data Visualization,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,,,,,Often,,,,40,15,5,30,10,0,Enough to run the code / standard library,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,Primary/elementary school,Technology,"5,000 to 9,999 employees",Decreased slightly,1-2 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,26,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,"Information technology, networking, or system administration",3 to 5 years,"Business Analyst,Data Scientist",Work,40,10,5,20,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Retail,"10,000 or more employees",Increased slightly,1-2 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","C/C++,Python,R,Tableau",,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,Rarely,,,,,,,"A/B Testing,Cross-Validation,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests,Text Analytics",Sometimes,,,,,Often,,Most of the time,Sometimes,,,Often,,Often,,Often,,,Often,,Sometimes,,Often,,,,,,Sometimes,,,,,50,10,30,5,5,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Scaling data science 
solution up to full database",,,,Sometimes,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,42,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Predictive Modeler,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",Other,0,10,10,0,0,80,"Outlier detection (e.g. Fraud detection),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,"1,000 to 4,999 employees",Increased slightly,1-2 years,Some other way,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Most of the time,100GB,"Decision Trees,Ensemble Methods,Neural Networks,Regression/Logistic Regression","IBM SPSS Modeler,IBM SPSS Statistics,MATLAB/Octave,Microsoft Azure Machine Learning,Microsoft R Server (Formerly Revolution Analytics),Minitab,R,SAS Base,SAS Enterprise Miner,SAS JMP,SQL,Tableau",,,,,,,,,,,Sometimes,Sometimes,,,,,,,,,Sometimes,Sometimes,,Sometimes,,Sometimes,,,,,,,Most of the 
time,,,,,Sometimes,Sometimes,Sometimes,,Most of the time,,,Most of the time,,,,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Segmentation,Simulation,Time Series Analysis",Sometimes,,,,,Often,Most of the time,Often,,,,,,,Most of the time,Most of the time,,,,Sometimes,Often,,Sometimes,,,Most of the time,Most of the time,,,Often,,,,80,5,5,5,5,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Limitations of tools,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",Most of the time,Often,,,,,,,Most of the time,,,,Most of the time,,Often,,Often,,,Often,,,76-99% of projects,More external than internal,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Rarely,400000000,IRR,I was not employed 3 years ago,,,,,,,,,,,,,,,,,,, +Male,Japan,37,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,1 to 2 years,"Engineer,Programmer",Self-taught,65,0,30,0,5,0,"Outlier detection (e.g. 
Fraud detection),Time Series","Logistic Regression,Neural Networks - CNNs",No education,Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,52,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,High school,Technology,,,,,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Sometimes,1GB,Decision Trees,"Jupyter notebooks,Microsoft Azure Machine Learning,Python",,,,,,,,,,,,,,,,,Often,,,,,Often,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Prescriptive Modeling,Text Analytics,Time Series Analysis",Often,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,Often,Often,,,,85,5,0,5,5,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,100% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Git,Always,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Kaggle,Online courses,YouTube Videos",,,,,,Somewhat useful,Somewhat useful,,,,Somewhat useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Self-taught,90,5,5,0,0,0,Adversarial Learning,"Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Pharmaceutical,20 to 99 employees,Increased slightly,Less than one year,A career fair or on-campus recruiting event,Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Image data,Never,<1MB,"CNNs,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,35,"Not employed, but looking for work",,,,,,,,R,Regression,R,GitHub,Tutoring/mentoring,,,,,,,,,,,,,,,,,Very useful,,"Data Stories Podcast,O'Reilly Data Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Survival Analysis,"Decision Trees - Random Forests,Logistic Regression",A master's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,"Data Analyst,Data Scientist,Researcher",University courses,10,0,50,30,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Sometimes,,"Bayesian Techniques,Decision Trees,Random Forests","Amazon Web services,Java,Jupyter notebooks,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,R,SQL",,Rarely,,,,,,,,,,,,,Rarely,,Most of the time,,,,,,Often,,Sometimes,,,,,,Most of the time,,Sometimes,,,,,,,,,Often,,,,,,,,,,"Bayesian Techniques,Data Visualization,Decision Trees,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Random Forests",,,Sometimes,,,,Often,Often,,,,,,,,,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,,,,,,,,,15,30,45,0,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of significant domain expert input,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,Often,,,,,,,,Often,Often,,,51-75% of projects,Entirely 
internal,Other,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Graph (e.g. GraphBase/Neo4j),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,24,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,40,10,30,20,0,0,"Natural Language Processing,Time Series","Decision Trees - Random Forests,Logistic Regression",,Technology,500 to 999 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,58,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,Amazon Web services,I don't plan on learning a new ML/DS method,Python,"Google 
Search,Other","Personal Projects,Stack Overflow Q&A,Textbook",,,,,,,,,,,,Very useful,,Somewhat useful,Very useful,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),More than 10 years,"Computer Scientist,Engineer,Researcher,Software Developer/Software Engineer",University courses,10,0,0,90,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning,Other (please specify; separate by semi-colon)","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Insurance,"10,000 or more employees",Increased slightly,6-10 years,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,10MB,"Regression/Logistic Regression,Other","C/C++,Mathematica,Python,Unix shell / awk,Other",,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,Most of the time,Often,,,"Data Visualization,Simulation",,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,20,10,0,20,50,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,Other,,,,,,,,,,,,,,,,,,,,,,Sometimes,51-75% of projects,More internal than external,IT Department,no third party or public datasets,"technology changes break data consistence, the same data today doesn't mean the same thing it did a few months back.","Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email,Share Drive/SharePoint",,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,,,I do not want to share information about my salary/compensation,9,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,Microsoft Excel Data Mining,Cluster Analysis,SQL,University/Non-profit research group websites,"College/University,Conferences,Podcasts",,,Very useful,,Very useful,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,More than 10 years,"Business Analyst,Data Analyst,Researcher",Work,50,0,50,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks",A doctoral degree,Academic,500 to 999 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Most of the time,,,"IBM SPSS Statistics,Microsoft Excel Data Mining,SQL",,,,,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Text Analytics,Time Series Analysis",Sometimes,Sometimes,Sometimes,,,,Sometimes,Sometimes,,,,,,,,Sometimes,Sometimes,,,,,,,,,,,,Sometimes,Sometimes,,,,30,0,20,20,30,0,Enough to run the code / standard library,"Company politics / Lack of 
management/financial support for a data science team,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",Often,,,,Often,Often,,Often,Often,,,,Often,Often,Often,,,,,,,,51-75% of projects,More internal than external,Other,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Commercial Data Platform,Company Developed Platform,Email",,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,,,,9,,,,,,,,,,,,,,,,,, +"Non-binary, genderqueer, or gender non-conforming",India,21,"Not employed, but looking for work",,,,,,,,R,Social Network Analysis,Python,Other,Online courses,,,,,,,,,,,Very useful,,,,,,,,"Becoming a Data Scientist Podcast,Data Elixir Newsletter,R Bloggers Blog Aggregator",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",10,50,40,0,0,0,Other (please specify; separate by semi-colon),Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Adversarial Learning,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Mix of fields,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Other,Basic laptop (Macbook),Relational data,Rarely,100MB,Decision Trees,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,,,,,,,,"Data Visualization,Decision Trees",,,,,,,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,20,0,10,20,0,50,Enough to tune the parameters properly,Dirty data,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,35,"Not employed, but looking for work",,,,,,,,DataRobot,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Other","Blogs,Kaggle,Newsletters,Online courses,Personal Projects,Textbook,Other",,Very useful,,,,,Very useful,Very useful,,,Very useful,Very useful,,,Very useful,,,,"KDnuggets Blog,Other (Separate different answers with semicolon)",< 1 year,Necessary,Necessary,Necessary,,Necessary,Necessary,,,Necessary,Necessary,Nice to have,Nice to have,Necessary,Coursera,"Traditional Workstation,Other",2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,A social science,3 to 5 years,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",10,80,0,0,0,10,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Other (please specify; separate by semi-colon)",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Male,United States,26,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,29,Employed part-time,,,No,Yes,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Other,I haven't started working yet",Other,100,0,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A 
bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,23,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,40,15,30,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,25,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Microsoft Azure Machine Learning,Anomaly Detection,Python,Google Search,"Kaggle,YouTube Videos",,,,,,,Somewhat useful,,,,,,,,,,,Very useful,Siraj Raval YouTube Channel,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,"Data Miner,Predictive Modeler,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,5,40,10,15,0,"Adversarial Learning,Time Series","Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Mix of fields,Fewer than 10 employees,Stayed the same,3-5 years,A career fair or on-campus recruiting event,Important,Build prototypes to 
explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Most of the time,10GB,"GANs,Neural Networks,Regression/Logistic Regression,RNNs","Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,Data Visualization,Neural Networks,Recommender Systems,RNNs,Time Series Analysis",Rarely,,,,,,Most of the time,,,,,,,,,,,,,Most of the time,,,,Most of the time,Often,,,,,Sometimes,,,,35,35,5,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Often,,,,,,,,,,,,,,,Often,,,76-99% of projects,More internal than external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,,,I do not want to share information about my salary/compensation,3,,,,,,,,,,,,,,,,,, +Female,United States,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Statistician,Other",University courses,10,0,10,80,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",High school,Non-profit,100 to 499 employees,Increased significantly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,I don't write code to analyze data,"Business Analyst,Data Analyst",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,3 to 5 years,"Data Scientist,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer",University courses,30,15,0,55,0,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Reinforcement learning,Speech Recognition,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Other,I prefer not to answer,Increased significantly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Sometimes,1GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression","C/C++,MATLAB/Octave,Python,SQL,TensorFlow",,,,Sometimes,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,,Self-taught,80,20,0,0,0,0,Time Series,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,49,Employed full-time,,,No,Yes,Programmer,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,64,"Independent contractor, freelancer, or self-employed",,,Yes,,Researcher,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Mathematics or statistics,1 to 2 years,"Programmer,Software 
Developer/Software Engineer",Self-taught,80,10,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Manufacturing,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,"Image data,Relational data",,,"CNNs,Decision Trees,Ensemble Methods,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,A health science,More than 10 years,"Data Analyst,Researcher,Statistician",University courses,10,0,10,80,0,0,,,A master's degree,Academic,"10,000 or more employees",,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,,,"R,SAS Base",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Prescriptive Modeling",,,,,,,Often,,,,,,,,,Often,,,,,,Often,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,41,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Sweden,30,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by 
professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Statistician",University courses,0,20,20,60,0,0,Time Series,Neural Networks - CNNs,A master's degree,CRM/Marketing,20 to 99 employees,Increased slightly,6-10 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,,,"IBM SPSS Statistics,R,SAP BusinessObjects Predictive Analytics,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,Sometimes,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Time Series Analysis",,,,,,,Most of the time,Often,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Need to coordinate with IT,,,,,,,,,,,,,,,,,,,,,,,51-75% of projects,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,55,Employed full-time,,,Yes,,Other,Poorly,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,"Business Analyst,Engineer,Researcher",Self-taught,50,0,50,0,0,0,,,High school,Government,100 to 499 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Sometimes,<1MB,,"NoSQL,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Prescriptive Modeling",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,48,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,More than 10 years,"DBA/Database Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,60,20,0,0,0,Reinforcement learning,Logistic Regression,"Some college/university study, no bachelor's degree",Telecommunications,"10,000 or more employees",Increased significantly,6-10 years,,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,Most of the time,,,"IBM Watson / Waton Analytics,NoSQL,SQL,Tableau",,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Nigeria,26,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A health science,I don't write code to analyze data,Business Analyst,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,29,Employed full-time,,,No,Yes,Programmer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",0,40,40,20,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,58,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 10 years,Other,Self-taught,90,0,0,10,0,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,Primary/elementary school,Academic,500 
to 999 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Text data,,1MB,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,19,Employed part-time,,,No,Yes,Engineer,Fine,Employed by college or university,TensorFlow,Neural Nets,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Official documentation,Personal Projects,Stack Overflow Q&A,YouTube Videos",,,,,,,,,,Very useful,,Very useful,,Somewhat useful,,,,Somewhat useful,Siraj Raval YouTube Channel,< 1 year,Nice to have,Nice to have,Nice to have,Nice to have,Necessary,Unnecessary,Nice to have,Necessary,Nice to have,Necessary,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...)",2 - 10 hours,Experience from work in a company related to ML,No,Some college/university study without earning a bachelor's degree,Electrical Engineering,Less than a year,"Data Scientist,Engineer,Machine Learning Engineer",University courses,30,10,0,60,0,0,,,A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Business Analyst,Data Analyst",Work,40,25,25,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Technology,100 to 499 employees,Increased significantly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,38,Employed full-time,,,Yes,,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,6 to 10 years,Researcher,Self-taught,30,0,30,40,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",A master's degree,Academic,"5,000 to 9,999 employees",Increased slightly,6-10 years,Some other way,Very important,Research that advances the state of the art of machine learning,Basic laptop (Macbook),Relational data,,100MB,"CNNs,Decision Trees,Neural Networks","Amazon Web services,C/C++,IBM SPSS Statistics,IBM Watson / Waton Analytics,Java,Microsoft Excel Data Mining,Python,QlikView,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,30,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,30,Employed part-time,,,No,Yes,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral 
degree,"Information technology, networking, or system administration",1 to 2 years,Data Miner,University courses,50,30,0,20,0,0,Other (please specify; separate by semi-colon),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,Fewer than 10 employees,Increased slightly,3-5 years,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,32,"Independent contractor, freelancer, or self-employed",,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,34,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,,Other,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,14,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,I don't plan on learning a new tool/technology,,,,"Kaggle,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,,,,,,Very useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),I did not complete any formal education past high school,,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Other (please specify; separate by semi-colon),,I don't know/not 
sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,48,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,I don't plan on learning a new tool/technology,"Ensemble Methods (e.g. boosting, bagging)",R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Non-Kaggle online communities,Stack Overflow Q&A",,,,,,,,,Very useful,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,6 to 10 years,Other,University courses,50,0,0,50,0,0,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Other (please specify; separate by semi-colon)",A professional degree,Financial,"10,000 or more employees",,More than 10 years,A general-purpose job board,Somewhat important,Analyze and understand data to influence product or business decisions,"Traditional Workstation,Other",Relational data,Most of the time,10GB,"Bayesian Techniques,Regression/Logistic Regression","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Logistic Regression,PCA and Dimensionality Reduction,Simulation,Time Series Analysis",,,Often,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,Often,,,,,,Often,,,Most of the time,,,,80,10,0,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process",Most of the time,Most of the time,,,,,,Most of the time,,,,,,,,,,,,,,,100% of projects,Approximately half internal and half external,Central Insights Team,"teranet house 
prices, employment data from stats canada",none,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),Share Drive/SharePoint,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Rarely,150000,CAD,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,United States,56,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Physics,More than 10 years,Computer Scientist,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,"Not employed, but looking for work",,,,,,,,Python,Bayesian Methods,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Programmer,Self-taught,100,0,0,0,0,0,Recommendation Engines,Neural Networks - RNNs,A professional degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,25,Employed part-time,,,No,Yes,Data Miner,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, 
etc.)",50,25,15,5,5,0,Recommendation Engines,,A master's degree,Insurance,"1,000 to 4,999 employees",Decreased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,"Text data,Relational data",Sometimes,1GB,,"R,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,Simulation,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,0,0,0,0,0,0,Enough to run the code / standard library,"Dirty data,Need to coordinate with IT,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,32,Employed full-time,,,Yes,,Statistician,Fine,"Employed by college or university,Employed by a company that performs advanced analytics,Self-employed",Amazon Web services,Genetic & Evolutionary Algorithms,SAS,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,University/Non-profit research group websites","Arxiv,College/University,Conferences,Kaggle,Official documentation,Personal Projects,Stack Overflow Q&A,Textbook,Trade book,YouTube Videos",Very useful,,Very useful,,Very useful,,Very useful,,,Very useful,,Very useful,,Very useful,Very useful,Very useful,,Very useful,"KDnuggets Blog,O'Reilly Data Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Master's degree,,6 to 10 years,"Data Analyst,Predictive Modeler,Programmer,Researcher,Statistician,Other",University courses,13,0,12,75,0,0,"Computer Vision,Machine Translation,Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Academic,"5,000 to 9,999 employees",Stayed the same,Don't know,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,"Image data,Relational data",Most of the time,100MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,Jupyter notebooks,Python,R,SAS Base,SQL,Other",,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,Sometimes,,,,Sometimes,,,,,,,Most of the time,,,"A/B Testing,Association Rules,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,SVMs,Time Series Analysis",Most of the 
time,Sometimes,,Sometimes,,Most of the time,Most of the time,Most of the time,Sometimes,,,,,Sometimes,,Most of the time,,Sometimes,,Sometimes,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Most of the time,,,,40,40,10,5,5,0,Enough to refine and innovate on the algorithm,"Dirty data,Explaining data science to others,Lack of significant domain expert input,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects,Organization is small and cannot afford a data science team,Privacy issues,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Most of the time,Most of the time,,,,,Sometimes,,Rarely,Often,,Most of the time,Most of the time,,Most of the time,Sometimes,,,100% of projects,Entirely internal,Standalone Team,"Federal, state, and city datasets",Dirty data,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Never,86700,,Has stayed about the same (has not increased or decreased more than 5%),10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +A different identity,Other,98,Retired,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,35,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,1 to 2 years,Business Analyst,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,21,Employed part-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by college or university,TensorFlow,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website,I collect my own data (e.g. 
web-scraping),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Newsletters,Official documentation,Online courses,Personal Projects,Podcasts,Stack Overflow Q&A,Textbook,Trade book,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,Somewhat useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Very useful,Very useful,Very useful,"Becoming a Data Scientist Podcast,Emergent/Future Newsletter (Algorithmia),FastML Blog",,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Self-taught,50,10,25,10,5,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Mix of fields,10 to 19 employees,Stayed the same,Don't know,"A friend, family member, or former colleague told me",Somewhat important,Other,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Other","Image data,Other",Most of the time,100GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Evolutionary Approaches,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs,Other","C/C++,Java,Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow,Unix shell / awk,Other,Other,Other",,,,Sometimes,,,,,,,,,,,Rarely,,Often,,,,,,,,,,,,,,Most of the 
time,,Sometimes,,,,,,,,Rarely,,,,,Often,,Often,Often,Often,Rarely,"Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs",,Sometimes,Often,,,Most of the time,Most of the time,Often,Most of the time,,,Often,,,,Most of the time,,Often,,Sometimes,Most of the time,,Sometimes,,Sometimes,,,Often,,,,,,5,50,20,10,10,5,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Sometimes,Sometimes,,,Often,,,,,,Sometimes,,,,Sometimes,,,,51-75% of projects,More external than internal,Other,Cyverse public datastore;us gov open data,"For one of our four projects, the hyperspectral imagine data we work with HAS to go through a proprietary software that ONLY runs on Windows and does not allow us to process the images on TACC","Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Share Drive/SharePoint,Other",Cyverse;TACC,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Most of the time,36400,,I was not employed 3 years ago,7,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,18,Employed part-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,Self-taught,50,10,30,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Neural Networks - CNNs,A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,Don't know,"A friend, family member, or former colleague told me","N/A, I did not receive any formal education",Analyze and understand data to influence product or business decisions,,Relational data,Don't know,,Gradient Boosted Machines,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,27,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Other,1 to 2 
years,"Data Analyst,Other",University courses,20,0,0,0,80,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",5,80,5,5,5,0,"Computer Vision,Time Series","Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - GANs",High school,Manufacturing,"10,000 or more employees",Increased slightly,More than 10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,25,Employed full-time,,,No,Yes,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Programmer,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,47,Employed full-time,,,No,Yes,Researcher,Fine,Employed by college or university,R,Time Series Analysis,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Textbook",,,Somewhat useful,,,,Very useful,,,,,Somewhat useful,,,Very useful,,,,"O'Reilly Data Newsletter,The Analytics Dispatch Newsletter",< 1 year,Nice to have,Necessary,Necessary,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,,,,,Laptop or Workstation and local IT supported servers,2 - 10 hours,Kaggle Competitions,No,Doctoral degree,Mathematics or statistics,3 to 5 years,"Predictive Modeler,Researcher",Self-taught,30,20,20,10,20,0,Time Series,Support Vector Machines (SVMs),A bachelor's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,1-2,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important,Very Important +Female,Other,28,Employed part-time,,,Yes,,Scientist/Researcher,,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,3 to 5 years,"Business Analyst,Data Analyst,Engineer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,60,18,0,2,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Blogs,College/University,YouTube Videos",Somewhat useful,Somewhat useful,Somewhat useful,,,,,,,,,,,,,,,Somewhat useful,"FlowingData Blog,Jack's Import AI Newsletter,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Researcher,I haven't started working yet",Self-taught,45,20,0,30,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A master's degree,Academic,I prefer not to answer,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Somewhat important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Image data,Rarely,10GB,"CNNs,Decision Trees,GANs,Neural Networks,Random Forests","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,Perl,Python,Spark / MLlib,TensorFlow",,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,GANs,Neural Networks,Random Forests",,,,Often,,Often,,Sometimes,,,Sometimes,,,,,,,,,Often,,,Sometimes,,,,,,,,,,,40,40,5,10,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Limitations of tools",,,,,Often,,,,,,,,Often,,,,,,,,,,Less than 10% of projects,More external than internal,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Email,Share Drive/SharePoint",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Sometimes,20000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,India,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased slightly,Less than one year,A career fair or on-campus recruiting event,Not very important,Other,Laptop or Workstation and private datacenters,,Never,,,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Cross-Validation,Decision Trees,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks",,,,,,Sometimes,,Sometimes,,,,,,Sometimes,,Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,,50,0,50,0,0,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input,Limitations in the state of the art in machine learning",,Sometimes,,,,,,,Sometimes,,Sometimes,Sometimes,,,,,,,,,,,10-25% of projects,Entirely internal,,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,"Git,Subversion",Sometimes,2500000,INR,Has increased between 6% and 19%,2,,,,,,,,,,,,,,,,,, +Male,United States,NA,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,Data Scientist,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,26,Employed full-time,,,No,Yes,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,Other,Other,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Google Search,Government website,I collect my own data (e.g. web-scraping),University/Non-profit research group websites,Other","Non-Kaggle online communities,Online courses,Tutoring/mentoring",,,,,,,,,Very useful,,Very useful,,,,,,Very useful,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),I don't write code to analyze data,"Data Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A bachelor's degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,MATLAB/Octave,Neural Nets,Matlab,Google Search,College/University,,,Somewhat useful,,,,,,,,,,,,,,,,"Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman),Talking Machines Podcast,The Analytics Dispatch Newsletter",< 1 
year,Necessary,Necessary,,Necessary,,Necessary,Necessary,,,,,,,,Basic laptop (Macbook),0 - 1 hour,Master's degree,No,Bachelor's degree,Engineering (non-computer focused),,I haven't started working yet,Other,NA,NA,NA,NA,NA,NA,Computer Vision,Support Vector Machines (SVMs),A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,0,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",60,20,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Mix of fields,10 to 19 employees,Stayed the same,6-10 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Don't know,100MB,"Neural Networks,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Microsoft Excel Data Mining,Python,R,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,Often,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,Rarely,,,,Sometimes,,,,,,,,,,Often,,,,30,30,10,20,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,24,Employed 
full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Japan,25,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,32,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Engineering (non-computer focused),6 to 10 years,Data Analyst,University courses,50,0,0,50,0,0,"Computer Vision,Survival Analysis,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Hospitality/Entertainment/Sports,100 to 499 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,10GB,"Decision Trees,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Microsoft Azure Machine Learning,Python,Spark / MLlib,SQL,Unix shell / awk",Rarely,Often,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,Rarely,,,,,,,,,Most of the time,,,,,,,,,,Most of the time,Most of the time,,,,,,Most of the time,,,,"Collaborative Filtering,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Time Series Analysis",,,,,Sometimes,,Most of the time,Sometimes,,,,,,,Often,Often,,,,,,,Often,,,,,,,Most of the time,,,,40,5,5,10,40,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources",,Sometimes,Often,,,,,,Sometimes,Often,,,,,,,,,,,,,10-25% of projects,More internal than external,Central Insights Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Bitbucket,Sometimes,170000,USD,Has increased between 6% and 19%,7,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,6 to 10 years,"Business Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,28,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,1 to 2 years,,University courses,0,0,10,90,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Insurance,"5,000 to 9,999 employees",,Don't know,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,MATLAB/Octave,Python,R,SAS Base,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",Python,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Kaggle,Stack Overflow Q&A",Very useful,,,,,,Very useful,,,,,,,Somewhat useful,,,,,"FastML Blog,No Free Hunch Blog",,,,,,,,,,,,,,,,,,,,Master's degree,Electrical Engineering,1 to 2 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Programmer",University courses,0,25,0,50,25,0,"Adversarial Learning,Natural Language Processing","Neural Networks - GANs,Neural Networks - RNNs",A master's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,Text data,Sometimes,10GB,"CNNs,Neural Networks,Random Forests,RNNs","Amazon Web services,Python,TensorFlow",,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"CNNs,Cross-Validation,Data Visualization,Logistic Regression,Natural Language Processing,Neural Networks,Random Forests,Recommender Systems,RNNs,Segmentation,Text Analytics",,,,Often,,Most of the time,Often,,,,,,,,,Often,,,Most of the time,Often,,,Often,Sometimes,Often,Sometimes,,,Often,,,,,10,50,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,,,,,,,,,,,,,Sometimes,,,,,,,,,51-75% of projects,More external than internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)","Company Developed Platform,Share Drive/SharePoint",,Git,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Male,United States,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,,Data Scientist,Self-taught,50,10,30,5,5,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Internet-based,"10,000 or more employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,53,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,"10,000 or more employees",Stayed the same,More than 10 years,Some other way,Important,Other,"Laptop or Workstation and local IT supported servers,Traditional Workstation",Relational data,Rarely,1GB,Other,"Java,Jupyter notebooks,Python,QlikView,SAP BusinessObjects Predictive Analytics",,,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,Often,Sometimes,,,,,Often,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,20,10,30,30,10,0,"Enough to code it again from scratch, albeit it may run 
slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Limitations of tools,Maintaining responsible expectations about the potential impact of data science projects",Sometimes,Often,,,,,,Sometimes,Often,,,,Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,53,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed full-time,,,Yes,,Computer Scientist,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,Software Developer/Software Engineer,Self-taught,75,20,0,0,5,0,"Computer Vision,Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Academic,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,39,"Independent contractor, freelancer, or self-employed",,,No,Yes,Data Analyst,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,"Data Analyst,Operations Research Practitioner,Researcher",University courses,0,0,0,100,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",Primary/elementary school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,39,Employed part-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,30,I prefer not to say,No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Biology,,"Programmer,Other",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,,Other,NA,NA,NA,NA,NA,NA,Speech Recognition,Logistic Regression,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,27,Employed full-time,,,No,Yes,Statistician,Fine,Employed by non-profit or NGO,R,Regression,R,,Kaggle,,,,,,,Very useful,,,,,,,,,,,,R Bloggers Blog Aggregator,1-2 years,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Statistician,Self-taught,30,20,30,20,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's 
degree",Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,90,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",No education,Internet-based,20 to 99 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Build prototypes to explore applying machine learning to new areas,Traditional Workstation,"Text data,Relational data",Never,10MB,Neural Networks,"MATLAB/Octave,SQL",,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by company that makes advanced analytic software,TensorFlow,Anomaly Detection,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","Friends network,Kaggle,Official documentation,Online courses",,,,,,Very useful,Very useful,,,Very useful,Very useful,,,,,,,,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 years,"Data Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",20,30,35,5,10,0,"Natural Language Processing,Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A professional degree,Military/Security,20 to 99 employees,Increased significantly,Less than one year,A general-purpose job board,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Text data,,1GB,"Regression/Logistic Regression,SVMs,Other","Hadoop/Hive/Pig,Java,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL",,,,,,,,,Rarely,,,,,,Most of the time,,Often,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,Rarely,Often,,,,,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,PCA and Dimensionality Reduction,Segmentation,Text Analytics,Time Series Analysis",Sometimes,,,,Rarely,,Most of the time,,,,,,,,,,,,,,Sometimes,,,,,Often,,,Most of the time,Often,,,,70,15,5,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",Rarely,,,,Most of the time,Sometimes,,,,,,,,,,,,,Most of the time,,,,Less than 10% of projects,Entirely internal,IT Department,,dirty data,"Document-oriented (e.g. 
MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Sometimes,13000,USD,Has increased 20% or more,6,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,21,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",49,50,0,0,1,0,Computer Vision,Neural Networks - CNNs,High school,Academic,100 to 499 employees,Increased slightly,Don't know,A general-purpose job board,Very important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),Image data,,100MB,"CNNs,Neural Networks",MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,50,30,10,5,5,0,Enough to tune the parameters properly,"Dirty data,Lack of data science talent in the organization,Limitations of tools",,,,,Often,,,,Sometimes,,,,Often,,,,,,,,,,,More internal than external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,University courses,20,0,50,30,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Technology,10 to 19 employees,Increased slightly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters",Text data,Sometimes,10MB,"CNNs,Ensemble Methods,Neural Networks,Regression/Logistic Regression,RNNs","Amazon Web services,Jupyter notebooks,NoSQL,Python,Spark / MLlib,SQL,TensorFlow,Unix shell / awk,Other,Other",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Most of the time,,,,,,,,,,Sometimes,Often,,,,Sometimes,,Most of the time,Sometimes,Sometimes,,"CNNs,Cross-Validation,Data Visualization,Ensemble Methods,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Text Analytics",,,,Sometimes,,Often,Most of the time,,Rarely,,,,,,,Most of the time,,,Most of the time,Often,Sometimes,,,Sometimes,Rarely,,,,Sometimes,,,,,40,20,10,10,10,10,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,25,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by a company that 
performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Machine Learning Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,30,30,0,0,30,"Adversarial Learning,Computer Vision","Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",High school,Internet-based,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Other",Image data,Most of the time,1TB,"CNNs,Ensemble Methods,Neural Networks","C/C++,Python,TensorFlow,Unix shell / awk,Other",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,Often,Sometimes,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,GANs,Neural Networks,Segmentation",Rarely,,,Most of the time,,Often,Most of the time,,,,Sometimes,,,,,,,,,Most of the time,,,,,,Often,,,,,,,,10,35,20,15,20,0,Enough to run the code / standard library,"Difficulties in deployment/scoring,Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine learning,Privacy issues",,,,Sometimes,Sometimes,,,,,,Often,Often,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Researcher,Statistician",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,36,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by a 
company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,Programmer,Software Developer/Software Engineer",University courses,15,0,0,80,5,0,,,"Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Increased slightly,Don't know,Some other way,Not very important,Other,Other,Text data,Never,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,48,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Physics,More than 10 years,"Researcher,Software Developer/Software Engineer",Self-taught,80,20,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,100 to 499 employees,Increased slightly,Less than one year,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed part-time,,,No,Yes,Other,Perfectly,Employed by non-profit or NGO,Python,I don't plan on learning a new ML/DS method,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Non-Kaggle online communities,Online courses",,,,,,Not Useful,,,Not Useful,,Somewhat useful,,,,,,,,O'Reilly Data Newsletter,< 1 year,Unnecessary,Unnecessary,Necessary,Unnecessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),0 - 1 hour,Kaggle Competitions,No,Bachelor's degree,Management information systems,I don't write code to analyze data,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,A bachelor's degree,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Searching general-purpose job board,,Somewhat important,Not important,Very Important,Somewhat important,Somewhat important,Not important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,India,28,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,23,"Not employed, and not looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,28,Employed full-time,,,Yes,,Data Scientist,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,A health science,6 to 10 years,"Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher,Statistician",University courses,30,10,30,10,20,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time 
Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Non-profit,"1,000 to 4,999 employees",Increased slightly,More than 10 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,"Not employed, but looking for work",,,,,,,,Python,,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",30,50,0,0,20,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,22,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,C/C++,Anomaly Detection,Python,University/Non-profit research group websites,"Blogs,College/University,Stack Overflow Q&A",,Somewhat useful,Very useful,,,,,,,,,,,Somewhat useful,,,,,,1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,Nice to have,Unnecessary,Unnecessary,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),0 - 1 hour,Experience from work in a company related to ML,No,Master's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,"Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Evolutionary 
Approaches,Neural Networks - CNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Not important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Not important,Somewhat important,Not important +Male,United Kingdom,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Software Developer/Software Engineer",Self-taught,90,10,0,0,0,0,,,No education,Mix of fields,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,South Korea,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,50,0,0,30,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)",Logistic Regression,I prefer not to answer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,44,Employed full-time,,,Yes,,Computer Scientist,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,Software Developer/Software Engineer,Self-taught,60,30,10,0,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",A master's degree,Retail,20 to 99 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,GPU accelerated Workstation,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Java,QlikView,R,RapidMiner (free version),SAP BusinessObjects Predictive Analytics,Spark / MLlib,SQL,Tableau",,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,Often,Often,,Rarely,,Most of the time,,,,Sometimes,Most of the time,,,Often,,,,,,,"A/B Testing,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Segmentation,Time Series Analysis",Often,,,,Often,Often,Often,Often,,,,Often,,Most of the time,,Sometimes,,,,,Most of the time,Most of the time,Often,,,,,,,Often,,,,50,20,10,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Need to coordinate with 
IT,Organization is small and cannot afford a data science team,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",Sometimes,,,,Rarely,,,,,,,,,,Sometimes,Sometimes,,,Rarely,,Sometimes,,76-99% of projects,Approximately half internal and half external,Central Insights Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Not employed, but looking for work",,,,,,,,Amazon Machine Learning,Deep learning,Matlab,University/Non-profit research group websites,"College/University,Friends network,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,Very useful,,,Very useful,,,,,,Very useful,,Very useful,Very useful,,,Very useful,,5-10 years,,Nice to have,Nice to have,,Nice to have,,Nice to have,,,Nice to have,,,,,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Master's degree,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Data Analyst,Data Scientist,Machine Learning Engineer,Researcher",Self-taught,40,0,10,50,0,0,"Computer Vision,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Company's Web site/job listing page,6-10,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Very Important,Somewhat important,Somewhat important,Somewhat important,Somewhat important,Very Important +Male,India,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,Other,University courses,0,20,10,70,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs",A doctoral degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational data,Always,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,25,"Not employed, but looking for work",,,,,,,,TensorFlow,Genetic & Evolutionary Algorithms,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Kaggle,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",10,10,30,50,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,32,Employed part-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,,,,,Somewhat useful,,,,Very useful,,,Somewhat useful,Very useful,,Somewhat useful,Very useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,25,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Retail,"10,000 or more employees",Increased slightly,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Rarely,10MB,"Neural Networks,Random Forests","Python,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Rarely,,,,,,,"Cross-Validation,Ensemble Methods,Neural Networks,Random Forests",,,,,,Most of the time,,,Sometimes,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,,,,,70,15,5,5,5,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Difficulties in deployment/scoring,Dirty data,Explaining data science to others,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,Often,Often,Most of the time,Often,,,,,,,,,,,,,,Often,Sometimes,,Less than 10% of projects,Entirely internal,IT Department,,cleaning data,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Email,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Sometimes,3500000,JPY,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,I don't write code to analyze data,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",20,40,0,0,40,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by company that makes advanced analytic 
software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,Less than a year,Data Analyst,Work,15,15,30,0,30,10,"Natural Language Processing,Speech Recognition,Time Series",Decision Trees - Gradient Boosted Machines,I prefer not to answer,Hospitality/Entertainment/Sports,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,68,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,28,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Mathematics or statistics,3 to 5 years,"Data Scientist,Machine Learning Engineer",University courses,30,20,30,20,0,0,"Computer Vision,Machine Translation,Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",No education,Technology,20 to 99 employees,Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,100GB,Neural Networks,"Amazon Machine Learning,Google Cloud Compute,Java,Python,Spark / MLlib",Rarely,,,,,,,Rarely,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Sometimes,,,,,,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,SVMs",Sometimes,,,Often,Sometimes,Often,Often,,,,,,,Often,,Most of the time,,Rarely,Often,,Rarely,,,Sometimes,Often,Often,,Sometimes,,,,,,20,30,15,20,15,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,35,"Not employed, but looking for work",,,,,,,,SQL,,Python,,"Blogs,Company internal community,Kaggle,Official documentation,Online courses,YouTube Videos",,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,Somewhat useful,,1-2 years,,,,,,,,,,,,,,Coursera,,,,No,Master's degree,Mathematics or statistics,I don't write code to analyze data,"Business Analyst,Researcher",Work,0,0,100,0,0,0,,,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,22,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,36,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,I don't write 
code to analyze data,Other,Other,0,100,0,0,0,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hong Kong,22,Employed full-time,,,Yes,,Engineer,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,Business Analyst,University courses,10,0,0,90,0,0,Machine Translation,Evolutionary Approaches,A bachelor's degree,Financial,500 to 999 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,26,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",80,20,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,52,Employed full-time,,,Yes,,Data Scientist,Fine,"Employed by professional services/consulting firm,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Other,More than 10 years,"Computer Scientist,Data Analyst,Data Scientist,Programmer,Other",University courses,30,20,10,40,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A master's degree,Military/Security,20 to 99 employees,Stayed the 
same,Don't know,A tech-specific job board,Very important,Other,Traditional Workstation,Text data,Never,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,Yes,,Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,A social science,3 to 5 years,"Engineer,Other",University courses,40,0,0,60,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Unsupervised Learning","Logistic Regression,Support Vector Machines (SVMs)",A master's degree,Technology,"10,000 or more employees",Stayed the same,,A general-purpose job board,Very important,Other,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,Regression/Logistic Regression,"Python,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,Sometimes,,,,,,,,,,"Cross-Validation,Data Visualization,kNN and Other Clustering",,,,,,Sometimes,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,26,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,Software Developer/Software Engineer,Self-taught,60,10,30,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Internet-based,100 to 499 employees,Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Relational 
data,Sometimes,1GB,"CNNs,Decision Trees,HMMs,SVMs","Jupyter notebooks,NoSQL,Python,R,SQL",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,HMMs,Lift Analysis,Neural Networks,PCA and Dimensionality Reduction,Time Series Analysis",Most of the time,,,Sometimes,,Rarely,Most of the time,,,,,,Rarely,,Often,,,,,Sometimes,Often,,,,,,,,,Often,,,,40,10,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Inability to integrate findings into organization's decision-making process,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,I don't write code to analyze data,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,41,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,Business Analyst,Self-taught,80,0,20,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,35,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,90,10,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,20 to 99 employees,Increased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,62,Retired,,,Yes,,Other,Fine,Employed by non-profit or NGO,Cloudera,Text Mining,Matlab,"Google Search,Government website,I collect my own data (e.g. 
web-scraping)","Online courses,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",,,,,,,,,,,Very useful,Very useful,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,50,30,0,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Support Vector Machines (SVMs)",High school,Non-profit,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,29,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,0,0,10,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Other,100 to 499 employees,Decreased significantly,More than 10 years,A general-purpose job board,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Other,Rarely,100MB,"Bayesian Techniques,Neural Networks,Random Forests,SVMs","MATLAB/Octave,Python",,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,kNN and Other Clustering,Logistic Regression,Neural Networks,Random Forests,SVMs,Time Series Analysis",,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,Sometimes,,,Sometimes,,,,,Often,,Most of the time,,,,50,30,0,10,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of 
management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,Sometimes,,,Most of the time,Rarely,,Often,Often,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Management information systems,1 to 2 years,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,33,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst",Work,40,0,60,0,0,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU),GPU accelerated Workstation,Laptop or Workstation and private datacenters,Traditional Workstation","Image data,Text data,Relational 
data",Always,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Machine Learning,Python,R",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Lift Analysis,Logistic Regression,Markov Logic Networks,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,"Programmer,Other",Self-taught,95,0,0,0,5,0,Supervised Machine Learning (Tabular Data),Neural Networks - CNNs,A bachelor's degree,Government,10 to 19 employees,Increased slightly,Don't know,Some other way,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","GPU accelerated Workstation,Laptop or Workstation and local IT supported servers","Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression","C/C++,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,20,40,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Inability to integrate findings into organization's decision-making process,Organization is small and cannot 
afford a data science team,Scaling data science solution up to full database,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,Sometimes,,,,,,,,Sometimes,,Rarely,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,32,Employed full-time,,,Yes,,Computer Scientist,Fine,"Employed by professional services/consulting firm,Employed by college or university",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Other,,28,2,3,65,2,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Neural Networks - RNNs,Support Vector Machines (SVMs)",High school,Academic,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),"Image data,Text data",Rarely,10MB,"Bayesian Techniques,Ensemble Methods,Neural Networks,Random Forests,SVMs","C/C++,Java,MATLAB/Octave,R,SQL",,,,Sometimes,,,,,,,,,,,Often,,,,,,Most of the time,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Segmentation,SVMs",,,Often,,,Most of the time,,,,,,,,,,Sometimes,,Often,,Most of the time,,,Most of the time,,,Often,,Most of the time,,,,,,30,40,25,3,2,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Lack of funds to buy useful datasets from external sources,Limitations of tools,Organization is small and cannot afford a data science team",Often,,Often,,,,,,,Most of the time,,,Sometimes,,,Most of the time,,,,,,,None,Entirely external,Standalone Team,UCI and KEEL,Generate patterns from raw data. 
,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)","Commercial Data Platform,Email",,Generic cloud file sharing software (Dropbox/Box/etc.),Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,1 to 2 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,30,0,30,0,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"1,000 to 4,999 employees",Increased significantly,6-10 years,Some other way,Important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,1GB,"Decision Trees,Markov Logic Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,Hadoop/Hive/Pig,Python",,Often,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,SVMs",,,,Rarely,,Sometimes,Often,Often,,,,,,,,,,,Sometimes,,,,Sometimes,,,,,Rarely,,,,,,10,30,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Lack of data science talent in the organization,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Brazil,29,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Other,3 to 5 years,"Business 
Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Other,30,40,30,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Other,"10,000 or more employees",,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Mexico,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Computer Science,1 to 2 years,"Engineer,Machine Learning Engineer,Programmer,Researcher,Statistician",Kaggle competitions,0,0,50,0,50,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs",Primary/elementary school,Government,"1,000 to 4,999 employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Other",Always,1MB,Other,"C/C++,Java,Oracle Data Mining/ Oracle R Enterprise,R,SQL",,,,Rarely,,,,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,Sometimes,,,,,,,,,,A/B 
Testing,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70,10,10,0,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Limitations of tools",Most of the time,,,,Most of the time,,,,Most of the time,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,19,"Not employed, but looking for work",,,,,,,,Microsoft SQL Server Data Mining,Proprietary Algorithms,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Very useful,,,,Very useful,,,Very useful,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Stories Podcast,The Data Skeptic Podcast",,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,Less than a year,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,,,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",60,35,0,0,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,NoSQL,Social Network Analysis,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,"Business Analyst,DBA/Database Engineer,Software Developer/Software Engineer",University courses,30,40,5,25,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,,,,,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Laptop or Workstation and private datacenters,Relational data,Rarely,<1MB,"Bayesian Techniques,Decision Trees,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,SQL,Other",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,"Cross-Validation,Data Visualization,Decision Trees,Naive Bayes,Prescriptive Modeling",,,,,,Often,Most of the time,Often,,,,,,,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,35,15,10,15,25,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Limitations in the state of the art in machine learning,Organization is small and cannot afford a data science team,Privacy issues",Often,,,,Often,Often,,,Most of the time,Often,,Often,,,,Most of the time,Sometimes,,,,,,51-75% of projects,Entirely internal,IT Department,,,"Column-oriented relational (e.g. KDB/MariaDB),Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,"Bitbucket,Other",Sometimes,"12,000,000",UGX,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +,United States,55,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,3 to 5 years,Researcher,Kaggle competitions,30,0,0,0,70,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,100 to 499 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,63,Employed full-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,A humanities discipline,More than 10 years,Researcher,Self-taught,50,0,0,0,50,0,Supervised Machine Learning (Tabular Data),Decision Trees - Gradient Boosted Machines,I don't know/not sure,Academic,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Other,Never,10MB,"Regression/Logistic Regression,Other","Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,Rarely,,,,,,,,,,,,,,,,,,,"Logistic Regression,Simulation,Time Series Analysis",,,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,Often,,,,30,50,0,0,20,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of data science talent in the organization,Lack of funds to buy useful datasets from external sources,Privacy issues",,,,,,,,,Sometimes,Sometimes,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,Spark / MLlib,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website","College/University,Kaggle,Online courses,Podcasts,Tutoring/mentoring,YouTube Videos",,,Very useful,,,,Very useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Somewhat useful,"Becoming a Data Scientist Podcast,KDnuggets Blog,R Bloggers Blog Aggregator",< 1 year,Nice to have,Nice to have,Nice to have,Unnecessary,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,,,,"Coursera,DataCamp,Udacity",Traditional Workstation,11 - 39 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,"Outlier detection (e.g. 
Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tech-specific job board,0,Very Important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Somewhat important,Very Important,Somewhat important,Very Important,Very Important,Somewhat important,Somewhat important,Very Important,Very Important,Somewhat important +Male,United States,28,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",30,20,0,50,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs",A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Traditional Workstation",Text data,Rarely,10GB,"Bayesian Techniques,CNNs,HMMs,Markov Logic Networks,Neural Networks,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,51,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 
years,"Data Analyst,DBA/Database Engineer","Online courses (coursera, udemy, edx, etc.)",20,50,20,0,10,0,"Outlier detection (e.g. Fraud detection),Time Series",Decision Trees - Gradient Boosted Machines,A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters",Text data,Sometimes,100MB,Decision Trees,"Hadoop/Hive/Pig,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,R,SQL,Tableau,Unix shell / awk",,,,,,,,,Rarely,,,,,,,,,,,,,Sometimes,Most of the time,,,,,,,,,,Often,,,,,,,,,Most of the time,,,Often,,,Sometimes,,,,Decision Trees,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,29,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,24,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Monte Carlo Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,Software Developer/Software Engineer,University courses,20,0,0,80,0,0,"Computer Vision,Recommendation Engines","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,25,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher",University courses,10,10,20,50,10,0,"Computer Vision,Reinforcement learning,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Non-profit,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable GPU)",Text data,Most of the time,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Java,MATLAB/Octave,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,Python,QlikView,R,SAS Base,SQL,Tableau,TensorFlow",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,,Often,,Often,,,,,,,,Most of the time,Sometimes,Most of the time,,,,,Most of the time,,,,Sometimes,,,Often,Most of the time,,,,,,"Bayesian 
Techniques,Cross-Validation,Data Visualization,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,SVMs,Text Analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,30,20,10,10,30,0,Enough to tune the parameters properly,Explaining data science to others,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,I don't write code to analyze data,"Business Analyst,Other","Online courses (coursera, udemy, edx, etc.)",0,70,30,0,0,0,Computer Vision,Neural Networks - CNNs,A bachelor's degree,Other,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Other,"Text data,Relational data",Always,,,"Google Cloud Compute,Python",,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,CNNs,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,100,Enough to tune the parameters properly,Need to coordinate with IT,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,39,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),More than 10 years,"Engineer,Researcher",Kaggle competitions,50,0,0,0,50,0,,,A master's degree,Technology,20 to 99 employees,Stayed the same,Don't know,An external recruiter or headhunter,Not at all important,Other,Other,Text data,Never,,,MATLAB/Octave,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,5,0,0,10,0,85,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a 
data science team,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Researcher,Software Developer/Software Engineer",University courses,8,2,90,0,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Technology,10 to 19 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Sometimes,1GB,"Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,Often,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,25,Employed full-time,,,No,Yes,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,1 to 2 years,,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Programmer,Fine,"Employed by college or university,Employed by non-profit or NGO,Employed by government,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Master's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer,Other",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Other,"10,000 or more employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Not at all important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service","Image data,Video data",Always,100GB,"CNNs,Ensemble Methods,Gradient Boosted Machines,Neural Networks,RNNs","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,TensorFlow",,Most of the time,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,RNNs",Often,,,Often,,Most of the time,Most of the time,Often,Most of the time,,,Most of the time,,,,Sometimes,,,Sometimes,Most of the time,Often,,,,Sometimes,,,,,,,,,50,5,15,15,15,0,"Enough to code it again from scratch, albeit it may 
run slowly","Dirty data,Lack of data science talent in the organization",,,,,Most of the time,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,23,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",University courses,10,25,30,30,5,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,30,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Blogs,College/University,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Podcasts,Textbook",,Somewhat useful,Very useful,,,,Somewhat useful,,Somewhat useful,Very useful,Somewhat useful,Very useful,Somewhat useful,,Very useful,,,,"Becoming a Data Scientist Podcast,Siraj Raval YouTube Channel",,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,,"Business Analyst,Data Analyst,Data Miner,Predictive Modeler,Researcher,Software Developer/Software Engineer",Self-taught,NA,NA,NA,NA,NA,NA,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,30,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Miner,Data Scientist",University courses,10,20,30,10,30,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",High school,Other,100 to 499 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,1TB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,Python,R,SQL,Tableau,Unix shell / awk",,Most of the time,,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,,Often,,,Rarely,,,Often,,,,"A/B Testing,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,PCA and Dimensionality Reduction,Random Forests",Often,,,,,Sometimes,Often,Sometimes,Rarely,,,Sometimes,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",0,35,45,0,10,10,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A doctoral degree,Financial,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or 
Workstation and private datacenters,Relational data,Sometimes,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression,SVMs","Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,Often,,,,,Often,Often,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Malaysia,49,Employed full-time,,,Yes,,Machine Learning Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,Software Developer/Software Engineer,Work,0,30,30,40,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - CNNs,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Manufacturing,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and local IT supported servers,Relational data,Sometimes,10GB,"CNNs,Decision Trees,SVMs","C/C++,MATLAB/Octave,Oracle Data Mining/ Oracle R Enterprise,Perl,RapidMiner (free version),SQL",,,,Often,,,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,Often,,,,,Sometimes,,,,,,,Often,,,,,,,,,,"CNNs,Data Visualization,kNN and Other Clustering,SVMs,Time Series Analysis",,,,Often,,,Often,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,0,60,0,20,20,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,19,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Physics,,I haven't started working yet,University 
courses,NA,NA,NA,NA,NA,NA,Speech Recognition,Decision Trees - Random Forests,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Python,Social Network Analysis,Python,I collect my own data (e.g. web-scraping),YouTube Videos,,,,,,,,,,,,,,,,,,Somewhat useful,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Siraj Raval YouTube Channel",< 1 year,,,,,Necessary,Necessary,,,,,,,,,Basic laptop (Macbook),11 - 39 hours,Experience from work in a company related to ML,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,100,0,0,0,0,0,Natural Language Processing,Bayesian Techniques,High school,Retail,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,24,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,1 to 2 years,,Self-taught,40,0,50,0,10,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),"Some college/university study, no bachelor's degree",Insurance,100 to 499 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Sometimes,1GB,Other,"Microsoft Excel Data Mining,QlikView,R",,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Rarely,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Segmentation,Text Analytics",,,,,,,Often,,,,,,,,,,,,,,,,,,,Most of the 
time,,,Sometimes,,,,,50,10,10,10,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Inability to integrate findings into organization's decision-making process,The lack of a clear question to be answering or a clear direction to go in with the available data",,Sometimes,,,Most of the time,,,Often,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,20,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,36,Employed full-time,,,No,Yes,Other,Fine,Employed by government,Jupyter notebooks,Deep learning,Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping),University/Non-profit research group websites","Arxiv,Blogs,College/University,Friends network,Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",10,20,30,20,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",High school,Military/Security,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Software Developer/Software Engineer",Self-taught,30,30,35,5,0,0,"Natural 
Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,20 to 99 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Other,Rarely,,Regression/Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,21,Employed full-time,,,Yes,,Data Scientist,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Electrical Engineering,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,80,10,10,0,0,0,"Natural Language Processing,Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Increased slightly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Workstation + Cloud service",Relational data,Sometimes,10GB,"CNNs,Decision Trees,Random Forests","Amazon Machine Learning,Google Cloud Compute,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,SQL,TensorFlow",Often,,,,,,,Sometimes,Often,,,,,,,,Most of the time,,,,Sometimes,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,"A/B 
Testing,CNNs,Cross-Validation,Lift Analysis,Naive Bayes,Neural Networks,Random Forests,Recommender Systems,Text Analytics",Most of the time,,,Often,,Most of the time,,,,,,,,,Sometimes,,,Sometimes,,Sometimes,,,Often,Most of the time,,,,,Often,,,,,40,10,0,50,0,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,Business Analyst,"Online courses (coursera, udemy, edx, etc.)",0,80,20,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A master's degree,Retail,"10,000 or more employees",Increased significantly,3-5 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,,,Other,"Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,SQL,Tableau",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,,,,,,,,,,Sometimes,,,Sometimes,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,25,0,0,50,25,0,Enough to run the code / standard library,"Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization",,,,,,,,Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Psychology,6 to 10 years,"Business Analyst,Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",75,25,0,0,0,0,"Reinforcement learning,Supervised Machine Learning (Tabular Data)","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - 
RNNs",A master's degree,Government,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Not very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation,Workstation + Cloud service","Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,25,"Not employed, and not looking for work",No,"Yes, but data science is a small part of what I'm focused on learning",,,,,,R,Cluster Analysis,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Government website","Blogs,College/University,Kaggle,Textbook,YouTube Videos",,Somewhat useful,Not Useful,,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,< 1 year,Necessary,Nice to have,Nice to have,,,Necessary,Necessary,Nice to have,Unnecessary,Unnecessary,,,,,Basic laptop (Macbook),0 - 1 hour,Experience from work in a company related to ML,No,Bachelor's degree,Computer Science,,I haven't started working yet,University courses,NA,NA,NA,NA,NA,NA,Supervised Machine Learning (Tabular Data),,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,A humanities discipline,Less than a year,Other,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,35,Employed 
full-time,,,Yes,,Data Scientist,,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A health science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Scientist,Operations Research Practitioner,Predictive Modeler,Researcher",Self-taught,50,10,20,20,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs",,Mix of fields,"5,000 to 9,999 employees",Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and private datacenters,Traditional Workstation",Relational data,Rarely,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Jupyter notebooks,Python,R,SAS Enterprise Miner,SQL,Tableau",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,Rarely,,,Most of the time,,,Most of the time,,,,,,,"Data Visualization,Decision Trees,Ensemble Methods",,,,,,,Most of the time,Sometimes,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,70,0,0,30,0,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data",,Often,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Work,40,40,20,0,0,0,Natural Language Processing,"Bayesian Techniques,Decision Trees - Random Forests,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,I don't know,Increased slightly,More than 10 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,3 to 5 years,"Data Miner,Engineer,Programmer",University courses,10,10,20,60,0,0,Recommendation Engines,"Bayesian Techniques,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - GANs",A bachelor's degree,Technology,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,India,24,Employed full-time,,,No,Yes,Other,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,"Business Analyst,Data 
Analyst,Engineer",,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,30,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Data Analyst,Engineer",Self-taught,10,80,10,0,0,0,Recommendation Engines,Neural Networks - CNNs,A bachelor's degree,Technology,100 to 499 employees,Stayed the same,Less than one year,A general-purpose job board,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Text data,Relational data",Never,100MB,Bayesian Techniques,"Amazon Web services,C/C++,Java,Python",,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,20,80,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,,,,,,,,,,,,Most of the time,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,32,Employed full-time,,,Yes,,Engineer,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,,,50,10,30,0,10,0,,,,,,,,,,,,,,,,"C/C++,Python,Spark / MLlib,SQL",,,,Most of the 
time,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,Rarely,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Other,,Researcher,University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Argentina,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Computer Scientist,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,50,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Logistic Regression,Other (please specify; separate by semi-colon)","Some college/university study, no bachelor's degree",Academic,"10,000 or more employees",Decreased slightly,Don't know,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,38,Employed full-time,,,Yes,,Researcher,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 
years,"Operations Research Practitioner,Researcher",Self-taught,70,10,20,0,0,0,"Adversarial Learning,Recommendation Engines",Logistic Regression,High school,CRM/Marketing,100 to 499 employees,Stayed the same,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,37,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Poorly,"Employed by a company that doesn't perform advanced analytics,Employed by non-profit or NGO,Self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Some college/university study without earning a bachelor's degree,I never declared a major,More than 10 years,"Computer Scientist,Data Miner,DBA/Database Engineer,Software Developer/Software Engineer,Other",Kaggle competitions,50,35,0,0,15,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Non-profit,,,,,Not very important,Other,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,10GB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,SQL,TensorFlow,Unix shell / awk,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,"Independent contractor, freelancer, or self-employed",,,Yes,,Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain 
more),Master's degree,Electrical Engineering,1 to 2 years,"Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,20,30,10,30,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data",Sometimes,10GB,"Regression/Logistic Regression,SVMs,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Statistician",University courses,10,20,0,60,10,0,"Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series","Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,An external recruiter or headhunter,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,Relational data,Always,10GB,Regression/Logistic Regression,"IBM SPSS Statistics,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R,SAS Base,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,Sometimes,,Often,,,,,,,,Sometimes,,,,,Most of the time,,,,Most of the time,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Logistic 
Regression,Simulation,Time Series Analysis",,,,,,Often,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,,Often,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,42,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,Telecommunications,Fewer than 10 employees,Stayed the same,Less than one year,Some other way,Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),"Text data,Relational data",Sometimes,100MB,"Bayesian Techniques,Decision Trees,RNNs","Amazon Machine Learning,Python,R,TensorFlow,Unix shell / awk",Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Often,,Sometimes,,,,"Bayesian Techniques,Decision Trees,Logistic Regression,RNNs",,,Most of the time,,,,,Often,,,,,,,,Often,,,,,,,,,Often,,,,,,,,,40,30,10,0,20,0,Enough to tune the parameters properly,"Dirty data,Unavailability of/difficult access to data",,,,,Sometimes,,,,,,,,,,,,,,,,Most of the time,,10-25% of projects,More internal than external,Standalone Team,None,Data cleanup takes majority time,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Bitbucket,Sometimes,3500000,INR,Other,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Other,Self-taught,50,0,50,0,0,0,"Computer Vision,Natural Language Processing","Logistic Regression,Support Vector Machines (SVMs)",Primary/elementary school,Financial,100 to 499 employees,Stayed the same,1-2 years,Some other way,Important,Other,Traditional Workstation,"Text data,Relational data",Sometimes,100MB,"Neural Networks,Regression/Logistic Regression","C/C++,Python",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Cross-Validation,Logistic Regression,Naive Bayes,Natural Language Processing,Text Analytics,Time Series Analysis",,,,,,Sometimes,,,,,,,,,,Sometimes,,,Often,,,,,,,,,,Often,Often,,,,60,0,40,0,0,0,Enough to run the code / standard library,Company politics / Lack of management/financial support for a data science team,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,25,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,35,Employed full-time,,,Yes,,Programmer,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,More than 10 years,Other,Work,0,0,100,0,0,0,"Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Support Vector Machines (SVMs)",A master's degree,Technology,20 to 99 employees,Increased significantly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Text data,Always,1TB,Bayesian Techniques,Amazon Web services,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Natural Language Processing,Neural Networks,Recommender Systems",Sometimes,,Often,,,,,,,,,,,,,,,Often,Most of the time,Often,,,,Often,,,,,,,,,,20,40,20,10,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Team using multiple ad-hoc development environments such as Python/R/Java/etc.",,,,,Often,,,,,,,,,,,,,,Often,,,,10-25% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. 
Cassandra/BigTable/BigQuery/Redshift)","Company Developed Platform,Email",,"Generic cloud file sharing software (Dropbox/Box/etc.),Git",Sometimes,12000,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +Male,Italy,33,Employed full-time,,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist",University courses,10,0,10,80,0,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Financial,"10,000 or more employees",Increased slightly,1-2 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,45,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),6 to 10 years,Researcher,Self-taught,100,0,0,0,0,0,Time Series,"Logistic Regression,Markov Logic Networks",A master's degree,Manufacturing,"10,000 or more employees",,Less than one year,Some other way,Somewhat important,Other,Laptop or Workstation and private datacenters,Relational data,Rarely,10MB,"Markov Logic Networks,Regression/Logistic Regression","Mathematica,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Minitab,SAS JMP,Tableau",,,,,,,,,,,,,,,,,,,,Most of the 
time,,Sometimes,Sometimes,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,,Sometimes,,,,,,,"Logistic Regression,Markov Logic Networks,Time Series Analysis",,,,,,,,,,,,,,,,Often,Often,,,,,,,,,,,,,Often,,,,70,20,0,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,I prefer not to say,Unavailability of/difficult access to data,Other",,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Russia,31,Employed part-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Physics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,10,20,0,40,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression",A professional degree,Internet-based,100 to 499 employees,Increased slightly,Less than one year,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Canada,23,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,I haven't started working yet,University courses,10,10,0,80,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - CNNs,A doctoral degree,Academic,I don't know,,,,Important,,,Text data,,,"CNNs,Neural Networks,RNNs","Java,Python",,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"CNNs,Decision Trees,Neural Networks,RNNs",,,,Often,,,,Often,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,,,,,,20,60,0,0,20,0,Enough to tune the parameters properly,I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,28,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Psychology,More than 10 years,,Self-taught,78,10,10,0,2,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Neural Networks - RNNs,A doctoral degree,Academic,10 to 19 employees,Stayed the same,More than 10 years,Some other way,Important,Other,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...)",Other,Never,1GB,"CNNs,Neural Networks,RNNs","C/C++,Google Cloud Compute,MATLAB/Octave,Python",,,,Rarely,,,,Sometimes,,,,,,,,,,,,,Most of the time,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,"CNNs,Cross-Validation,Neural Networks,PCA and Dimensionality Reduction,RNNs,Simulation,Time Series Analysis",,,,Sometimes,,Sometimes,,,,,,,,,,,,,,Often,Most of the time,,,,Most of the time,,Often,,,Often,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,No,Yes,Other,Fine,Employed 
by professional services/consulting firm,QlikView,Neural Nets,R,Google Search,"Other,Other",,,,,,,,,,,,,,,,,,,Talking Machines Podcast,,,,,,,,,,,,,,,,,,,No,Professional degree,,I don't write code to analyze data,Other,Self-taught,NA,NA,NA,NA,NA,100,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A professional degree,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,6 to 10 years,"Computer Scientist,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",90,10,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression","Some college/university study, no bachelor's degree",Financial,500 to 999 employees,Stayed the same,6-10 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop or Workstation and private datacenters","Relational data,Other",Sometimes,10TB,"Decision Trees,Gradient Boosted Machines,Regression/Logistic Regression","C/C++,NoSQL,Python,R,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Sometimes,,Rarely,,,,,,,,,,,,,,,Most of the time,,,,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,Time Series Analysis",,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,Most of the time,,,,90,7,2,1,0,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Unavailability of/difficult access to data",,,,,Most of the time,,,,,,,,,,,,,,,,Most of the time,,10-25% of projects,Entirely internal,Business Department,,,"Flat files not in a database or cache 
(e.g. CSV, JSON, XML, PNG, MPG)",Other,NFS,Bitbucket,Rarely,1,USD,Has increased 20% or more,9,,,,,,,,,,,,,,,,,, +A different identity,Iran,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),1 to 2 years,Data Scientist,Work,40,10,30,20,0,0,Time Series,Logistic Regression,"Some college/university study, no bachelor's degree",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Pakistan,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Switzerland,60,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,47,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Management information systems,More than 10 years,"Data Analyst,DBA/Database Engineer,Programmer",Self-taught,35,30,35,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",A master's degree,Mix of fields,"10,000 or more employees",Stayed the same,3-5 years,,Very important,,Laptop or Workstation and private datacenters,"Text data,Relational data",,,,"Amazon Machine Learning,Amazon 
Web services,Python",Often,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,Random Forests,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,30,33,33,1,0,0,Enough to refine and innovate on the algorithm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,36,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,Less than a year,"Business Analyst,Computer Scientist,Data Scientist,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer",University courses,0,0,50,0,10,40,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",A master's degree,Technology,Fewer than 10 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Very important,Research that advances the state of the art of machine learning,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs","Google Cloud Compute,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,Sometimes,Often,,,,Sometimes,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Naive Bayes,Natural Language 
Processing,Neural Networks,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Often,,,Often,Most of the time,Most of the time,,,,,,,,Often,,Often,Sometimes,Sometimes,,,Often,,,,,Often,Most of the time,Most of the time,,,,80,5,5,5,5,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,34,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,"Programmer,Software Developer/Software Engineer",University courses,0,0,0,100,0,0,,,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,France,29,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Data Analyst,University courses,25,10,15,50,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Academic,100 to 499 employees,Stayed the same,Don't know,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Research that advances the state of the art of machine learning,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Other,Rarely,10MB,Decision Trees,"Jupyter notebooks,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL",,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Rarely,,,Most of the time,,Rarely,,,,,,,,Sometimes,Often,,,,,,,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,0,100,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,51-75% of projects,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,Generic cloud file sharing software (Dropbox/Box/etc.),Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,39,"Independent contractor, freelancer, or self-employed",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,54,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,21,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,5,83,1,1,0,,,A master's degree,Financial,20 to 99 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service 
that operationally improves your product or workflows,Laptop or Workstation and private datacenters,Text data,,100MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,SVMs","C/C++,Java,Jupyter notebooks,Python,SQL,TensorFlow",,,,Rarely,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,Employed full-time,,,Yes,,Other,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Fine arts or performing arts,6 to 10 years,"Business Analyst,Data Analyst","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,0,,,A professional degree,Mix of fields,"1,000 to 4,999 employees",Increased significantly,1-2 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",Never,1GB,"Decision Trees,Random Forests,Regression/Logistic Regression","Amazon Web services,Jupyter notebooks,Python,R,SQL,Tableau,TensorFlow",,Rarely,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Often,,Most of the time,,,,,,,,,Most of the time,,,Often,Rarely,,,,,,"A/B Testing,Collaborative Filtering,Data Visualization,Decision Trees,Natural Language Processing,Segmentation,Text Analytics,Time Series Analysis",Often,,,,Sometimes,,Most of the time,Sometimes,,,,,,,,,,,Sometimes,,,,,,,Sometimes,,,Rarely,Often,,,,30,0,0,10,60,0,Enough to run the code / standard library,"Maintaining responsible expectations about the potential impact of data science projects,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available 
data",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,50,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,,6 to 10 years,"Business Analyst,Researcher",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,31,"Independent contractor, freelancer, or self-employed",,,No,Yes,Other,Perfectly,Self-employed,TensorFlow,Deep learning,Python,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"Online courses,Stack Overflow Q&A,Textbook",,,,,,,,,,,Somewhat useful,,,Very useful,Somewhat useful,,,,No Free Hunch Blog,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Physics,3 to 5 years,"Data Analyst,Predictive Modeler,Researcher",University courses,0,85,5,10,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Data Analyst,Researcher",Self-taught,60,10,30,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,"10,000 or more employees",Increased slightly,Don't know,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Always,1GB,"Bayesian Techniques,CNNs,Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,RNNs,SVMs","Amazon Machine Learning,Jupyter notebooks,Python,R,SAS Base,Spark / MLlib,Tableau,TensorFlow,Unix shell / awk",Often,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,Often,,,,,,,Most of the time,Often,,Often,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,RNNs,SVMs",,,Often,Often,,Often,Most of the time,Often,Often,,,Often,,Often,,Sometimes,,Sometimes,Most of the time,,,,Often,,Often,,,Sometimes,,,,,,50,15,15,10,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,26,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,70,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,Less than a year,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,3 to 5 years,"Business Analyst,Engineer","Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Natural Language Processing,Logistic Regression,,Mix of fields,,,,,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and local IT supported servers,Text data,Always,10GB,Neural Networks,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Data Scientist,Perfectly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Computer Science,3 to 5 years,"Engineer,Researcher,Software Developer/Software Engineer",Work,15,30,50,0,5,0,"Adversarial Learning,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random 
Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",I don't know/not sure,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,38,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,Python,Text Mining,R,GitHub,Stack Overflow Q&A,,,,,,,,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,More than 10 years,"Data Analyst,Data Scientist,Programmer,Statistician","Online courses (coursera, udemy, edx, etc.)",0,40,20,30,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",I don't know/not sure,Government,100 to 499 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,1MB,"Random Forests,Regression/Logistic Regression","Amazon Machine Learning,IBM SPSS Statistics,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft R Server (Formerly Revolution Analytics),Microsoft SQL Server Data Mining,Python,R",Often,,,,,,,,,,,Rarely,,,,,Rarely,,,,,Often,Most of the time,Sometimes,Sometimes,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",Often,Sometimes,,,,Often,Most of the time,Most of the time,,,,,,,,Most of the time,,,,,Most of the time,,Often,Often,,Often,,,Often,Often,,,,30,10,20,30,10,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,100% of projects,Do not know,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",I don't typically share data,,Git,Never,55000,EUR,Has increased between 6% and 19%,5,,,,,,,,,,,,,,,,,, +Male,Turkey,22,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Computer Science,Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,29,Employed full-time,,,Yes,,Machine Learning Engineer,Perfectly,"Employed by college or university,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,3 to 5 years,"Researcher,Software Developer/Software Engineer",University courses,60,5,30,5,0,0,"Adversarial Learning,Computer Vision,Outlier detection (e.g. Fraud detection),Time Series","Bayesian Techniques,Neural Networks - CNNs,Neural Networks - RNNs",A doctoral degree,Technology,10 to 19 employees,Increased slightly,1-2 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Workstation + Cloud service","Image data,Video data,Text data",Always,,"Bayesian Techniques,CNNs,Neural Networks,RNNs","C/C++,Java,MATLAB/Octave,Python,TensorFlow",,,,Rarely,,,,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,,,,,"Bayesian Techniques,CNNs,Collaborative Filtering,Decision Trees,HMMs,kNN and Other Clustering,Markov Logic Networks,Neural Networks,RNNs,Time Series Analysis",,,Sometimes,Most of the time,Often,,,Sometimes,,,,,Sometimes,Sometimes,,,Sometimes,,,Often,,,,,Most of the time,,,,,Often,,,,40,40,6,10,4,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Dirty data,Limitations in 
the state of the art in machine learning,Limitations of tools,Privacy issues,Unavailability of/difficult access to data",,Sometimes,,,Most of the time,,,,,,,Sometimes,Often,,,,Sometimes,,,,Most of the time,,51-75% of projects,More external than internal,Central Insights Team,Do not wish to disclose,Do not wish to disclose,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",I don't typically share data,,Git,Sometimes,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,58,Employed full-time,,,No,Yes,Other,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,I don't write code to analyze data,Other,Self-taught,70,10,10,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Evolutionary Approaches,Neural Networks - RNNs",Primary/elementary school,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,38,Employed full-time,,,Yes,,Other,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,30,35,30,0,5,0,"Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A professional degree,Technology,Fewer than 10 employees,Increased slightly,1-2 years,Some other way,Somewhat important,Build prototypes to explore applying machine learning to new areas,Gaming Laptop (Laptop + CUDA capable GPU),"Image data,Text data",Sometimes,1GB,"CNNs,Decision Trees,Ensemble Methods,Evolutionary 
Approaches,Gradient Boosted Machines,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Jupyter notebooks,Python,R,Spark / MLlib,TensorFlow",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,Rarely,,,,,,,,Sometimes,,,,,Sometimes,,,,,,"A/B Testing,CNNs,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Evolutionary Approaches,GANs,Gradient Boosted Machines,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,Recommender Systems,RNNs,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Often,Sometimes,Often,Sometimes,Sometimes,Often,Sometimes,Sometimes,Sometimes,Rarely,Sometimes,,Often,,Rarely,Often,Often,Sometimes,,Often,Sometimes,Often,,,Sometimes,Often,Sometimes,,,,30,30,30,10,0,0,"Enough to code it again from scratch, albeit it may run slowly","Organization is small and cannot afford a data science team,Unavailability of/difficult access to data",,,,,,,,,,,,,,,,Most of the time,,,,,Often,,Less than 10% of projects,Entirely internal,IT Department,web scraping,"too few data, hard to obtain","Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Email,Other",dropbox,"Bitbucket,Git",Sometimes,,,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Male,Other,56,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by professional services/consulting firm,IBM Watson / Waton Analytics,Anomaly Detection,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search","Kaggle,Online courses,Personal Projects,Textbook",,,,,,,Somewhat useful,,,,Very useful,Somewhat useful,,,Very useful,,,,R Bloggers Blog Aggregator,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",0,90,8,0,2,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Other,20 to 99 employees,Stayed the same,3-5 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Sometimes,10MB,"Decision Trees,Random Forests,Regression/Logistic Regression","Perl,Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Logistic Regression,Random Forests",,,,,,,Often,,,,,,,,,Often,,,,,,,Rarely,,,,,,,,,,,30,50,5,15,0,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Often,,,,Most of the time,,,,,,,,,,,,,,,Often,Sometimes,,100% of projects,Entirely internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.)",Sometimes,50000,USD,Has stayed about the same (has not increased or decreased more than 5%),4,,,,,,,,,,,,,,,,,, +Male,United States,57,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,More than 10 years,"Data Scientist,Statistician",Self-taught,10,0,90,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Gradient Boosting",High school,Financial,"1,000 to 4,999 employees",Increased slightly,1-2 years,A general-purpose job board,Not at all important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,"Text data,Relational data",Always,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods","Python,R,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,Most of the time,Sometimes,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Decision Trees,Ensemble Methods,Naive Bayes,Natural Language Processing,Recommender Systems",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,SQL,Social Network Analysis,R,"Government website,University/Non-profit research group websites","College/University,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,,,,,,Very useful,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Mathematics or statistics,,"Programmer,Software Developer/Software Engineer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,"Supervised Machine 
Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,53,Employed full-time,,,Yes,,Other,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Other,3 to 5 years,"Data Scientist,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,No,Yes,Other,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,37,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,21,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Other,"Online courses (coursera, udemy, edx, etc.)",30,60,0,10,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A doctoral degree,Pharmaceutical,,,,,Somewhat important,"Build and/or run the data 
infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),Text data,,,,"Python,R",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,Sometimes,,,,,,,,,,,,,,,,,,,Neural Networks,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belgium,40,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Czech Republic,36,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Data Miner,Data Scientist,Researcher",University courses,50,25,25,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Financial,100 to 499 employees,Decreased slightly,3-5 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,20,50,10,20,0,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs","Some college/university study, no bachelor's degree",Technology,10 to 19 employees,Stayed the same,Less than one year,A tech-specific job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,34,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,Other,Deep learning,Python,Other,"Company internal community,Kaggle,Online courses,Other",,,,Somewhat useful,,,Somewhat useful,,,,Very useful,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,6 to 10 years,"Business Analyst,Data Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,40,0,10,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A master's degree,Telecommunications,100 to 499 employees,Stayed the same,Less than one year,An external recruiter or headhunter,Somewhat important,Other,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Most of the time,10GB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Web services,Hadoop/Hive/Pig,Jupyter notebooks,MATLAB/Octave,Python,QlikView,R,Spark / MLlib,SQL,TensorFlow",,Rarely,,,,,,,Rarely,,,,,,,,Sometimes,,,,Rarely,,,,,,,,,,Most of the time,Sometimes,Sometimes,,,,,,,,Sometimes,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and 
Dimensionality Reduction,Random Forests,SVMs,Text Analytics,Time Series Analysis",,,Rarely,Sometimes,,Most of the time,Most of the time,Most of the time,Most of the time,,,Often,,Often,,Often,,Sometimes,Sometimes,Sometimes,Often,,Sometimes,,,,,Sometimes,Most of the time,Most of the time,,,,40,20,0,30,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of data science talent in the organization,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",Most of the time,,,,Often,,,Often,Sometimes,,,,,,Sometimes,,,,,Most of the time,Most of the time,,76-99% of projects,Entirely internal,Other,Quandl;ECB currency data,Finding the correct data; Cleaning,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Git,Rarely,84000,EUR,Has increased between 6% and 19%,6,,,,,,,,,,,,,,,,,, +Male,India,20,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,23,Employed full-time,,,Yes,,Software Developer/Software Engineer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,Less than a year,Computer Scientist,Self-taught,40,30,0,30,0,0,Computer Vision,Neural Networks - CNNs,,Technology,20 to 99 employees,Increased slightly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Predictive Modeler,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Machine Learning Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",10,40,30,0,20,0,"Computer Vision,Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A master's degree,Financial,20 to 99 employees,Increased slightly,6-10 years,Some other way,Not at all important,Build 
and/or run a machine learning service that operationally improves your product or workflows,Traditional Workstation,Relational data,Most of the time,100GB,Other,"Amazon Web services,Jupyter notebooks,Python",,Sometimes,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,20,30,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly",I prefer not to say,,,,,,,Sometimes,,,,,,,,,,,,,,,,76-99% of projects,More internal than external,Standalone Team,,,"Column-oriented relational (e.g. KDB/MariaDB),Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. MySQL/Microsoft SQL Server)","Company Developed Platform,Share Drive/SharePoint",,Other,Rarely,,USD,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Male,Brazil,28,"Not employed, but looking for work",,,,,,,,Spark / MLlib,Deep learning,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),"College/University,Kaggle,Personal Projects,Stack Overflow Q&A",,,Somewhat useful,,,,Very useful,,,,,Very useful,,Very useful,,,,,,3-5 years,,,Necessary,,Necessary,Necessary,Necessary,,Nice to have,,,,,,Traditional Workstation,2 - 10 hours,Github Portfolio,Yes,Some college/university study without earning a bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",55,20,0,20,5,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,France,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,1 to 2 
years,Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,High school,Insurance,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,35,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,6 to 10 years,"Data Analyst,Other",Work,5,5,40,50,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A master's degree,Other,20 to 99 employees,Decreased slightly,More than 10 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,1 to 2 years,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",15,35,30,10,10,0,Computer Vision,"Gradient Boosting,Neural Networks - CNNs",High school,Academic,20 to 99 employees,Increased slightly,3-5 years,A tech-specific job board,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Laptop or Workstation and local IT supported servers,"Image data,Video data,Relational data",Most of the time,1GB,"CNNs,Gradient Boosted Machines","Jupyter notebooks,Python,R",,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,"Association Rules,CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,PCA and Dimensionality 
Reduction,Random Forests",,Sometimes,,Often,,Often,Often,,,,,Sometimes,,,,,,,,,Sometimes,,Rarely,,,,,,,,,,,30,10,20,20,20,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Lack of significant domain expert input,The lack of a clear question to be answering or a clear direction to go in with the available data",,,,,Sometimes,,,,,,Most of the time,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Computer Science,More than 10 years,"Data Miner,Data Scientist,Machine Learning Engineer",University courses,30,0,20,50,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop or Workstation and local IT supported servers","Text data,Relational data",Most of the time,10GB,"Bayesian Techniques,Ensemble Methods,Regression/Logistic Regression,SVMs,Other","IBM Cognos,IBM SPSS Modeler,IBM SPSS Statistics,IBM Watson / Waton Analytics,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,,Often,Often,Often,Often,,,,,,,,,,,,,,,,,,,,Often,,,,,Often,Sometimes,,Sometimes,Often,,,,,,Often,,,,"A/B Testing,Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,kNN and Other Clustering,Lift Analysis,Logistic Regression,Natural Language Processing,PCA and Dimensionality Reduction,Prescriptive Modeling,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",Often,Often,Sometimes,,Often,Often,Often,Often,Often,,,,,Sometimes,Often,Often,,,Sometimes,,Often,Often,,Often,,Often,,Often,Sometimes,Sometimes,,,,30,30,15,10,15,0,"Enough to code it again from scratch, albeit it may run slowly","Data Science results not used by business decision makers,Did not instrument data useful for scientific analysis and decision-making,Dirty data,Inability to integrate findings into organization's decision-making process,Lack of significant domain expert input,Maintaining responsible expectations about the potential impact of data science projects,Privacy issues,Unavailability of/difficult access to data",,Sometimes,Sometimes,,Often,,,Sometimes,,,Sometimes,,,Sometimes,,,Sometimes,,,,Sometimes,,10-25% of projects,More internal than external,Standalone Team,,,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),Commercial Data Platform,,Git,Most of the time,200000,USD,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Turkey,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Computer Scientist,Data Miner,Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,0,80,20,0,Survival Analysis,Decision Trees - Random Forests,,CRM/Marketing,500 to 999 employees,Stayed the same,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,10TB,"Bayesian Techniques,HMMs,Markov Logic Networks,Regression/Logistic Regression,SVMs","Cloudera,NoSQL,Python,R,SAS Base",,,,,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,Often,,Often,,,,,Most of the time,,,,,,,,,,,,,,Text Analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,,100,0,0,0,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,52,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Ukraine,24,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,Python,Factor Analysis,Python,I collect my own data (e.g. 
web-scraping),Online courses,,,,,,,,,,,Somewhat useful,,,,,,,,,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Unnecessary,Necessary,,,,"Coursera,edX",Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,No,Master's degree,,,"Data Analyst,Statistician","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,Natural Language Processing,,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6-10,Very Important,Somewhat important,Somewhat important,Very Important,Not important,Somewhat important,Very Important,Somewhat important,Somewhat important,Not important,Very Important,Very Important,Not important,Somewhat important,Not important,Not important +Male,Nigeria,39,Employed full-time,,,Yes,,Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),3 to 5 years,Engineer,Self-taught,50,0,50,0,0,0,Time Series,Logistic Regression,A master's degree,Manufacturing,20 to 99 employees,Stayed the same,3-5 years,An external recruiter or headhunter,Not very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop or Workstation and private datacenters",Relational data,Sometimes,100MB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Colombia,28,Employed part-time,,,Yes,,Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Computer Scientist,Data Analyst,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer",University courses,0,0,10,80,10,0,"Natural 
Language Processing,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,Fewer than 10 employees,Stayed the same,Less than one year,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ireland,50,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,Data Analyst,Researcher",University courses,10,5,40,40,5,0,"Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Stayed the same,6-10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,GPU accelerated Workstation,"Image data,Video data,Text data,Relational data",Most of the time,1TB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression,SVMs,Other","Hadoop/Hive/Pig,IBM SPSS Statistics,Jupyter notebooks,Python,R,TensorFlow",,,,,,,,,Sometimes,,,Sometimes,,,,,Often,,,,,,,,,,,,,,Often,,Often,,,,,,,,,,,,,Often,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Random Forests,SVMs,Time Series Analysis",Sometimes,Sometimes,Most of the time,,,Most of the time,Most of the time,Most of the time,Most of the time,,,Most of the time,,Sometimes,,Often,,Often,Often,Rarely,Most of the time,,Most of the time,,,,,Most of the time,,Most of the time,,,,10,50,10,20,10,0,"Enough to code it again from scratch, albeit it may run slowly","Dirty data,Limitations of tools",,,,,Sometimes,,,,,,,,Sometimes,,,,,,,,,,51-75% of projects,More external than internal,Standalone Team,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Graph (e.g. GraphBase/Neo4j),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,"Generic cloud file sharing software (Dropbox/Box/etc.),Git,Subversion",Rarely,82000,,Has increased between 6% and 19%,10 - Highly Satisfied,,,,,,,,,,,,,,,,,, +Male,United States,42,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,3 to 5 years,Other,Self-taught,80,20,0,0,0,0,Time Series,,A bachelor's degree,Financial,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Somewhat important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Rarely,1MB,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Excel Data Mining,Python,Unix shell / awk",,,,,,,,,,,,,,,,,Often,,,,,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,Sometimes,,,,Time Series Analysis,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,,,30,30,0,20,20,0,Enough to refine and innovate on the algorithm,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Israel,44,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,6 to 10 years,"Data Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,30,40,0,0,0,"Natural Language Processing,Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Hidden Markov Models HMMs,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,CRM/Marketing,10 to 19 employees,Increased slightly,Less than one year,A general-purpose job board,Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Gaming Laptop (Laptop + CUDA 
capable GPU),Relational data,Sometimes,100MB,"Bayesian Techniques,Decision Trees,Evolutionary Approaches,HMMs,Neural Networks,Random Forests,Regression/Logistic Regression,SVMs,Other",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,40,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,A humanities discipline,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",40,60,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,3 to 5 years,"Business Analyst,Programmer",Self-taught,50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),,A bachelor's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,26,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,Computer Science,,"Engineer,Machine Learning Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,21,Employed part-time,,,Yes,,Data Analyst,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced 
analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Other,1 to 2 years,"Data Analyst,Researcher",Self-taught,50,0,40,0,10,0,"Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Other,20 to 99 employees,Increased significantly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Don't know,1GB,"Bayesian Techniques,Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Python,R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Often,,Often,,,,,,,,,Often,,,,,,,,,,"A/B Testing,Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests",Often,,Rarely,,,Often,Most of the time,Sometimes,Sometimes,,,,,,,Often,,,,,,,Often,,,,,,,,,,,15,15,0,30,40,0,Enough to explain the algorithm to someone non-technical,"Dirty data,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,,,,,,,Often,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,20,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,24,"Not employed, and not looking for work",No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's 
degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,34,Employed full-time,,,No,Yes,Computer Scientist,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,Less than a year,"Business Analyst,Computer Scientist",Self-taught,20,20,0,0,60,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,40,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,23,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,31,Employed full-time,,,Yes,,Data Scientist,Perfectly,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,1 to 2 years,"Data Scientist,Programmer,Software Developer/Software Engineer",Self-taught,60,10,0,0,30,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A master's degree,Insurance,"10,000 or more 
employees",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,39,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,Supervised Machine Learning (Tabular Data),"Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs",A bachelor's degree,Hospitality/Entertainment/Sports,"10,000 or more employees",Stayed the same,Don't know,"A friend, family member, or former colleague told me",Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,<1MB,Neural Networks,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,23,Employed full-time,,,Yes,,Researcher,Perfectly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,37,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Hadoop/Hive/Pig,Deep learning,R,University/Non-profit research group websites,"Blogs,College/University,Company internal community,Kaggle,Newsletters",,Somewhat useful,Very useful,Somewhat useful,,,Somewhat useful,Somewhat useful,,,,,,,,,,,"KDnuggets Blog,R Bloggers Blog Aggregator",1-2 years,Nice to have,Necessary,Nice to have,Nice to have,Nice to 
have,Necessary,Nice to have,Nice to have,Nice to have,Nice to have,,,,,Basic laptop (Macbook),2 - 10 hours,Experience from work in a company related to ML,Yes,Doctoral degree,Physics,,"Data Scientist,Researcher,Statistician",Other,NA,NA,NA,NA,NA,NA,Time Series,"Decision Trees - Gradient Boosted Machines,Logistic Regression,Neural Networks - RNNs",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,,Self-taught,60,20,5,15,0,0,"Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)","Some college/university study, no bachelor's degree",Internet-based,"1,000 to 4,999 employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,32,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Mathematics or statistics,Less than a year,Researcher,Self-taught,40,20,0,20,20,0,Adversarial Learning,Bayesian Techniques,I prefer not to answer,Academic,I prefer not to answer,Stayed the same,Less than one year,A tech-specific job 
board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,47,Employed part-time,,,No,Yes,Engineer,Poorly,Employed by government,R,Random Forests,R,Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Engineering (non-computer focused),1 to 2 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",25,50,0,0,25,0,,"Bayesian Techniques,Decision Trees - Random Forests",A bachelor's degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Poland,36,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Italy,29,Employed full-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Engineering (non-computer focused),1 to 2 years,"Data Analyst,Engineer,Researcher",Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,26,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Fine arts or performing arts,1 to 2 years,"Data Analyst,Other",University courses,25,25,0,50,0,0,Unsupervised Learning,"Logistic Regression,Support Vector Machines (SVMs)",A master's degree,CRM/Marketing,Fewer than 10 employees,Stayed the same,1-2 years,"A friend, family 
member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,35,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,6 to 10 years,"Data Analyst,DBA/Database Engineer,Programmer","Online courses (coursera, udemy, edx, etc.)",0,70,10,20,0,0,Time Series,,"Some college/university study, no bachelor's degree",Financial,100 to 499 employees,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,33,Employed full-time,,,Yes,,Data Scientist,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Professional degree,,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",10,40,50,0,0,0,"Recommendation Engines,Unsupervised Learning","Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,20 to 99 employees,Decreased slightly,1-2 years,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Greece,40,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,26,Employed 
full-time,,,Yes,,Business Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,A social science,3 to 5 years,Predictive Modeler,Work,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Data Miner,Data Scientist,Engineer,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,10,80,0,0,0,"Natural Language Processing,Speech Recognition,Time Series","Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Traditional Workstation","Text data,Other",Always,10TB,"CNNs,HMMs,Neural Networks,RNNs","C/C++,NoSQL,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,RNNs,Time Series Analysis",Often,,,Often,,Most of the time,,,,,,,Most of the time,Sometimes,,Sometimes,,,,Most of the time,,,,,Often,,,,,Most of the time,,,,20,50,15,0,15,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Dirty data,Lack of significant domain expert input,Limitations in the state of the art in machine 
learning,Limitations of tools,Unavailability of/difficult access to data",,,,,Often,,,,,,Sometimes,Sometimes,Sometimes,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,37,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,More than 10 years,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",University courses,0,0,40,60,0,0,"Machine Translation,Natural Language Processing","Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A professional degree,Technology,"10,000 or more employees",Increased significantly,6-10 years,A general-purpose job board,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and local IT supported servers,Text data,Sometimes,10GB,"Neural Networks,RNNs","C/C++,Jupyter notebooks,Python",,,,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,20,55,15,5,5,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Scaling data science solution up to full database",,,,,Often,,,,,,,,,,,,,Often,,,,,Less than 10% of projects,Approximately half internal and half external,Standalone Team,,,"Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Share Drive/SharePoint,,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,46,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",University courses,60,20,10,10,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",Primary/elementary school,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,Data Scientist,Work,0,20,60,0,20,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,23,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,23,"Not employed, but looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,Less than a year,"Data Analyst,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Iran,30,Employed part-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Iran,24,Employed part-time,,,Yes,,Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,3 to 5 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Perfectly,"Employed by professional services/consulting firm,Employed by government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",30,60,10,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's 
degree,Government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,40,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Physics,6 to 10 years,"Computer Scientist,Data Scientist,DBA/Database Engineer,Programmer,Researcher",,39,30,10,1,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",,Financial,100 to 499 employees,Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,,Relational data,Sometimes,100TB,"Bayesian Techniques,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Cloudera,Hadoop/Hive/Pig,NoSQL,SQL,TensorFlow",Often,,,,Often,,,,Often,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,Often,,,,Often,,,,,,"A/B Testing,Bayesian Techniques,Data Visualization,Decision Trees,Natural Language Processing,Random Forests,Time Series Analysis",Most of the time,,Most of the time,,,,Most of the time,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,,Most of the time,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,,I haven't started working yet,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,36,Employed full-time,,,Yes,,Computer Scientist,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",University courses,50,5,5,30,5,5,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Reinforcement learning,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",High school,Academic,100 to 499 employees,Increased slightly,1-2 years,I visited the company's Web site and found a job listing there,Very important,Research that advances the state of the art of machine learning,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,100MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","C/C++,Java,Mathematica,MATLAB/Octave",,,,Sometimes,,,,,,,,,,,Sometimes,,,,,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Random Forests,Recommender Systems",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking 
for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,56,"Independent contractor, freelancer, or self-employed",,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Romania,32,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,32,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,,NA,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to 
answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,32,Employed full-time,,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,TensorFlow,Deep learning,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),GitHub",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Other","Online courses (coursera, udemy, edx, etc.)",30,20,40,0,10,0,"Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Important,Analyze and understand data to influence product or business decisions,Workstation + Cloud service,"Text data,Relational data",Sometimes,1GB,"Bayesian Techniques,CNNs,Decision Trees,Random Forests,Regression/Logistic Regression","Microsoft SQL Server Data Mining,SAS Base,SAS Enterprise Miner",,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,"CNNs,Data Visualization,Decision Trees,Lift Analysis,Logistic Regression,Random Forests,Segmentation,Time Series Analysis",,,,Rarely,,,Often,Sometimes,,,,,,,Sometimes,Often,,,,,,,Sometimes,,,Sometimes,,,,Often,,,,60,10,5,10,15,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,Privacy issues,Unavailability of/difficult access to data",,,,,Often,,,,,,,,,Rarely,Often,,Rarely,,,,Sometimes,,10-25% of projects,More internal than external,Central Insights Team,Reuters;Bloomberg;,Quality,Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server),I don't typically share data,,Generic non-cloud file sharing software (Email/Shared Server/etc.),Never,75000,EUR,Has increased 20% or more,8,,,,,,,,,,,,,,,,,, +Male,United States,59,Employed full-time,,,Yes,,Other,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,28,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,Less than a year,,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,,,A master's degree,Pharmaceutical,"10,000 or more employees",Stayed the same,Don't know,A general-purpose job board,Important,,,Relational data,,,,"Jupyter notebooks,Python,QlikView,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Other,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,3 to 5 years,"Business Analyst,Data Analyst,Other",Self-taught,100,0,0,0,0,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",,Insurance,"1,000 to 4,999 employees",Increased slightly,3-5 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Belarus,49,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,Engineer,Kaggle competitions,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Canada,NA,Employed full-time,,,Yes,,Other,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),More than 10 years,"Software Developer/Software Engineer,Other",Self-taught,80,10,0,0,10,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Government,,,,Some other way,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,37,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,43,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Netherlands,28,Employed 
full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,19,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Denmark,29,Employed part-time,,,Yes,,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,,1 to 2 years,I haven't started working yet,University courses,20,0,20,60,0,0,"Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Unsupervised Learning","Evolutionary Approaches,Neural Networks - GANs","Some college/university study, no bachelor's degree",Technology,"1,000 to 4,999 employees",Increased significantly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,34,Employed full-time,,,Yes,,Engineer,Poorly,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,A humanities discipline,6 to 10 years,"Data Analyst,Data Scientist,Engineer,Other",Work,20,20,60,0,0,0,"Natural Language Processing,Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A master's degree,Technology,"1,000 to 4,999 employees",Decreased slightly,3-5 years,"A friend, family member, or former colleague told me",Not at all important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and private datacenters","Text data,Relational data,Other",,1TB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","Hadoop/Hive/Pig,Impala,Java,Jupyter notebooks,Python,R,Spark / MLlib,SQL,Unix shell / awk",,,,,,,,,Often,,,,,Rarely,Rarely,,Most of the time,,,,,,,,,,,,,,Most of the time,,Sometimes,,,,,,,,Most of the time,Sometimes,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,24,"Not employed, but looking for work",,,,,,,,R,Support Vector Machines (SVM),Python,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),I collect my own data (e.g. web-scraping)","College/University,Friends network,Stack Overflow Q&A,Tutoring/mentoring",,,Somewhat useful,,,Very useful,,,,,,,,Somewhat useful,,,Very useful,,"Becoming a Data Scientist Podcast,No Free Hunch Blog,R Bloggers Blog Aggregator",1-2 years,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,University courses,20,0,50,20,10,0,Outlier detection (e.g. 
Fraud detection),"Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,31,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Physics,6 to 10 years,Researcher,"Online courses (coursera, udemy, edx, etc.)",30,60,0,0,10,0,"Computer Vision,Supervised Machine Learning (Tabular Data)","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Neural Networks - CNNs",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",Less than a year,"Computer Scientist,Programmer",Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,26,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer",University courses,50,0,0,40,10,0,Computer Vision,"Logistic Regression,Neural Networks - CNNs",A master's degree,Retail,20 to 99 
employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Not very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",Most of the time,100MB,"Neural Networks,Regression/Logistic Regression","C/C++,Jupyter notebooks,Python,SQL",,,,Rarely,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,Often,,,,,,,,,,,Often,,,,,,,,,,"Data Visualization,Logistic Regression,Neural Networks,Text Analytics,Time Series Analysis",,,,,,,Often,,,,,,,,,Often,,,,Rarely,,,,,,,,,Most of the time,Sometimes,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,Less than a year,"Business Analyst,Other",University courses,20,20,10,10,20,20,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Speech Recognition,Survival Analysis",,I don't know/not sure,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,29,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,3 to 5 years,"Programmer,Software Developer/Software Engineer",Self-taught,90,0,10,0,0,0,Other (please specify; separate by semi-colon),Other (please specify; separate by semi-colon),A doctoral degree,Mix of fields,20 to 99 employees,Increased slightly,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,52,Employed full-time,,,Yes,,Researcher,Fine,Employed by company that makes 
advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Physics,3 to 5 years,"Engineer,Programmer,Researcher,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Gradient Boosting",Primary/elementary school,Telecommunications,"10,000 or more employees",Increased slightly,1-2 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop or Workstation and local IT supported servers,Workstation + Cloud service",Other,Never,100MB,Regression/Logistic Regression,"Jupyter notebooks,MATLAB/Octave,Microsoft R Server (Formerly Revolution Analytics),Python,R",,,,,,,,,,,,,,,,,Most of the time,,,,Sometimes,,,Often,,,,,,,Most of the time,,Often,,,,,,,,,,,,,,,,,,,"Cross-Validation,PCA and Dimensionality Reduction,Prescriptive Modeling,Time Series Analysis",,,,,,Often,,,,,,,,,,,,,,,Most of the time,Often,,,,,,,,Most of the time,,,,70,20,0,10,0,0,Enough to tune the parameters properly,"Did not instrument data useful for scientific analysis and decision-making,Unavailability of/difficult access to data",,,Often,,,,,,,,,,,,,,,,,,Often,,10-25% of projects,Entirely internal,IT Department,,,"Document-oriented (e.g. MongoDB/Elasticsearch),Flat files not in a database or cache (e.g. 
CSV, JSON, XML, PNG, MPG)",Company Developed Platform,,Git,Never,,,,,,,,,,,,,,,,,,,,,, +Female,France,25,Employed part-time,,,Yes,,Other,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Programmer,University courses,0,0,0,100,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A bachelor's degree,Government,"10,000 or more employees",,Don't know,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,31,Employed full-time,,,Yes,,Data Miner,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,Other,University courses,0,10,0,90,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Ensemble Methods",,Financial,"1,000 to 4,999 employees",Stayed the same,Less than one year,Some other way,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Text data,Never,,,R,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Management information systems,6 to 10 years,"Data Miner,Programmer,Software Developer/Software Engineer",Work,40,0,60,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A master's degree,Internet-based,I don't know,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Not very important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Relational data,Most of the time,10GB,,"IBM SPSS Modeler,IBM SPSS Statistics,Java,Microsoft SQL Server Data Mining,NoSQL,Python,R,SAP BusinessObjects Predictive Analytics,SQL,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,31,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,"Employed by company that makes advanced analytic software,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Business Analyst,Data Scientist,Machine Learning Engineer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,35,20,0,15,10,"Supervised Machine Learning (Tabular Data),Time Series",Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,20,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Some college/university study without earning a bachelor's degree,Computer Science,1 to 2 years,,University courses,20,0,30,50,0,0,,,A bachelor's 
degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,,NA,I prefer not to say,No,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,,I never declared a major,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,30,Employed full-time,,,No,Yes,Data Analyst,,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Management information systems,1 to 2 years,"Business Analyst,Data Analyst,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Turkey,30,Employed full-time,,,Yes,,Data Miner,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Predictive Modeler,Programmer,Statistician",Work,0,25,50,0,25,0,"Supervised Machine Learning (Tabular Data),Unsupervised Learning","Decision Trees - Random Forests,Gradient Boosting,Logistic Regression",A master's degree,Retail,100 to 499 employees,Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Most of the time,10GB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,IBM Watson / Waton 
Analytics,Jupyter notebooks,Microsoft R Server (Formerly Revolution Analytics),R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL",,,,,,,,,,,,Often,Sometimes,,,,Sometimes,,,,,,,Often,,,,,,,,,Most of the time,,,,,,Often,,Rarely,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression,Random Forests,Recommender Systems,Segmentation,Text Analytics,Time Series Analysis",,,,,,Most of the time,Most of the time,Often,Sometimes,,,,,,,Often,,,,,,,Most of the time,Sometimes,,Rarely,,,Sometimes,Most of the time,,,,25,30,10,15,20,0,Enough to tune the parameters properly,"Data Science results not used by business decision makers,Lack of data science talent in the organization",,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,31,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,6 to 10 years,"Engineer,Programmer,Software Developer/Software Engineer",Self-taught,70,30,0,0,0,0,"Computer Vision,Natural Language Processing,Speech Recognition,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,Don't know,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",30,0,70,0,0,0,Recommendation Engines,Logistic Regression,Primary/elementary school,Internet-based,"5,000 to 9,999 employees",Increased slightly,3-5 
years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Software Developer/Software Engineer,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Engineering (non-computer focused),More than 10 years,"Business Analyst,DBA/Database Engineer,Programmer,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,Time Series,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",High school,Financial,,,,,Not very important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Relational data,Most of the time,10GB,Regression/Logistic Regression,"C/C++,Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Python,R,SQL",,,,Rarely,,,,,,,,,,,,Rarely,Often,,,,,Sometimes,Sometimes,,,,,,,,Most of the time,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Prescriptive Modeling",,,,,,,Most of the time,Rarely,,,,,,,,Rarely,,,,,,Sometimes,,,,,,,,,,,,30,10,10,40,10,0,Enough to tune the parameters properly,Explaining data science to others,,,,,,Often,,,,,,,,,,,,,,,,,26-50% of projects,Entirely external,Standalone Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Share Drive/SharePoint,,Other,Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,No,Yes,Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",I don't write code to analyze data,Engineer,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Turkey,33,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,Python,Deep learning,C/C++/C#,,Stack Overflow Q&A,,,,,,,,,,,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Electrical Engineering,I don't write code to analyze data,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Supervised Machine Learning (Tabular Data),Other (please specify; separate by semi-colon),,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Computer Science,1 to 2 years,"Data Miner,Programmer",University courses,10,20,50,20,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - 
RNNs",,Internet-based,"10,000 or more employees",Increased significantly,3-5 years,A career fair or on-campus recruiting event,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Rarely,10GB,"Bayesian Techniques,CNNs,Decision Trees,Evolutionary Approaches,GANs,Neural Networks,Random Forests,RNNs","C/C++,Google Cloud Compute,IBM SPSS Modeler,Microsoft Azure Machine Learning,NoSQL,Python,R,RapidMiner (commercial version),SQL,TensorFlow",,,,Sometimes,,,,Sometimes,,,Often,,,,,,,,,,,Often,,,,,Most of the time,,,,Most of the time,,Most of the time,Often,,,,,,,,Sometimes,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Evolutionary Approaches,GANs,HMMs,Naive Bayes,Neural Networks,RNNs",,,Sometimes,Often,,,,,,Sometimes,,,Most of the time,,,,,Most of the time,,Most of the time,,,,,Most of the time,,,,,,,,,10,40,20,20,10,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,"Company politics / Lack of management/financial support for a data science team,Dirty data,Explaining data science to others,Lack of significant domain expert input,Unavailability of/difficult access to data",Often,,,,Most of the time,Sometimes,,,,,Most of the time,,,,,,,,,,Often,,26-50% of projects,More external than internal,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)","Commercial Data Platform,Company Developed Platform",,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Mercurial",,,,,,,,,,,,,,,,,,,,,,, +Male,Kenya,22,"Not employed, but looking for work",,,,,,,,SAS Base,Regression,R,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Friends network,Tutoring/mentoring",,,,,,Somewhat useful,,,,,,,,,,,Very useful,,"Becoming a Data Scientist Podcast,Data Machina Newsletter,Statistical Modeling, Causal Inference, and Social Science Blog (Andrew Gelman)",,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,Less than a year,"Data Analyst,Data Scientist,Statistician","Online courses (coursera, udemy, edx, etc.)",20,50,20,10,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis",Logistic Regression,High school,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,43,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,22,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,29,Employed full-time,,,Yes,,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,25,"Independent contractor, 
freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,A social science,1 to 2 years,"Business Analyst,Researcher",University courses,5,10,5,20,0,60,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",A professional degree,Academic,,,,,Important,Analyze and understand data to influence product or business decisions,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,,,"Amazon Web services,Jupyter notebooks,NoSQL,Python,R,Spark / MLlib,SQL,Tableau",,Sometimes,,,,,,,,,,,,,,,Rarely,,,,,,,,,,Sometimes,,,,Most of the time,,Sometimes,,,,,,,,Sometimes,Sometimes,,,Sometimes,,,,,,,"A/B Testing,Bayesian Techniques,Collaborative Filtering,Naive Bayes,Natural Language Processing,Random Forests",Sometimes,,Sometimes,,Sometimes,,,,,,,,,,,,,Sometimes,Sometimes,,,,Sometimes,,,,,,,,,,,0,0,0,0,0,0,"Enough to code it again from scratch, albeit it may run slowly",Organization is small and cannot afford a data science team,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,40,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Business Analyst,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +A different identity,Kenya,25,Employed 
part-time,,,Yes,,Data Analyst,Perfectly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",50,12,30,1,1,6,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs),Other (please specify; separate by semi-colon)",A bachelor's degree,Government,10 to 19 employees,Increased slightly,More than 10 years,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,,1GB,RNNs,"Amazon Machine Learning,NoSQL,Oracle Data Mining/ Oracle R Enterprise",,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Logistic Regression,Naive Bayes,Text Analytics,Time Series Analysis",,,Rarely,,,,Sometimes,,,,,,,,,,,Often,,,,,,,,,,,,Most of the time,,,,20,20,20,30,5,5,Enough to refine and innovate on the algorithm,"Maintaining responsible expectations about the potential impact of data science projects,Unavailability of/difficult access to data",,,,,,,,,,,,,,Most of the time,,,,,,,Most of the time,,100% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Egypt,45,Employed full-time,,,Yes,,Engineer,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Doctoral degree,Engineering (non-computer focused),More than 10 years,Researcher,Work,60,0,30,10,0,0,Natural Language Processing,,A doctoral degree,Academic,500 to 999 employees,Increased slightly,Don't know,,Somewhat 
important,Research that advances the state of the art of machine learning,Gaming Laptop (Laptop + CUDA capable GPU),Text data,Rarely,,,"C/C++,Python,SQL",,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,Most of the time,,,,,,,,,,Natural Language Processing,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,20,50,0,0,0,30,Enough to run the code / standard library,"Lack of data science talent in the organization,Limitations in the state of the art in machine learning,Unavailability of/difficult access to data",,,,,,,,,Often,,,Sometimes,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,44,Employed full-time,,,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,27,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Electrical Engineering,Less than a year,"Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Self-taught,20,0,20,60,0,0,"Computer Vision,Natural Language Processing","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - RNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,100 to 499 employees,Increased slightly,3-5 years,A career fair or on-campus recruiting event,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers","Image data,Text data",Sometimes,10GB,"Decision Trees,Neural Networks,RNNs,SVMs","Amazon Machine Learning,Hadoop/Hive/Pig,Java,Microsoft Excel Data Mining,NoSQL,Python,R,Spark / 
MLlib,SQL,TensorFlow",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Naive Bayes,Neural Networks,RNNs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,NA,I prefer not to say,Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,R,"Dataset aggregator/platform (i.e. Socrata/Kaggle Datasets/data.world/etc.),Google Search,I collect my own data (e.g. web-scraping),University/Non-profit research group websites",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Other,,"Data Scientist,Software Developer/Software Engineer,I haven't started working yet",University courses,NA,NA,NA,NA,NA,NA,"Recommendation Engines,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,41,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,More than 10 years,"Business Analyst,Programmer","Online courses (coursera, udemy, edx, etc.)",40,50,10,0,0,NA,"Supervised Machine Learning (Tabular Data),Unsupervised Learning",Support Vector Machines (SVMs),Primary/elementary school,Manufacturing,"10,000 or more employees",Increased slightly,Don't know,An external recruiter or headhunter,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data","Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",,100MB,Regression/Logistic Regression,"Amazon Machine Learning,Amazon Web 
services,MATLAB/Octave,Python,TensorFlow",Sometimes,Sometimes,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Italy,49,Employed full-time,,,Yes,,Other,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,Engineer,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,37,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Computer Science,3 to 5 years,"Programmer,Researcher",University courses,50,0,20,30,0,0,Natural Language Processing,Neural Networks - CNNs,,Academic,"5,000 to 9,999 employees",Stayed the same,3-5 years,A career fair or on-campus recruiting event,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,27,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,39,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I prefer not to answer,,,Other,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,India,20,Employed full-time,,,Yes,,Business Analyst,Perfectly,"Employed by professional services/consulting firm,Employed by company that makes advanced analytic software",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",25,50,10,10,5,0,"Adversarial Learning,Computer Vision,Natural Language Processing,Recommendation Engines,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Support Vector Machines (SVMs)",A doctoral degree,Financial,"5,000 to 9,999 employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Important,Build and/or run a machine learning service that operationally improves your product or workflows,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Text data,Relational data",Most of the time,1GB,"Bayesian Techniques,CNNs,Decision Trees,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs","Amazon Machine Learning,Amazon Web services,Java,NoSQL,Oracle Data Mining/ Oracle R Enterprise,Python,R,TensorFlow",Sometimes,Sometimes,,,,,,,,,,,,,Often,,,,,,,,,,,,Sometimes,Sometimes,,,Most of the time,,Most of the time,,,,,,,,,,,,,Most of the time,,,,,,"Bayesian Techniques,CNNs,Data Visualization,Decision Trees,kNN and Other Clustering,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random 
Forests,Recommender Systems,Segmentation,Text Analytics",,,Often,Often,,,Often,Often,,,,,,Often,,,Often,Often,Often,Most of the time,,,Often,Most of the time,,Often,,,Most of the time,,,,,10,20,30,15,25,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Lack of data science talent in the organization,Lack of significant domain expert input",Often,Sometimes,,,,,,,Often,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United Kingdom,43,Employed full-time,,,Yes,,Scientist/Researcher,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,6 to 10 years,"Data Analyst,Engineer,Researcher",Self-taught,100,0,0,0,0,0,"Adversarial Learning,Outlier detection (e.g. Fraud detection),Recommendation Engines,Time Series","Bayesian Techniques,Evolutionary Approaches,Hidden Markov Models HMMs,Markov Logic Networks",,Telecommunications,"10,000 or more employees",Stayed the same,6-10 years,An external recruiter or headhunter,Not very important,Build prototypes to explore applying machine learning to new areas,Workstation + Cloud service,Text data,Always,10TB,"Bayesian Techniques,Markov Logic Networks,Regression/Logistic Regression","Python,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,Most of the time,,,,"Bayesian Techniques,Naive Bayes",,,Most of the time,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,30,10,10,20,30,0,Enough to code it from scratch and it will run blazingly fast and be super efficient,I prefer not to say,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,25,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),1 to 2 years,"Business Analyst,Data Analyst,Data 
Miner,Machine Learning Engineer,Predictive Modeler",Self-taught,70,10,15,0,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,49,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,27,Employed part-time,,,Yes,,Statistician,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,,"Online courses (coursera, udemy, edx, etc.)",0,100,0,0,0,0,,Logistic Regression,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United States,41,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Poorly,Employed by a company that doesn't perform advanced analytics,Python,Genetic & Evolutionary Algorithms,C/C++/C#,GitHub,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,"Information technology, networking, or system administration",Less than a year,I haven't started working yet,University courses,0,0,0,100,0,0,Reinforcement learning,Decision Trees - Random Forests,I don't know/not sure,Telecommunications,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,31,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Professional degree,,3 to 5 years,"Computer 
Scientist,Engineer,Programmer,Researcher","Online courses (coursera, udemy, edx, etc.)",20,60,10,10,0,0,Computer Vision,Logistic Regression,High school,Academic,20 to 99 employees,Stayed the same,More than 10 years,I was contacted directly by someone at the company (e.g. internal recruiter),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,52,Employed full-time,,,Yes,,Data Scientist,Perfectly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Psychology,3 to 5 years,Data Scientist,"Online courses (coursera, udemy, edx, etc.)",50,30,5,5,10,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Time Series","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Academic,100 to 499 employees,Decreased significantly,3-5 years,A tech-specific job board,Very important,Build prototypes to explore applying machine learning to new areas,"Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data,Text data,Relational data",Most of the time,10MB,"Bayesian Techniques,CNNs,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,Markov Logic Networks,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","Amazon Web services,Java,Julia,Jupyter notebooks,Microsoft Azure Machine Learning,Python,R,RapidMiner (free version)",,Most of the time,,,,,,,,,,,,,Often,Often,Often,,,,,Sometimes,,,,,,,,,Often,,Most of the time,,Often,,,,,,,,,,,,,,,,,"A/B Testing,Association Rules,Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Gradient Boosted Machines,kNN and Other Clustering,Lift 
Analysis,Logistic Regression,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Prescriptive Modeling,Random Forests,Recommender Systems,Text Analytics,Time Series Analysis",Often,Often,Most of the time,Often,,Most of the time,Most of the time,,,,,Most of the time,,Most of the time,Often,Most of the time,,,Often,Most of the time,Most of the time,Sometimes,Most of the time,Often,,,,,Most of the time,Most of the time,,,,50,30,10,5,5,0,"Enough to code it again from scratch, albeit it may run slowly","Company politics / Lack of management/financial support for a data science team,Did not instrument data useful for scientific analysis and decision-making,Need to coordinate with IT,Scaling data science solution up to full database",Most of the time,,Often,,,,,,,,,,,,Often,,,Often,,,,,76-99% of projects,Approximately half internal and half external,Business Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG)",Commercial Data Platform,,"Generic cloud file sharing software (Dropbox/Box/etc.),Generic non-cloud file sharing software (Email/Shared Server/etc.),Git,Mercurial",Most of the time,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,36,Employed full-time,,,Yes,,Business Analyst,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Analyst,Data Miner,Data Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer,Statistician",Work,40,10,40,10,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A doctoral degree,Other,"10,000 or more employees",Stayed the same,3-5 years,I visited the company's Web site and found a job listing there,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Always,10GB,"Ensemble Methods,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression,RNNs,SVMs","C/C++,Hadoop/Hive/Pig,Java,Microsoft Azure Machine Learning,QlikView,R,SQL",,,,Sometimes,,,,,Often,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,,,,Most of the time,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Association Rules,Cross-Validation,Data Visualization,Ensemble Methods,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,Random Forests,RNNs,Simulation,Time Series Analysis",,Sometimes,,,,Often,Most of the time,,Sometimes,,,,,Sometimes,,Most of the time,,,,Most of the time,Most of the time,,Often,,Most of the time,,Most of the time,,,Most of the time,,,,30,30,20,10,10,0,Enough to refine and innovate on the algorithm,"Company politics / Lack of management/financial support for a data science team,Dirty data,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT,The lack of a clear question to be answering or a clear direction to go in with the available data",Often,,,,Sometimes,,,,Sometimes,,,,,Sometimes,Most of the time,,,,,Most of the time,,,51-75% of projects,More internal than external,Business Department,,,,,,,,,,,1 - Highly Dissatisfied,,,,,,,,,,,,,,,,,, +Male,South Africa,26,Employed 
full-time,,,Yes,,Engineer,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,3 to 5 years,"Business Analyst,Data Analyst,Engineer,Programmer",Self-taught,92.5,0,2.5,5,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Israel,51,Employed full-time,,,Yes,,Researcher,Fine,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Physics,More than 10 years,"Researcher,Other","Online courses (coursera, udemy, edx, etc.)",5,10,80,0,5,0,Computer Vision,"Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Other,20 to 99 employees,Increased significantly,3-5 years,"A friend, family member, or former colleague told me",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,1,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,,3 to 5 years,Statistician,University 
courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,20,Employed full-time,,,No,Yes,Machine Learning Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,Less than a year,Software Developer/Software Engineer,Self-taught,50,50,0,0,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Speech Recognition","Neural Networks - RNNs,Support Vector Machines (SVMs)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Germany,51,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Electrical Engineering,More than 10 years,"Data Analyst,Data Miner,Data Scientist,Researcher,Software Developer/Software Engineer",University courses,20,0,30,50,0,0,"Outlier detection (e.g. 
Fraud detection),Recommendation Engines,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",High school,Mix of fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,New Zealand,47,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,"Data Analyst,Data Scientist,Researcher",Work,30,0,30,30,0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,24,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,19,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,Microsoft Azure Machine Learning,Deep learning,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),Google Search,Government website","Kaggle,Official documentation,Online courses,Stack Overflow Q&A,Tutoring/mentoring,YouTube Videos",,,,,,,Very useful,,,Somewhat useful,Very useful,,,Very useful,,,Very useful,Very useful,"Data Machina Newsletter,FlowingData Blog,No Free Hunch Blog",< 1 year,Necessary,Nice to have,Necessary,Nice to have,Necessary,Necessary,Necessary,Nice to have,Nice to have,Nice to have,,,,Udacity,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Computer Science,,I haven't started working yet,"Online courses (coursera, udemy, edx, etc.)",NA,NA,NA,NA,NA,NA,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning",Decision Trees - Gradient Boosted Machines,A master's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,35,Employed full-time,,,Yes,,Software Developer/Software Engineer,,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,"Information technology, networking, or system administration",6 to 10 years,"Data Analyst,Programmer",University courses,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Independent contractor, freelancer, or self-employed",,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,45,Employed full-time,,,Yes,,Researcher,Poorly,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Doctoral degree,Electrical Engineering,More than 
10 years,,University courses,30,30,30,10,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Philippines,34,Employed full-time,,,Yes,,Statistician,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Statistician",University courses,10,10,80,0,0,0,,,,Government,"5,000 to 9,999 employees",Decreased slightly,More than 10 years,I visited the company's Web site and found a job listing there,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Most of the time,10GB,"Decision Trees,Neural Networks,Regression/Logistic Regression","Microsoft Excel Data Mining,SQL",,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Time Series Analysis",,,,,,Often,Most of the time,Often,,,,,,,,,,,,,,,,,,,,,,Often,,,,10,40,10,15,25,0,Enough to explain the algorithm to someone non-technical,"Data Science results not used by business decision makers,Lack of data science talent in the organization,Scaling data science solution up to full database",,Often,,,,,,,Often,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Russia,32,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,60,0,0,20,0,Time Series,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A 
master's degree,Financial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,30,Employed full-time,,,Yes,,Machine Learning Engineer,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),3 to 5 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,70,0,30,0,0,0,"Survival Analysis,Time Series","Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs",A bachelor's degree,Technology,"10,000 or more employees",Increased significantly,3-5 years,I visited the company's Web site and found a job listing there,Very important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Other,,1GB,Regression/Logistic Regression,"Jupyter notebooks,Python,SAS Base",,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,Often,,,,,,,,,,,,,,"A/B Testing,Time Series Analysis",Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,70,0,0,30,0,0,Enough to tune the parameters properly,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,20,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,29,Employed full-time,,,Yes,,Business Analyst,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,"Information technology, networking, or system administration",1 to 2 years,Business Analyst,"Online courses (coursera, udemy, edx, 
etc.)",NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +,India,23,Employed full-time,,,No,Yes,Data Analyst,Fine,Employed by professional services/consulting firm,Orange,Cluster Analysis,SQL,Google Search,"Blogs,Kaggle,Textbook,Tutoring/mentoring,YouTube Videos",,Somewhat useful,,,,,Very useful,,,,,,,,Not Useful,,Somewhat useful,Somewhat useful,Other (Separate different answers with semicolon),,,,,,,,,,,,,,,,,,,Sort of (Explain more),Professional degree,,1 to 2 years,"Business Analyst,Data Analyst,Data Scientist",Work,20,20,60,0,0,0,Recommendation Engines,Decision Trees - Random Forests,A bachelor's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Taiwan,28,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,Engineering (non-computer focused),3 to 5 years,"Data Analyst,Data Scientist,DBA/Database Engineer,Machine Learning Engineer,Researcher,Software Developer/Software Engineer",Work,30,30,20,0,20,0,"Adversarial Learning,Computer Vision,Reinforcement learning,Supervised Machine Learning (Tabular Data)","Ensemble Methods,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A doctoral degree,Manufacturing,"10,000 or more employees",Stayed the same,3-5 years,An external recruiter or headhunter,Somewhat important,Build prototypes to explore applying machine learning to new areas,"GPU accelerated Workstation,Laptop or Workstation and private datacenters","Image data,Video data,Text data,Relational data",Rarely,1TB,"CNNs,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Neural Networks,Regression/Logistic 
Regression,RNNs,SVMs","Cloudera,Hadoop/Hive/Pig,NoSQL,Python,SQL,TensorFlow",,,,,Sometimes,,,,Sometimes,,,,,,,,,,,,,,,,,,Sometimes,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Most of the time,,,,,,"A/B Testing,CNNs,Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,GANs,Gradient Boosted Machines,kNN and Other Clustering,Natural Language Processing,Neural Networks,PCA and Dimensionality Reduction,Recommender Systems,RNNs,Segmentation,SVMs,Text Analytics,Time Series Analysis",Sometimes,,,Sometimes,,Sometimes,Most of the time,Often,Often,,Rarely,Often,,Often,,,,,Often,Often,Often,,,Often,Sometimes,Rarely,,Often,Often,Often,,,,30,15,10,15,30,0,"Enough to code it again from scratch, albeit it may run slowly","Did not instrument data useful for scientific analysis and decision-making,Dirty data,Explaining data science to others,Limitations of tools,Organization is small and cannot afford a data science team,The lack of a clear question to be answering or a clear direction to go in with the available data",,,Often,,Most of the time,Most of the time,,,,,,,Sometimes,,,Often,,,,Sometimes,,,None,More internal than external,IT Department,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Company Developed Platform,,Git,Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Other,43,Employed full-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Professional degree,,Less than a year,"Computer Scientist,DBA/Database Engineer,Engineer,Programmer,Software Developer/Software Engineer",Work,10,50,30,10,0,0,"Adversarial Learning,Computer Vision,Speech Recognition,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Evolutionary Approaches,Markov Logic Networks,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Academic,10 to 19 employees,Stayed the same,Less than one year,A general-purpose job board,Important,,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Image data,Video data",Most of the time,100MB,"Bayesian Techniques,CNNs,Markov Logic Networks,Neural Networks",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,43,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Analyst,Perfectly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,"Information technology, networking, or system administration",1 to 2 years,DBA/Database Engineer,"Online courses (coursera, udemy, edx, etc.)",20,15,15,30,0,20,"Survival Analysis,Time Series","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Australia,36,"Not employed, and not looking for 
work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,34,Employed full-time,,,Yes,,Other,Perfectly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,More than 10 years,"Business Analyst,Data Analyst,Data Miner,Data Scientist,Predictive Modeler,Programmer,Researcher",University courses,10,5,85,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression",A bachelor's degree,Other,Fewer than 10 employees,Decreased slightly,More than 10 years,Some other way,Important,Other,"Basic laptop (Macbook),Laptop + Cloud service (AWS, Azure, GCE ...)",Relational data,Most of the time,10GB,"Decision Trees,Gradient Boosted Machines,Random Forests","IBM SPSS Modeler,Python,RapidMiner (free version)",,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Gradient Boosted Machines,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,Most of the time,Most of the time,Most of the time,Often,,,Most of the time,,,,Often,,,,,Sometimes,,Often,,,,,,,,,,,50,10,20,10,10,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,NA,Employed full-time,,,Yes,,Other,Fine,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Electrical Engineering,More than 10 years,"Computer Scientist,DBA/Database Engineer,Engineer,Machine Learning Engineer,Programmer,Researcher,Software Developer/Software 
Engineer",Work,20,20,50,0,10,0,"Recommendation Engines,Reinforcement learning,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Markov Logic Networks,Support Vector Machines (SVMs)",,Technology,,,,I was contacted directly by someone at the company (e.g. internal recruiter),Very important,Build and/or run a machine learning service that operationally improves your product or workflows,"Laptop + Cloud service (AWS, Azure, GCE ...),Laptop or Workstation and local IT supported servers,Laptop or Workstation and private datacenters","Text data,Relational data",,,"Bayesian Techniques,Decision Trees,HMMs,Markov Logic Networks,Neural Networks","Amazon Machine Learning,Jupyter notebooks,MATLAB/Octave,Microsoft Azure Machine Learning,NoSQL,Python,Spark / MLlib,Unix shell / awk",Sometimes,,,,,,,,,,,,,,,,Often,,,,Often,Sometimes,,,,,Sometimes,,,,Most of the time,,,,,,,,,,Most of the time,,,,,,,Sometimes,,,,"Association Rules,Bayesian Techniques,Collaborative Filtering,Cross-Validation,Data Visualization,Decision Trees,HMMs,kNN and Other Clustering,Logistic Regression,Naive Bayes,Neural Networks,Recommender Systems,Segmentation,SVMs,Text Analytics,Time Series Analysis",,Sometimes,Often,,Often,Most of the time,Most of the time,Sometimes,,,,,Often,Sometimes,,Often,,Often,,Often,,,,Often,,Most of the time,,Sometimes,Most of the time,Often,,,,20,30,15,15,20,0,"Enough to code it again from scratch, albeit it may run slowly","Difficulties in deployment/scoring,Dirty data,Explaining data science to others,Limitations in the state of the art in machine learning,Limitations of tools,Need to coordinate with IT,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,Unavailability of/difficult access to data",,,,Sometimes,Sometimes,Often,,,,,,Often,Often,,Often,,,,Most of the time,,Often,,100% of 
projects,More external than internal,Central Insights Team,,,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Key-value store (e.g. Redis/Riak),Other tabular (e.g. Cassandra/BigTable/BigQuery/Redshift)",Share Drive/SharePoint,,"Generic non-cloud file sharing software (Email/Shared Server/etc.),Git",Rarely,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,University courses,35,50,5,0,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",A bachelor's degree,Other,100 to 499 employees,Stayed the same,Less than one year,"A friend, family member, or former colleague told me",Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Never,10MB,"Decision Trees,Ensemble Methods,Random Forests,Regression/Logistic Regression","R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,Most of the time,,,,,,,,,,"Logistic Regression,Random Forests",,,,,,,,,,,,,,,,Often,,,,,,,Sometimes,,,,,,,,,,,90,5,0,5,0,0,Enough to explain the algorithm to someone non-technical,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Singapore,34,Employed full-time,,,Yes,,Scientist/Researcher,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Doctoral degree,Computer Science,6 to 10 years,"Computer Scientist,Machine Learning Engineer,Predictive Modeler,Researcher",University courses,20,10,70,0,0,0,"Adversarial Learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Ensemble Methods,Hidden Markov Models 
HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",A master's degree,Government,100 to 499 employees,Increased slightly,6-10 years,I was contacted directly by someone at the company (e.g. internal recruiter),Important,Research that advances the state of the art of machine learning,"GPU accelerated Workstation,Laptop + Cloud service (AWS, Azure, GCE ...),Workstation + Cloud service",Other,,100GB,"CNNs,GANs,HMMs,Neural Networks,RNNs,SVMs","C/C++,Jupyter notebooks,MATLAB/Octave,Python,TensorFlow,Unix shell / awk",,,,Most of the time,,,,,,,,,,,,,Often,,,,Often,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,Often,,Most of the time,,,,"Bayesian Techniques,CNNs,Cross-Validation,GANs,HMMs,kNN and Other Clustering,Logistic Regression,Neural Networks,PCA and Dimensionality Reduction,RNNs,Segmentation,SVMs,Time Series Analysis",,,Often,Often,,Most of the time,,,,,Sometimes,,Often,Sometimes,,Often,,,,Most of the time,Often,,,,Most of the time,Often,,Sometimes,,Often,,,,10,40,30,10,10,0,"Enough to code it again from scratch, albeit it may run slowly","Lack of funds to buy useful datasets from external sources,Scaling data science solution up to full database,Team using multiple ad-hoc development environments such as Python/R/Java/etc.,The lack of a clear question to be answering or a clear direction to go in with the available data,Unavailability of/difficult access to data",,,,,,,,,,Most of the time,,,,,,,,Often,Most of the time,Often,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,27,Employed full-time,,,Yes,,DBA/Database Engineer,Fine,Employed by non-profit or NGO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,3 to 5 years,Programmer,"Online courses (coursera, udemy, edx, etc.)",20,80,0,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Decision Trees - Random Forests,Ensemble Methods,Logistic Regression",High school,Mix of fields,10 
to 19 employees,Increased slightly,Less than one year,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Gaming Laptop (Laptop + CUDA capable GPU),Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Never,10MB,"Decision Trees,Ensemble Methods,Gradient Boosted Machines","Amazon Web services,Java,Jupyter notebooks,MATLAB/Octave,NoSQL,Python,R,SAS Base,SAS Enterprise Miner,Spark / MLlib,SQL,Tableau,TensorFlow",,Sometimes,,,,,,,,,,,,,Most of the time,,Most of the time,,,,Sometimes,,,,,,Most of the time,,,,Most of the time,,Most of the time,,,,,Often,Often,,Often,Most of the time,,,Most of the time,Sometimes,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Ensemble Methods,Logistic Regression",,,,,,Most of the time,Most of the time,Most of the time,Most of the time,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,20,20,10,30,20,0,Enough to run the code / standard library,"Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Lack of significant domain expert input",,Often,,,Most of the time,Most of the time,,,Most of the time,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Brazil,33,Employed full-time,,,Yes,,Other,Poorly,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,6 to 10 years,"Data Analyst,Software Developer/Software Engineer,Other","Online courses (coursera, udemy, edx, etc.)",50,50,0,0,0,0,Outlier detection (e.g. 
Fraud detection),Logistic Regression,A bachelor's degree,Other,"10,000 or more employees",Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Very important,Other,Laptop or Workstation and private datacenters,"Text data,Relational data",,,,"IBM Cognos,IBM SPSS Modeler",,,,,,,,,,Often,Rarely,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Data Visualization,,,,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,60,20,5,5,10,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Lack of funds to buy useful datasets from external sources,Limitations of tools,Privacy issues",Sometimes,Sometimes,,,Often,,,,,Often,,,Often,,,,Often,,,,,,10-25% of projects,Approximately half internal and half external,Standalone Team,,,,,,,,,,I do not want to share information about my salary/compensation,8,,,,,,,,,,,,,,,,,, +Female,People 's Republic of China,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Spain,22,Employed part-time,,,No,Yes,Other,Fine,Employed by a company that doesn't perform advanced analytics,Julia,Decision Trees,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,College/University,Kaggle,Personal Projects,Stack Overflow Q&A,Textbook,YouTube Videos",Very useful,,Very useful,,,,Somewhat useful,,,,,Somewhat useful,,Very useful,Very useful,,,Very useful,,< 1 year,,,,,Necessary,Nice to have,,,,,,,,,"Laptop + Cloud service (AWS, Azure, GCE ...),Traditional Workstation",2 - 10 hours,Online Courses and Certifications,No,Bachelor's degree,Mathematics or statistics,Less than a year,"Programmer,Other","Online courses (coursera, udemy, edx, etc.)",40,40,0,20,0,0,"Adversarial Learning,Computer Vision","Bayesian Techniques,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",Primary/elementary school,Other,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,22,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Engineering (non-computer focused),1 to 2 years,"Data Scientist,Engineer,Programmer,Software Developer/Software Engineer,Statistician",Self-taught,30,40,10,0,20,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis,Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Hidden Markov Models HMMs,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs","Some college/university study, no bachelor's degree",Technology,"10,000 or more employees",Stayed the same,Don't know,A career fair or on-campus recruiting event,Very important,Research that advances the state of the art of machine learning,GPU accelerated Workstation,"Text data,Relational data",Rarely,100GB,"Bayesian Techniques,CNNs,Decision Trees,HMMs,Neural Networks,Regression/Logistic Regression","Cloudera,MATLAB/Octave,Microsoft Azure Machine Learning,Oracle Data Mining/ Oracle R Enterprise,Python,R,Spark / MLlib,SQL,TensorFlow",,,,,Sometimes,,,,,,,,,,,,,,,,Rarely,Sometimes,,,,,,Rarely,,,,,Most of the time,,,,,,,,Often,Most of the time,,,,Sometimes,,,,,,"Bayesian Techniques,CNNs,Cross-Validation,Data Visualization,Decision Trees,Logistic Regression,Markov Logic Networks,Naive Bayes,Natural Language Processing,Neural Networks,Random Forests,SVMs",,,Sometimes,Often,,Most of the time,Most of the time,Most of the time,,,,,,,,Most of the time,Often,Often,Rarely,Most of the time,,,Often,,,,,Rarely,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Philippines,24,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,Mathematics or statistics,3 to 5 years,"Data Analyst,Data Scientist,Predictive Modeler,Researcher",Work,0,10,90,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",Neural Networks - RNNs,A bachelor's degree,Other,20 to 99 employees,Stayed the same,6-10 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,"Basic laptop (Macbook),Gaming Laptop (Laptop + CUDA capable 
GPU)",Relational data,Sometimes,1GB,"Neural Networks,Regression/Logistic Regression,RNNs","Hadoop/Hive/Pig,Jupyter notebooks,Python,SQL,TensorFlow",,,,,,,,,Most of the time,,,,,,,,Most of the time,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,Sometimes,,,,,,"Cross-Validation,Data Visualization,Neural Networks,RNNs",,,,,,Sometimes,Sometimes,,,,,,,,,,,,,Most of the time,,,,,Most of the time,,,,,,,,,40,50,0,10,0,0,Enough to run the code / standard library,I prefer not to say,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,33,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Electrical Engineering,3 to 5 years,Programmer,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,24,"Not employed, and not looking for work",Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Engineering (non-computer focused),,Other,University courses,NA,NA,NA,NA,NA,NA,,,A doctoral degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,21,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,I don't write code to analyze data,Other,Other,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,United Kingdom,27,Employed full-time,,,Yes,,Scientist/Researcher,Fine,"Employed by college or university,Employed by 
government",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,3 to 5 years,,University courses,90,0,0,10,0,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection)",,A bachelor's degree,Academic,I don't know,Increased slightly,Don't know,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,34,Employed full-time,,,No,Yes,Other,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,I did not complete any formal education past high school,,I don't write code to analyze data,,Self-taught,100,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Ukraine,26,Employed full-time,,,No,Yes,Programmer,Perfectly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Programmer,Software Developer/Software Engineer",Self-taught,30,70,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series",,A master's degree,Technology,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Chile,39,Employed full-time,,,Yes,,Other,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Master's degree,Other,3 to 5 years,Other,"Online courses (coursera, udemy, edx, etc.)",20,70,10,0,0,0,Recommendation Engines,"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests",A doctoral degree,Other,20 to 99 employees,Increased slightly,1-2 years,I was contacted directly by someone at the company 
(e.g. internal recruiter),Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),"Image data,Relational data",Sometimes,100MB,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Data Visualization,Decision Trees,kNN and Other Clustering,Logistic Regression,PCA and Dimensionality Reduction,Random Forests",,,,,,,Sometimes,Often,,,,,,Often,,Often,,,,,Sometimes,,Sometimes,,,,,,,,,,,50,30,10,10,0,0,Enough to explain the algorithm to someone non-technical,"Dirty data,Explaining data science to others,Maintaining responsible expectations about the potential impact of data science projects",,,,,Often,Often,,,,,,,,Sometimes,,,,,,,,,10-25% of projects,Entirely internal,Standalone Team,,,Row-oriented relational (e.g. MySQL/Microsoft SQL Server),"Commercial Data Platform,Email",,,Never,90000,,Has stayed about the same (has not increased or decreased more than 5%),6,,,,,,,,,,,,,,,,,, +Male,Poland,26,"Independent contractor, freelancer, or self-employed",,,Yes,,Computer Scientist,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,Software Developer/Software Engineer,"Online courses (coursera, udemy, edx, etc.)",10,60,0,30,0,0,Natural Language Processing,Hidden Markov Models HMMs,A master's degree,Internet-based,,,,,Important,Build prototypes to explore applying machine learning to new areas,Basic laptop (Macbook),Text data,Never,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"Natural Language Processing,Neural Networks",,,,,,,,,,,,,,,,,,,Most of the time,Sometimes,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,23,Employed full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced 
analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Engineering (non-computer focused),Less than a year,Engineer,University courses,50,0,0,50,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,Employed full-time,,,Yes,,Computer Scientist,Fine,Employed by company that makes advanced analytic software,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Republic of China,30,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Engineering (non-computer focused),Less than a year,Programmer,Self-taught,60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Bayesian Techniques,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Technology,100 to 499 employees,Increased significantly,Less than one year,I visited the company's Web site and found a job listing there,Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Text data,Rarely,10MB,"Bayesian Techniques,Regression/Logistic Regression,SVMs","C/C++,Python,Spark / MLlib,TensorFlow,Unix shell / awk",,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,,Sometimes,,,,,Rarely,,Sometimes,,,,"Bayesian Techniques,kNN and Other Clustering,Logistic Regression,Naive Bayes,SVMs",,,Sometimes,,,,,,,,,,,Sometimes,,Often,,Sometimes,,,,,,,,,,Often,,,,,,0,0,0,0,0,100,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Spain,41,Employed 
full-time,,,Yes,,Data Scientist,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Doctoral degree,Mathematics or statistics,More than 10 years,Statistician,Self-taught,60,0,40,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Egypt,65,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Pakistan,35,Employed full-time,,,Yes,,Data Analyst,Poorly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Data Analyst,Data Miner,Data Scientist",University courses,10,30,10,40,10,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression,Neural Networks - CNNs",,Academic,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,44,Employed full-time,,,No,Yes,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,TensorFlow,Deep learning,Matlab,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Arxiv,Conferences,Kaggle,Online courses,Personal Projects,Podcasts,YouTube Videos",Somewhat useful,,,,Somewhat useful,,Somewhat useful,,,,Very useful,Somewhat useful,Somewhat useful,,,,,Somewhat useful,"No Free Hunch Blog,Other (Separate different answers with semicolon)",< 1 year,Nice to have,Nice to have,Nice to have,,Necessary,,,Nice to have,Necessary,Nice to have,,,,Coursera,Gaming Laptop (Laptop + CUDA capable GPU),2 - 10 hours,Online Courses and Certifications,Sort of (Explain more),Bachelor's degree,Computer Science,More than 10 years,"Computer Scientist,Data Analyst,Programmer,Researcher,Software Developer/Software Engineer",Work,0,50,40,10,0,0,"Computer Vision,Supervised Machine Learning (Tabular Data)",Logistic Regression,A bachelor's degree,Manufacturing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Vietnam,22,"Independent contractor, freelancer, or self-employed",,,Yes,,Scientist/Researcher,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Engineering (non-computer focused),Less than a year,"Data Analyst,Data Miner,Engineer,Programmer,Researcher,Software Developer/Software Engineer,Statistician","Online courses (coursera, udemy, edx, etc.)",90,0,2,6,2,0,Natural Language Processing,Bayesian Techniques,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Hungary,21,Employed part-time,,,Yes,,Programmer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Mathematics or statistics,1 to 2 years,,University courses,0,0,0,0,100,0,Outlier detection (e.g. 
Fraud detection),"Decision Trees - Gradient Boosted Machines,Logistic Regression",A doctoral degree,Financial,"1,000 to 4,999 employees",Increased significantly,6-10 years,A career fair or on-campus recruiting event,Important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Traditional Workstation,Text data,,10TB,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,25,Employed full-time,,,Yes,,Software Developer/Software Engineer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,3 to 5 years,"Computer Scientist,Machine Learning Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",50,30,20,0,0,0,"Natural Language Processing,Supervised Machine Learning (Tabular Data),Unsupervised Learning","Bayesian Techniques,Decision Trees - Random Forests,Logistic Regression,Neural Networks - CNNs,Support Vector Machines (SVMs)",A bachelor's degree,Technology,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,"Image data,Text data,Relational data",Rarely,100MB,"CNNs,Decision Trees,Regression/Logistic Regression,SVMs",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,27,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Finland,38,Employed 
full-time,,,Yes,,Business Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,"Information technology, networking, or system administration",1 to 2 years,"Business Analyst,Programmer,Other","Online courses (coursera, udemy, edx, etc.)",10,70,20,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series","Decision Trees - Random Forests,Logistic Regression",A master's degree,CRM/Marketing,"10,000 or more employees",Stayed the same,3-5 years,A general-purpose job board,Very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Sometimes,,Regression/Logistic Regression,"Jupyter notebooks,Microsoft Excel Data Mining,Python,QlikView,R,SQL,Tableau",,,,,,,,,,,,,,,,,Sometimes,,,,,,Sometimes,,,,,,,,Sometimes,Often,Sometimes,,,,,,,,,Often,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Malaysia,55,"Independent contractor, freelancer, or self-employed",,,Yes,,Business Analyst,,,KNIME (free version),Text Mining,R,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),GitHub,Government website,University/Non-profit research group websites","Blogs,Conferences,Kaggle,Online courses,YouTube Videos",,Somewhat useful,,,Somewhat useful,,Very useful,,,,Very useful,,,,,,,Very useful,,,,,,,,,,,,,,,,,,,,Sort of (Explain more),Bachelor's degree,Other,More than 10 years,Business Analyst,Self-taught,60,20,20,0,0,0,Supervised Machine Learning (Tabular Data),"Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Logistic Regression,Support Vector Machines (SVMs)",,Manufacturing,,,,,Not very important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,"Text data,Relational data",Sometimes,100MB,"Gradient Boosted Machines,Random Forests,Regression/Logistic Regression","Java,Python,R",,,,,,,,,,,,,,,Sometimes,,,,,,,,,,,,,,,,Sometimes,,Most of the time,,,,,,,,,,,,,,,,,,,"Data Visualization,Random Forests,Time Series Analysis",,,,,,,Most of the time,,,,,,,,,,,,,,,,Often,,,,,,,Often,,,,60,10,10,10,10,0,Enough to explain the algorithm to someone non-technical,"Difficulties in deployment/scoring,Lack of data science talent in the organization,Maintaining responsible expectations about the potential impact of data science projects,Need to coordinate with IT",,,,Sometimes,,,,,Sometimes,,,,,Often,Often,,,,,,,,51-75% of projects,Approximately half internal and half external,Business Department,Published prices,Format,"Flat files not in a database or cache (e.g. CSV, JSON, XML, PNG, MPG),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)",Email,,,Sometimes,,,I do not want to share information about my salary/compensation,6,,,,,,,,,,,,,,,,,, +Female,Iran,22,"Independent contractor, freelancer, or self-employed",,,No,Yes,Statistician,Poorly,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,Mathematics or statistics,I don't write code to analyze data,I haven't started working yet,Self-taught,80,0,0,10,0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Israel,42,Employed part-time,,,Yes,,Programmer,Poorly,"Employed by company that makes advanced analytic software,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Engineer,Programmer,Software Developer/Software Engineer",Work,0,0,100,0,0,0,Recommendation Engines,,High school,Mix of fields,"10,000 or more employees",Stayed the same,Don't know,Some other way,Not at all important,Analyze and understand data to influence product or business decisions,Traditional Workstation,Text data,,10GB,"Decision Trees,Random Forests,Regression/Logistic Regression,Other","Amazon Web services,Cloudera,Python,Spark / MLlib,Tableau",,Often,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,Often,,,,Sometimes,,,,,,,"Cross-Validation,Data Visualization,Decision Trees,Simulation",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Germany,32,Employed part-time,,,Yes,,Scientist/Researcher,Fine,Employed by college or university,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,Less than a year,"Computer Scientist,Software Developer/Software Engineer",University courses,34,33,0,33,0,0,"Computer Vision,Unsupervised Learning","Decision Trees - Random 
Forests,Support Vector Machines (SVMs)",A master's degree,Academic,10 to 19 employees,Increased significantly,1-2 years,"A friend, family member, or former colleague told me",Important,Build prototypes to explore applying machine learning to new areas,Laptop or Workstation and private datacenters,Image data,Rarely,1TB,"CNNs,Ensemble Methods,Random Forests,SVMs","Jupyter notebooks,NoSQL,Python,R,SQL,TensorFlow",,,,,,,,,,,,,,,,,Often,,,,,,,,,,Sometimes,,,,Most of the time,,Rarely,,,,,,,,,Often,,,,Most of the time,,,,,,"CNNs,Cross-Validation,Decision Trees,GANs,Random Forests,SVMs",,,,Most of the time,,Most of the time,,Rarely,,,Rarely,,,,,,,,,,,,Sometimes,,,,,Most of the time,,,,,,30,30,10,15,15,0,Enough to explain the algorithm to someone non-technical,"Lack of data science talent in the organization,Privacy issues,Unavailability of/difficult access to data",,,,,,,,,Sometimes,,,,,,,,Sometimes,,,,Often,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Poland,39,Employed full-time,,,No,Yes,Other,Perfectly,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Some college/university study without earning a bachelor's degree,,Less than a year,Business Analyst,University courses,20,10,0,50,20,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Norway,34,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Physics,1 to 2 years,"Business Analyst,Data Analyst",Self-taught,70,10,0,0,20,0,Natural Language Processing,Support Vector Machines (SVMs),A bachelor's degree,Financial,20 to 99 employees,Stayed the same,,I was contacted directly by someone at the company (e.g. 
internal recruiter),Very important,Analyze and understand data to influence product or business decisions,Basic laptop (Macbook),Relational data,Always,1GB,"Decision Trees,Regression/Logistic Regression","Microsoft Excel Data Mining,Python,SQL",,,,,,,,,,,,,,,,,,,,,,,Often,,,,,,,,Most of the time,,,,,,,,,,,Most of the time,,,,,,,,,,"Data Visualization,Decision Trees,Logistic Regression,Simulation",,,,,,,Often,Sometimes,,,,,,,,Often,,,,,,,,,,,Most of the time,,,,,,,10,10,50,10,20,0,Enough to explain the algorithm to someone non-technical,"Company politics / Lack of management/financial support for a data science team,Data Science results not used by business decision makers,Dirty data,Explaining data science to others,Lack of data science talent in the organization,Limitations of tools,Organization is small and cannot afford a data science team",Most of the time,Sometimes,,,Most of the time,Often,,,Often,,,,Most of the time,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,"Data Analyst,Researcher","Online courses (coursera, udemy, edx, etc.)",25,70,5,0,0,0,"Computer Vision,Outlier detection (e.g. Fraud detection),Supervised Machine Learning (Tabular Data),Survival Analysis","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Evolutionary Approaches,Logistic Regression",A bachelor's degree,Financial,"10,000 or more employees",Decreased slightly,More than 10 years,I was contacted directly by someone at the company (e.g. 
internal recruiter),Somewhat important,Build and/or run a machine learning service that operationally improves your product or workflows,Basic laptop (Macbook),Text data,Never,,"Decision Trees,Random Forests,Regression/Logistic Regression","IBM SPSS Statistics,R,SAS Base,SQL,Tableau",,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,Sometimes,,,,Sometimes,,,Sometimes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,United States,NA,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,,3 to 5 years,"Business Analyst,Data Analyst,Data Scientist,Operations Research Practitioner,Statistician","Online courses (coursera, udemy, edx, etc.)",35,65,0,0,0,0,"Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - RNNs",,Government,,,,,Very important,Build and/or run a machine learning service that operationally improves your product or workflows,Workstation + Cloud service,Relational data,Sometimes,100MB,"Decision Trees,Gradient Boosted Machines,Neural Networks,Random Forests,Regression/Logistic Regression","Amazon Machine Learning,Amazon Web services,C/C++,Java,NoSQL,Python,R,SQL",Sometimes,Sometimes,,Rarely,,,,,,,,,,,Sometimes,,,,,,,,,,,,Sometimes,,,,Sometimes,,Most of the time,,,,,,,,,Sometimes,,,,,,,,,,"Bayesian Techniques,Cross-Validation,Data Visualization,Decision Trees,Gradient Boosted Machines,Logistic Regression,Naive Bayes,Neural Networks,Prescriptive Modeling,Random Forests",,,Sometimes,,,Sometimes,Often,Often,,,,Often,,,,Often,,Often,,Often,,Often,Often,,,,,,,,,,,35,25,10,10,20,0,"Enough to code it again from scratch, albeit it may run slowly",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 
+Male,India,24,Employed full-time,,,Yes,,Business Analyst,Fine,"Employed by professional services/consulting firm,Employed by a company that doesn't perform advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Computer Science,1 to 2 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Other","Online courses (coursera, udemy, edx, etc.)",15,35,35,5,10,0,"Adversarial Learning,Computer Vision,Machine Translation,Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines,Reinforcement learning,Speech Recognition,Supervised Machine Learning (Tabular Data),Survival Analysis,Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Markov Logic Networks",A master's degree,Technology,20 to 99 employees,Increased slightly,1-2 years,A general-purpose job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,24,"Not employed, and not looking for work",Yes,"Yes, I'm focused on learning mostly data science skills",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Computer Science,,"Computer Scientist,Data Scientist,Software Developer/Software Engineer",Kaggle competitions,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,32,Employed full-time,,,No,Yes,Other,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Mathematics or statistics,Less than a year,,Self-taught,40,0,40,20,0,0,,,,Mix of 
fields,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by government,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,,3 to 5 years,"Data Analyst,Researcher",University courses,10,20,30,40,0,0,"Supervised Machine Learning (Tabular Data),Survival Analysis","Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Logistic Regression,Support Vector Machines (SVMs)",A bachelor's degree,Government,"1,000 to 4,999 employees",Increased slightly,More than 10 years,I visited the company's Web site and found a job listing there,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,South Korea,45,"Independent contractor, freelancer, or self-employed",,,Yes,,Data Scientist,Fine,Self-employed,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Master's degree,A social science,More than 10 years,"Data Analyst,Researcher",Work,40,30,30,0,0,0,Other (please specify; separate by semi-colon),Decision Trees - Random Forests,High school,Financial,,,,,Somewhat important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Relational data,Rarely,,"Decision Trees,Ensemble Methods","IBM SPSS Modeler,Python,SAS Base,SAS Enterprise Miner,SQL",,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,Rarely,,,,,,,Most of the time,Most of the time,,,Most of the time,,,,,,,,,,"A/B Testing,Decision 
Trees,Segmentation",Often,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,Often,,,,,,,,70,10,10,5,5,0,Enough to run the code / standard library,"Company politics / Lack of management/financial support for a data science team,Dirty data",Often,,,,Often,,,,,,,,,,,,,,,,,,10-25% of projects,Do not know,Central Insights Team,NA,NA,Graph (e.g. GraphBase/Neo4j),Commercial Data Platform,,Other,Sometimes,10000,USD,I am not currently employed,6,,,,,,,,,,,,,,,,,, +Male,India,33,Employed full-time,,,Yes,,Data Scientist,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bachelor's degree,"Information technology, networking, or system administration",More than 10 years,"Business Analyst,Data Analyst,Operations Research Practitioner,Programmer,Software Developer/Software Engineer",Work,30,20,50,0,0,0,Supervised Machine Learning (Tabular Data),Logistic Regression,A master's degree,Technology,"10,000 or more employees",Increased slightly,3-5 years,A career fair or on-campus recruiting event,Very important,Analyze and understand data to influence product or business decisions,"Laptop + Cloud service (AWS, Azure, GCE ...)","Text data,Relational data",Sometimes,,,"Microsoft Azure Machine Learning,Microsoft Excel Data Mining,Microsoft SQL Server Data Mining,R",,,,,,,,,,,,,,,,,,,,,,Often,Often,,Often,,,,,,,,Often,,,,,,,,,,,,,,,,,,,"Bayesian Techniques,Data Visualization,Naive Bayes",,,Rarely,,,,Most of the time,,,,,,,,,,,Rarely,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,25,Employed full-time,,,Yes,,Data Analyst,Fine,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Other,1 to 2 years,Data Analyst,University courses,40,0,30,30,0,0,Computer Vision,,I prefer not to answer,Financial,,,,,Important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private 
datacenters,Relational data,Most of the time,,,"R,SQL",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sometimes,,,,,,,,,Most of the time,,,,,,,,,,kNN and Other Clustering,,,,,,,,,,,,,,Often,,,,,,,,,,,,,,,,,,,,50,0,0,20,30,0,Enough to explain the algorithm to someone non-technical,I prefer not to say,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Other,25,Employed full-time,,,Yes,,Data Analyst,Poorly,"Employed by professional services/consulting firm,Employed by a company that performs advanced analytics",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Mathematics or statistics,1 to 2 years,Data Analyst,"Online courses (coursera, udemy, edx, etc.)",30,40,10,0,20,0,"Natural Language Processing,Outlier detection (e.g. Fraud detection),Recommendation Engines","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Gradient Boosting,Logistic Regression,Support Vector Machines (SVMs)",I prefer not to answer,Financial,500 to 999 employees,Stayed the same,3-5 years,An external recruiter or headhunter,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,26,"Not employed, but looking for work",,,,,,,,Python,Neural Nets,Python,"Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),University/Non-profit research group websites","College/University,Conferences,Friends network,Kaggle,Non-Kaggle online communities,Official documentation,Online courses,Personal Projects,Stack Overflow Q&A,Textbook,Tutoring/mentoring,YouTube Videos",,,Somewhat useful,,Somewhat useful,Somewhat useful,Very useful,,Very useful,Somewhat useful,Very useful,Very useful,,Very useful,Very useful,,Very useful,Very useful,Becoming a Data Scientist Podcast,< 1 year,Nice to have,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Necessary,Nice to have,Nice to have,Necessary,Unnecessary,Unnecessary,Unnecessary,Coursera,Laptop or Workstation and local IT supported servers,0 - 1 hour,Kaggle Competitions,Yes,Master's degree,Management information systems,Less than a year,Programmer,Kaggle competitions,0,0,0,0,100,0,Supervised Machine Learning (Tabular Data),Evolutionary Approaches,A bachelor's degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,India,28,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Computer Science,1 to 2 years,I haven't started working yet,Self-taught,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Australia,NA,Employed full-time,,,Yes,,Other,Poorly,Employed by a company that doesn't perform advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,Electrical Engineering,More than 10 years,"Business Analyst,Computer Scientist,Data Miner,Data Scientist,Engineer,Predictive Modeler,Programmer,Researcher,Software Developer/Software Engineer",Self-taught,80,10,10,0,0,0,"Outlier detection (e.g. 
Fraud detection),Supervised Machine Learning (Tabular Data),Time Series,Unsupervised Learning","Bayesian Techniques,Decision Trees - Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Gradient Boosting,Logistic Regression,Neural Networks - CNNs,Neural Networks - GANs,Neural Networks - RNNs,Support Vector Machines (SVMs)",,Mix of fields,"5,000 to 9,999 employees",Stayed the same,3-5 years,An external recruiter or headhunter,Not at all important,Analyze and understand data to influence product or business decisions,Laptop or Workstation and private datacenters,Other,,,,"C/C++,Hadoop/Hive/Pig,Java,Julia,Orange,Perl,R,Salfrod Systems CART/MARS/TreeNet/RF/SPM,Spark / MLlib,SQL,Stan,TensorFlow,TIBCO Spotfire,Unix shell / awk",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Indonesia,30,Employed full-time,,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,21,Employed full-time,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,34,Employed full-time,,,Yes,,DBA/Database Engineer,Poorly,Employed by a company that performs advanced analytics,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Master's degree,Electrical Engineering,3 to 5 years,"DBA/Database Engineer,Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",20,30,50,0,0,0,"Recommendation Engines,Supervised Machine Learning (Tabular Data)","Bayesian Techniques,Decision Trees - 
Gradient Boosted Machines,Decision Trees - Random Forests,Ensemble Methods,Evolutionary Approaches,Logistic Regression","Some college/university study, no bachelor's degree",Financial,"10,000 or more employees",Increased slightly,1-2 years,A tech-specific job board,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,People 's Republic of China,23,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,Bachelor's degree,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Other,24,"Not employed, but looking for work",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,No,Master's degree,Management information systems,Less than a year,"Researcher,I haven't started working yet",University courses,20,10,10,50,0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Indonesia,25,Employed full-time,,,Yes,,Programmer,Fine,Employed by a company that doesn't perform advanced analytics,Jupyter notebooks,Bayesian Methods,Python,Dataset aggregator/platform (i.e. 
Socrata/Kaggle Datasets/data.world/etc.),"Kaggle,Non-Kaggle online communities,Online courses,Stack Overflow Q&A,YouTube Videos",,,,,,,Somewhat useful,,Somewhat useful,,Somewhat useful,,,Somewhat useful,,,,Somewhat useful,,,,,,,,,,,,,,,,,,,,No,Bachelor's degree,Computer Science,1 to 2 years,"Programmer,Software Developer/Software Engineer","Online courses (coursera, udemy, edx, etc.)",10,75,5,0,10,0,,,"Some college/university study, no bachelor's degree",Mix of fields,10 to 19 employees,Stayed the same,1-2 years,I visited the company's Web site and found a job listing there,Somewhat important,"Build and/or run the data infrastructure that your business uses for storing, analyzing, and operationalizing data",Basic laptop (Macbook),"Text data,Relational data",,,,Python,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Most of the time,,,,,,,,,,,,,,,,,,,,,"kNN and Other Clustering,Logistic Regression",,,,,,,,,,,,,,Often,,Sometimes,,,,,,,,,,,,,,,,,,50,30,0,20,0,0,Enough to tune the parameters properly,"Lack of data science talent in the organization,Organization is small and cannot afford a data science team",,,,,,,,,Sometimes,,,,,,,Sometimes,,,,,,,Less than 10% of projects,More internal than external,IT Department,,,"Key-value store (e.g. Redis/Riak),Row-oriented relational (e.g. 
MySQL/Microsoft SQL Server)","Email,Share Drive/SharePoint",,Git,Rarely,,IDR,I do not want to share information about my salary/compensation,7,,,,,,,,,,,,,,,,,, +Female,Taiwan,25,Employed part-time,,,No,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Female,Singapore,16,I prefer not to say,Yes,"Yes, but data science is a small part of what I'm focused on learning",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I prefer not to answer,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, +Male,Japan,27,Employed full-time,,,No,Yes,Programmer,Fine,Employed by professional services/consulting firm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,I did not complete any formal education past high school,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NA,NA,NA,NA,NA,NA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, \ No newline at end of file